Hi folks,
I am currently working on integrating Adios with Sobol'. Since our Adios writers send their data independently, we can do the grouping and then compute the Sobol' indices. But before jumping into the actual coding, I have noticed some strange behaviour with Open MPI. When Sobol' is enabled, the launcher submits the following command per group:
mpirun <args1> <client1> : <args1> <client2> : ...
When we launch the Adios writers this way, all of the writers start writing and sending messages as if a successful handshake with all the readers had been established.
However, on the reader side, only client 0 (the first simulation in each group) gets a successful handshake. So all of client 0's time steps are delivered to the server, while the reader keeps looking for the engine files of the other writers, which have already assumed successful delivery and closed their respective files, and it eventually times out.
I have recreated this bug with a simple reader/writer scenario.
===EDIT===
In the shared reader/writer example, I split the MPI world communicator so that each process group gets its own communicator, and it now works fine.
However, in our heatc executables the splitting already occurs for each writer process, yet the study still fails. This needs more investigation.
===EDIT===
Alternatively, can we drop this way of invoking mpirun and instead submit the client scripts separately for each group?
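For clarity, what I have in mind is something like the following launch-script fragment (a sketch only; the placeholders are the same ones as in the MPMD command above, and the backgrounding/wait scheme is an assumption, not our actual launcher code):

```shell
# One mpirun per client instead of a single MPMD line.
# Each client then starts with its own MPI_COMM_WORLD,
# so no communicator split is needed before handing it to Adios.
mpirun <args1> <client1> &
mpirun <args1> <client2> &
# ... one line per client in the group
wait   # wait for every client in the group to finish
```

This would avoid the MPMD handshake problem entirely, at the cost of losing the single-command submission per group.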
~Abhishek.