openPMD-api
file based bp5 writer hang
Describe the bug
The recent optimization breaks an MPI use case in file-based mode. A minimal code example is included below; running it on 2 ranks is enough to see the effect. In short, at the second flush, rank 1 has nothing to contribute, so it does not call into BP5 while rank 0 does. In essence, BP5 writes are collective, so rank 0 hangs because of the inactivity of rank 1. If we use variable-based encoding, a flush to ADIOS appears to be forced (by openPMD-api?) on all ranks, and so it works.
To Reproduce
c++ example:
#include <openPMD/openPMD.hpp>
#include <mpi.h>

#include <iostream>
#include <string>
#include <vector>

using std::cout;
using namespace openPMD;

int main(int argc, char *argv[])
{
    int provided;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);

    int mpi_size;
    int mpi_rank;
    MPI_Comm_size(MPI_COMM_WORLD, &mpi_size);
    MPI_Comm_rank(MPI_COMM_WORLD, &mpi_rank);

    auto const value = float(mpi_size * 100 + mpi_rank);
    std::vector<float> local_data(10 * 300, value);

    std::string filename = "ptl_%T.bp";
    // std::string filename = "ptl.bp"; // this is variable based and it works
    Series series = Series(filename, Access::CREATE, MPI_COMM_WORLD);

    Datatype datatype = determineDatatype<float>();

    auto myptl = series.writeIterations()[1].particles["ion"];
    Extent global_ptl = {10ul * mpi_size * 300};
    Dataset dataset_ptl = Dataset(datatype, global_ptl, "{}");
    myptl["charge"].resetDataset(dataset_ptl);
    series.flush();

    if (mpi_rank == 0) // only rank 0 adds data
        myptl["charge"].storeChunk(local_data, {0}, {3000});
    series.flush(); // hangs here

    MPI_Finalize();
    return 0;
}
Software Environment
- version of openPMD-api: latest
- machine: Mac
- version of ADIOS2: latest
Additional context
- I used OPENPMD_ADIOS2_BP5_TypeAgg=EveryoneWritesSerial, but any choice of aggregation will fail.
- Running with 2 ranks is enough to see that file-based encoding has the issue.
- As far as I was aware, this use case worked fine not long ago.
- It does not affect HDF5.
I think I came across something similar last week (but I actually got an error instead of a hang).
The issue was that I was also calling storeChunk with a data vector whose .data() was a nullptr (but I also passed an extent of 0 to storeChunk).
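For reference, a minimal sketch (not taken from the original report) of the call pattern described above, where ranks without data pass an empty vector, whose .data() is typically a nullptr, together with a zero extent. The helper name is hypothetical:

#include <openPMD/openPMD.hpp>

#include <cstdint>
#include <vector>

// Hypothetical helper: every rank calls storeChunk; ranks with nothing to
// contribute pass an empty vector and an extent of 0.
void storeChunkMaybeEmpty(
    openPMD::RecordComponent &rc,
    std::vector<float> &data,
    std::uint64_t offset)
{
    rc.storeChunk(data, {offset}, {data.size()});
}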
Hello @guj and @pgrete, this behavior is known and can only be fully fixed once we transition to flushing those Iterations that are open rather than those that are modified. It was not the recent optimization that broke this; rather, BP5 is just much stricter about collective operations, so this behavior is more likely to occur now. Until this is fully solved, please use the workaround implemented in https://github.com/openPMD/openPMD-api/pull/1619:
series.writeIterations()[1].seriesFlush();
This is guaranteed to flush Iteration 1 on all ranks, regardless of whether it has been modified or not.
Also, your example is missing a call to series.close() before MPI_Finalize().
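For concreteness, here is an untested sketch of how the reproducer might look with both suggestions applied: the second flush replaced by the collective Iteration::seriesFlush() from #1619, and series.close() called before MPI_Finalize().

#include <openPMD/openPMD.hpp>
#include <mpi.h>

#include <string>
#include <vector>

using namespace openPMD;

int main(int argc, char *argv[])
{
    int provided;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);

    int mpi_size;
    int mpi_rank;
    MPI_Comm_size(MPI_COMM_WORLD, &mpi_size);
    MPI_Comm_rank(MPI_COMM_WORLD, &mpi_rank);

    auto const value = float(mpi_size * 100 + mpi_rank);
    std::vector<float> local_data(10 * 300, value);

    Series series = Series("ptl_%T.bp", Access::CREATE, MPI_COMM_WORLD);

    auto myptl = series.writeIterations()[1].particles["ion"];
    Extent global_ptl = {10ul * mpi_size * 300};
    myptl["charge"].resetDataset(
        Dataset(determineDatatype<float>(), global_ptl, "{}"));
    series.flush();

    if (mpi_rank == 0) // only rank 0 adds data
        myptl["charge"].storeChunk(local_data, {0}, {3000});

    // Workaround from #1619: flush Iteration 1 collectively on every rank,
    // whether or not it was modified on that rank.
    series.writeIterations()[1].seriesFlush();

    series.close(); // finish all I/O before tearing down MPI
    MPI_Finalize();
    return 0;
}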
Thanks Franz, the workaround works.
@franzpoeschel I am wondering whether, inside
series.flush(); // hangs here
we could call something like
series.writeIterations()[i].seriesFlush();
for the iterations that are marked open, to make sure the initial code works and to simplify the API contract again, so that series.flush() is collective and storeChunk is independent?
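A rough user-side sketch of what that could look like, assuming the application keeps track of which iteration indices are still open (the openIterations list below is an assumption for illustration, not an existing API):

#include <openPMD/openPMD.hpp>

#include <cstdint>
#include <vector>

// Hypothetical helper: flush every iteration the application still keeps
// open, so that all ranks enter the collective BP5 write together even if
// they did not modify anything locally.
void flushOpenIterations(
    openPMD::Series &series, std::vector<std::uint64_t> const &openIterations)
{
    for (auto index : openIterations)
    {
        series.writeIterations()[index].seriesFlush();
    }
}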
Flushing the open Iterations rather than the modified ones is the long-term goal, yes, but we cannot do it yet. This is why workarounds are required at the moment.
Consider the openPMD-viewer: the openPMD-api release 0.16 cannot yet reopen closed Iterations, so the viewer needs to keep all Iterations open throughout its use in order to allow random access to the data. We can only switch to flushing open Iterations rather than dirty Iterations once we can reasonably ask users to actually close Iterations not in use. At the moment, implementing this would mean that such applications flush all Iterations every time.
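As an illustration of the usage pattern this would assume, a minimal hypothetical writer loop in which each Iteration is closed as soon as it is done, so a future "flush all open Iterations" semantics would only touch the current step:

#include <openPMD/openPMD.hpp>

#include <cstdint>

// Hypothetical writer loop: closing each Iteration when it is finished means
// that only the Iteration currently open would participate in a collective
// flush, instead of every step written so far.
void writeSteps(openPMD::Series &series, std::uint64_t nSteps)
{
    for (std::uint64_t step = 0; step < nSteps; ++step)
    {
        auto iteration = series.writeIterations()[step];
        // ... resetDataset() / storeChunk() calls for this step ...
        iteration.close();
    }
}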
#1728 adds a further workaround that should fix workflows in which writeIterations() or readIterations() are being used.