picongpu
hemera v100: openPMD-api output to hdf5 - simulation hangs
Running the default Laser Wakefield example on hemera V100 GPUs with the h5 backend of openPMD-api instead of the bp backend leads to a hanging simulation.
I could run the simulation with bp without any problems. Switching to h5 resulted in a hanging simulation right after init.
The first h5 output file was written but never closed.
I ran into this issue too. The h5 output works fine only when the code runs on a single GPU, so the parallel HDF5 output seems to be incorrect. I still haven't found a solution.
This problem often comes from broken chunking in HDF5.
This could be a solution: https://github.com/ComputationalRadiationPhysics/picongpu/issues/4845#issuecomment-2009408453
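For readers who want to try turning chunking off, here is a minimal sketch of how the corresponding openPMD-api backend option could be built. It is not taken from the linked comment. It assumes that the `hdf5.dataset.chunks` key of openPMD-api's JSON configuration accepts the value `"none"`, and the file name in the commented usage is hypothetical.

```python
import json

# Backend-specific options for openPMD-api, expressed as JSON.
# Assumption: "hdf5.dataset.chunks" set to "none" disables HDF5 chunking.
hdf5_options = {"hdf5": {"dataset": {"chunks": "none"}}}
options_json = json.dumps(hdf5_options)

# Hypothetical usage with the openPMD-api Python bindings (not executed here;
# the file name is a placeholder):
#   import openpmd_api as io
#   series = io.Series("simData_%T.h5", io.Access.create, options_json)

# Print the string so it can be pasted into a run configuration.
print(options_json)
```

In PIConGPU the same JSON string would presumably be passed through the openPMD plugin's `--openPMD.json` option, and openPMD-api is also expected to read the environment variable `OPENPMD_HDF5_CHUNKS=none`. Both should be checked against the documentation of the installed versions.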
Well, in the past I noticed that it does not actually hang, but writes very very very [many more 'very'] slowly. That's why I used bp, which was the solution for me.