Problem trying to run standard examples, hipErrorOutOfMemory
I am trying to run the standard examples from the PIConGPU code distribution, but I have difficulties running some of them in a multi-node configuration.
For example, I tried to run the WarmCopper example.
I can run it on one of our cluster machines using up to 4 GPUs; to use all GPUs, I have to lower the number of cores per MPI rank to half (6 instead of the maximum of 12) in order to avoid the program getting stuck after initialization.
But if I run the same program on 2 machines, I get the following errors:
Unhandled exception of type 'St13runtime_error' with message '/lustre/rz/dbertini/gpu/picongpu/thirdParty/cupla/alpaka/include/alpaka/mem/buf/BufUniformCudaHipRt.hpp(236) 'hipMalloc(&memPtr, static_cast<std::size_t>(widthBytes))' returned error : 'hipErrorOutOfMemory': 'hipErrorOutOfMemory'!', terminating
Unhandled exception of type 'St13runtime_error' with message '/lustre/rz/dbertini/gpu/picongpu/thirdParty/cupla/alpaka/include/alpaka/mem/buf/BufUniformCudaHipRt.hpp(236) 'hipMalloc(&memPtr, static_cast<std::size_t>(widthBytes))' returned error : 'hipErrorOutOfMemory': 'hipErrorOutOfMemory'!', terminating
Unhandled exception of type 'St13runtime_error' with message '/lustre/rz/dbertini/gpu/picongpu/thirdParty/cupla/alpaka/include/alpaka/mem/buf/BufUniformCudaHipRt.hpp(236) 'hipMalloc(&memPtr, static_cast<std::size_t>(widthBytes))' returned error : 'hipErrorOutOfMemory': 'hipErrorOutOfMemory'!', terminating
Unhandled exception of type 'St13runtime_error' with message '/lustre/rz/dbertini/gpu/picongpu/thirdParty/cupla/alpaka/include/alpaka/mem/buf/BufUniformCudaHipRt.hpp(236) 'hipMalloc(&memPtr, static_cast<std::size_t>(widthBytes))' returned error : 'hipErrorOutOfMemory': 'hipErrorOutOfMemory'!', terminating
Unhandled exception of type 'St13runtime_error' with message '/lustre/rz/dbertini/gpu/picongpu/thirdParty/cupla/alpaka/include/alpaka/mem/buf/BufUniformCudaHipRt.hpp(236) 'hipMalloc(&memPtr, static_cast<std::size_t>(widthBytes))' returned error : 'hipErrorOutOfMemory': 'hipErrorOutOfMemory'!', terminating
[cupla] Error: </lustre/rz/dbertini/gpu/picongpu/include/pmacc/../pmacc/memory/buffers/Buffer.hpp>:69
The program runs well on one node (8.cfg) using 8 GPUs. Running on 2 nodes (16.cfg), I immediately get a hipErrorOutOfMemory.
What could be the reason for that?
Attached are my config files:
cfg.tar.gz
Could you please compile PIConGPU again with pic-build -c "-DPIC_VERBOSE=127 -DPMACC_VERBOSE=127", rerun it, and send the full stdout and stderr file? This enables verbose output and will tell us what is going on in PIConGPU.
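A minimal sketch of the rebuild-and-resubmit cycle, assuming a param-set directory created with pic-create (the paths, cluster template, and run directory below are placeholders; only the pic-build flags are taken from this request):

cd /path/to/params/WarmCopper    # hypothetical input directory created with pic-create
pic-build -c "-DPIC_VERBOSE=127 -DPMACC_VERBOSE=127"
# resubmit, e.g. the 2-node case; the template path depends on your cluster profile
tbg -s sbatch -c etc/picongpu/16.cfg -t etc/picongpu/your-cluster/batch.tpl /path/to/runs/copper_016_verbose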
OK, will do!
For now our cluster is really busy... I am afraid we must wait for the output.
Attached is the debug output corresponding to a successful run using only 1 GPU: pog_50228986.out.gz
Attached is the debug output corresponding to a failed WarmCopper run triggering the out-of-memory error: pog_50346420.out.gz
I managed to solve the hipErrorOutOfMemory issue by changing the number of cores per MPI rank and the maximum host memory per device, i.e.
# host memory per device
.TBG_memPerDevice=64000000
# number of CPU cores to block per GPU
# we have 12 CPU cores per GPU (96cores/8gpus ~ 12cores)
.TBG_coresPerGPU=12
But now, if I activate I/O (a full openPMD dump), a system out-of-memory is triggered:
lustre/rz/dbertini/gpu/data/copper_016/tbg/pic_sub.sh: line 40: 75854 Killed /lustre/rz/dbertini/gpu/data/copper_016/input/bin/picongpu --author "dbertini" -d 2 4 2 -g 96 96 96 -s 2000 --periodic 1 1 1 --eth_energyHistogram.period 100 --eth_energyHistogram.filter all --eth_energyHistogram.binCount 1024 --eth_energyHistogram.minEnergy 0 --eth_energyHistogram.maxEnergy 5 --ehot_energyHistogram.period 100 --ehot_energyHistogram.filter all --ehot_energyHistogram.binCount 1024 --ehot_energyHistogram.minEnergy 0 --ehot_energyHistogram.maxEnergy 250 --openPMD.period 500 --openPMD.file simData --openPMD.ext bp --checkpoint.period 500 --checkpoint.backend openPMD --versionOnce
slurmstepd: error: Detected 5 oom-kill event(s) in step 50361618.0 cgroup. Some of your processes may have been killed by the cgroup out-of-memory handler.
srun: error: lxbk1080: task 0: Out Of Memory
srun: Terminating job step 50361618.0
Removing the openPMD I/O solved this memory issue.
Any idea why the openPMD output triggers an out-of-memory?
Perhaps it's a similar issue to what you faced in #3958. Maybe the suggestion there would help now.
Generally, to be on the safe side with PIConGPU and openPMD output, one should have free host memory per GPU >= 2x the simulation data size on the GPU. We always allocate the whole GPU memory besides the reserved memory size, but here I mean how much your data actually takes on the GPU, not how much we allocated. It should actually require a bit less than 2x, but how much exactly depends on many factors, so 2x should be safe.
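A purely illustrative calculation (the numbers are assumptions, not measurements from this run):

# suppose the fields + particles actually occupy ~16 GiB on each GPU;
# then plan for >= 2 x 16 GiB = 32 GiB of free host RAM per GPU for openPMD output,
# i.e. roughly 8 x 32 GiB = 256 GiB of free host memory on an 8-GPU node.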
If you are encountering unexpectedly high memory usage on the host with openPMD output, it could be that its backend is too greedy with memory allocation. The settings I linked above should solve that. cc @franzpoeschel in case you have an idea how to efficiently figure out what's going wrong.
Just to give an idea of what could be going wrong, see the discussion in #4002. However, I am not sure it gives a good idea of how to debug it, hence I pinged the expert.
Something strange is that the last verbose output shows:
PIConGPUVerbose INPUT_OUTPUT(32) | openPMD: global JSON config: {"hdf5":{"dataset":{"chunks":"none"}}}
PMaccVerbose MEMORY(1) | DataConnector: sharing access to 'MallocMCBuffer' (1 uses)
PIConGPUVerbose INPUT_OUTPUT(32) | openPMD: global JSON config: {"hdf5":{"dataset":{"chunks":"none"}}}
PMaccVerbose MEMORY(1) | DataConnector: sharing access to 'MallocMCBuffer' (1 uses)
PIConGPUVerbose INPUT_OUTPUT(32) | openPMD: setting file pattern: openPMD/simData_%06T.bp
PIConGPUVerbose INPUT_OUTPUT(32) | openPMD: setting file pattern: openPMD/simData_%06T.bp
PIConGPUVerbose INPUT_OUTPUT(32) | openPMD: global JSON config: {"hdf5":{"dataset":{"chunks":"none"}}}
PIConGPUVerbose INPUT_OUTPUT(32) | openPMD: setting file pattern: openPMD/simData_%06T.bp
PMaccVerbose MEMORY(1) | DataConnector: sharing access to 'MallocMCBuffer' (1 uses)
PIConGPUVerbose INPUT_OUTPUT(32) | openPMD: global JSON config: {"hdf5":{"dataset":{"chunks":"none"}}}
PIConGPUVerbose INPUT_OUTPUT(32) | openPMD: setting file pattern: openPMD/simData_%06T.bp
PMaccVerbose MEMORY(1) | DataConnector: sharing access to 'MallocMCBuffer' (1 uses)
PIConGPUVerbose INPUT_OUTPUT(32) | openPMD: global JSON config: {"hdf5":{"dataset":{"chunks":"none"}}}
PMaccVerbose MEMORY(1) | DataConnector: sharing access to 'MallocMCBuffer' (1 uses)
PIConGPUVerbose INPUT_OUTPUT(32) | openPMD: global JSON config: {"hdf5":{"dataset":{"chunks":"none"}}}
PMaccVerbose MEMORY(1) | DataConnector: sharing access to 'MallocMCBuffer' (1 uses)
PMaccVerbose MEMORY(1) | DataConnector: sharing access to 'ph' (1 uses)
PMaccVerbose MEMORY(1) | DataConnector: sharing access to 'eth' (1 uses)
PMaccVerbose MEMORY(1) | DataConnector: sharing access to 'ehot' (1 uses)
PMaccVerbose MEMORY(1) | DataConnector: sharing access to 'Cu' (1 uses)
PIConGPUVerbose INPUT_OUTPUT(32) | openPMD: opening Series openPMD/simData
PIConGPUVerbose INPUT_OUTPUT(32) | openPMD: open file: openPMD/simData_%06T.bp
PMaccVerbose MEMORY(1) | DataConnector: sharing access to 'ph' (1 uses)
PMaccVerbose MEMORY(1) | DataConnector: sharing access to 'eth' (1 uses)
PMaccVerbose MEMORY(1) | DataConnector: sharing access to 'ehot' (1 uses)
PMaccVerbose MEMORY(1) | DataConnector: sharing access to 'Cu' (1 uses)
PIConGPUVerbose INPUT_OUTPUT(32) | openPMD: opening Series openPMD/simData
PIConGPUVerbose INPUT_OUTPUT(32) | openPMD: open file: openPMD/simData_%06T.bp
PMaccVerbose MEMORY(1) | DataConnector: sharing access to 'ph' (1 uses)
PMaccVerbose MEMORY(1) | DataConnector: sharing access to 'eth' (1 uses)
PMaccVerbose MEMORY(1) | DataConnector: sharing access to 'ehot' (1 uses)
PMaccVerbose MEMORY(1) | DataConnector: sharing access to 'Cu' (1 uses)
PIConGPUVerbose INPUT_OUTPUT(32) | openPMD: opening Series openPMD/simData
PIConGPUVerbose INPUT_OUTPUT(32) | openPMD: open file: openPMD/simData_%06T.bp
PMaccVerbose MEMORY(1) | DataConnector: sharing access to 'ph' (1 uses)
PMaccVerbose MEMORY(1) | DataConnector: sharing access to 'eth' (1 uses)
PMaccVerbose MEMORY(1) | DataConnector: sharing access to 'ehot' (1 uses)
PMaccVerbose MEMORY(1) | DataConnector: sharing access to 'Cu' (1 uses)
PIConGPUVerbose INPUT_OUTPUT(32) | openPMD: opening Series openPMD/simData
PIConGPUVerbose INPUT_OUTPUT(32) | openPMD: open file: openPMD/simData_%06T.bp
PMaccVerbose MEMORY(1) | DataConnector: sharing access to 'ph' (1 uses)
PMaccVerbose MEMORY(1) | DataConnector: sharing access to 'eth' (1 uses)
PMaccVerbose MEMORY(1) | DataConnector: sharing access to 'ehot' (1 uses)
PMaccVerbose MEMORY(1) | DataConnector: sharing access to 'Cu' (1 uses)
PIConGPUVerbose INPUT_OUTPUT(32) | openPMD: opening Series openPMD/simData
PIConGPUVerbose INPUT_OUTPUT(32) | openPMD: open file: openPMD/simData_%06T.bp
PMaccVerbose MEMORY(1) | DataConnector: sharing access to 'ph' (1 uses)
PMaccVerbose MEMORY(1) | DataConnector: sharing access to 'eth' (1 uses)
PMaccVerbose MEMORY(1) | DataConnector: sharing access to 'ehot' (1 uses)
PMaccVerbose MEMORY(1) | DataConnector: sharing access to 'Cu' (1 uses)
PIConGPUVerbose INPUT_OUTPUT(32) | openPMD: opening Series openPMD/simData
PIConGPUVerbose INPUT_OUTPUT(32) | openPMD: open file: openPMD/simData_%06T.bp
PMaccVerbose MEMORY(1) | DataConnector: sharing access to 'ph' (1 uses)
PMaccVerbose MEMORY(1) | DataConnector: sharing access to 'eth' (1 uses)
PMaccVerbose MEMORY(1) | DataConnector: sharing access to 'ehot' (1 uses)
PMaccVerbose MEMORY(1) | DataConnector: sharing access to 'Cu' (1 uses)
PIConGPUVerbose INPUT_OUTPUT(32) | openPMD: opening Series openPMD/simData
PIConGPUVerbose INPUT_OUTPUT(32) | openPMD: open file: openPMD/simData_%06T.bp
PMaccVerbose MEMORY(1) | DataConnector: sharing access to 'ph' (1 uses)
PMaccVerbose MEMORY(1) | DataConnector: sharing access to 'eth' (1 uses)
PMaccVerbose MEMORY(1) | DataConnector: sharing access to 'ehot' (1 uses)
PMaccVerbose MEMORY(1) | DataConnector: sharing access to 'Cu' (1 uses)
PIConGPUVerbose INPUT_OUTPUT(32) | openPMD: opening Series openPMD/simData
PIConGPUVerbose INPUT_OUTPUT(32) | openPMD: open file: openPMD/simData_%06T.bp
PMaccVerbose MEMORY(1) | DataConnector: sharing access to 'ph' (1 uses)
PMaccVerbose MEMORY(1) | DataConnector: sharing access to 'eth' (1 uses)
PMaccVerbose MEMORY(1) | DataConnector: sharing access to 'ehot' (1 uses)
PMaccVerbose MEMORY(1) | DataConnector: sharing access to 'Cu' (1 uses)
PIConGPUVerbose INPUT_OUTPUT(32) | openPMD: opening Series openPMD/simData
PIConGPUVerbose INPUT_OUTPUT(32) | openPMD: open file: openPMD/simData_%06T.bp
PMaccVerbose MEMORY(1) | DataConnector: sharing access to 'ph' (1 uses)
PMaccVerbose MEMORY(1) | DataConnector: sharing access to 'eth' (1 uses)
PMaccVerbose MEMORY(1) | DataConnector: sharing access to 'ehot' (1 uses)
PMaccVerbose MEMORY(1) | DataConnector: sharing access to 'Cu' (1 uses)
PIConGPUVerbose INPUT_OUTPUT(32) | openPMD: opening Series openPMD/simData
PIConGPUVerbose INPUT_OUTPUT(32) | openPMD: open file: openPMD/simData_%06T.bp
PMaccVerbose MEMORY(1) | DataConnector: sharing access to 'ph' (1 uses)
PMaccVerbose MEMORY(1) | DataConnector: sharing access to 'eth' (1 uses)
PMaccVerbose MEMORY(1) | DataConnector: sharing access to 'ehot' (1 uses)
PMaccVerbose MEMORY(1) | DataConnector: sharing access to 'Cu' (1 uses)
PIConGPUVerbose INPUT_OUTPUT(32) | openPMD: opening Series openPMD/simData
PIConGPUVerbose INPUT_OUTPUT(32) | openPMD: open file: openPMD/simData_%06T.bp
PMaccVerbose MEMORY(1) | DataConnector: sharing access to 'ph' (1 uses)
PMaccVerbose MEMORY(1) | DataConnector: sharing access to 'eth' (1 uses)
PMaccVerbose MEMORY(1) | DataConnector: sharing access to 'ehot' (1 uses)
PMaccVerbose MEMORY(1) | DataConnector: sharing access to 'Cu' (1 uses)
PIConGPUVerbose INPUT_OUTPUT(32) | openPMD: opening Series openPMD/simData
PIConGPUVerbose INPUT_OUTPUT(32) | openPMD: open file: openPMD/simData_%06T.bp
PMaccVerbose MEMORY(1) | DataConnector: sharing access to 'ph' (1 uses)
PMaccVerbose MEMORY(1) | DataConnector: sharing access to 'eth' (1 uses)
PMaccVerbose MEMORY(1) | DataConnector: sharing access to 'ehot' (1 uses)
PMaccVerbose MEMORY(1) | DataConnector: sharing access to 'Cu' (1 uses)
PIConGPUVerbose INPUT_OUTPUT(32) | openPMD: opening Series openPMD/simData
PIConGPUVerbose INPUT_OUTPUT(32) | openPMD: open file: openPMD/simData_%06T.bp
PMaccVerbose MEMORY(1) | DataConnector: sharing access to 'ph' (1 uses)
PMaccVerbose MEMORY(1) | DataConnector: sharing access to 'eth' (1 uses)
PMaccVerbose MEMORY(1) | DataConnector: sharing access to 'ehot' (1 uses)
PMaccVerbose MEMORY(1) | DataConnector: sharing access to 'Cu' (1 uses)
PIConGPUVerbose INPUT_OUTPUT(32) | openPMD: opening Series openPMD/simData
PIConGPUVerbose INPUT_OUTPUT(32) | openPMD: open file: openPMD/simData_%06T.bp
but no file is actually created and the program gets stuck.
I am confused. Why do you have hdf5 in json, but use ADIOS output to .bp?
I am confused. Why do you have hdf5 in json, but use ADIOS output to .bp?
This is a default that PIConGPU sets to work around a bug; it is ignored if HDF5 is not chosen for writing.
Ah, thanks @franzpoeschel, I was not aware. Could you have a look and suggest how to investigate this issue?
It should actually require a bit less than 2x but depends on many factors how much exactly, so 2x should be safe.
With the ADIOS2 BP4 engine, data is staged in-memory and written upon closing the time step. This means that in that backend, you should actually have at least 2 times the device memory. I can have a more detailed look later.
@sbastrakov it actually works; the problem was linked to the proper configuration of the ADIOS backend!
I have this warning though in the .err output; maybe you know what it means?
Warning: ADIOS2 backend does not support compression method blosc. Continuing without compression.
I believe it just means your json requests data compression via blosc, which is supported by the ADIOS1 backend, but not by ADIOS2. So you are currently running with ADIOS2 and no compression, but that is fine. In this case, you could remove the blosc configuration from the json, but it also doesn't hurt now besides this warning being printed.
I believe it just means your json requests data compression via blosc, which is supported by the ADIOS1 backend, but not by ADIOS2
More precisely, it is supported by ADIOS2, but only if ADIOS2 is compiled against blosc. If it's not, this warning will come up.
@sbastrakov it would be advantageous to use data compression in our case, since we are using a Lustre filesystem. What exactly does blosc do? Can ADIOS2 use other kinds of compression algorithms?
Unfortunately, I do not have experience with that. I can only suggest looking at the openPMD API installation options; as I see from there, a few compression engines are supported, but most are marked experimental. Perhaps you could ask on their GitHub about the current state and the practical side of using those, or maybe there are already related GitHub discussions there.
Oh, but just to give a general idea: from your workflow of using PIConGPU and postprocessing the results, compression should be completely transparent. One works with those files the same way with and without compression. It just requires that the openPMD API on both sides is compiled with support for that compression. So I would describe it more as an advanced feature to optimize one's workflow, not as a functional difference.
Any idea why the openPMD output triggers an out-of-memory?
With your current configuration, you are using the BP4 engine of ADIOS2. With this engine, you should set the InitialBufferSize to at least the size of a single output step (per MPI process). See here for more detailed info on the memory usage of BP4.
I see now that the LaserWakefield example unfortunately does not include how to do that. You can look it up on the documentation page for the openPMD plugin, however. In short, this is configured in openPMD via JSON, which is unfortunately a bit ugly to write with TBG:
TBG_openPMD="--openPMD.period 100 \
--openPMD.file simOutput \
--openPMD.ext bp \
--openPMD.json '{ \
\"adios2\": { \
\"engine\": { \
\"parameters\": { \
\"InitialBufferSize\": \"2GB\" \
} \
} \
} \
}'"
Specifying this correctly should improve memory usage.
Note that you need to specify this separately for --openPMD.json and --checkpoint.openPMD.json.
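A minimal sketch of how the same JSON string could be reused for both plugins in a .cfg file (the variable name TBG_openPMD_json and the 2GB value are illustrative; the command-line flags are the ones mentioned above):

TBG_openPMD_json="'{\"adios2\": {\"engine\": {\"parameters\": {\"InitialBufferSize\": \"2GB\"}}}}'"
TBG_openPMD="--openPMD.period 500 --openPMD.file simData --openPMD.ext bp --openPMD.json !TBG_openPMD_json"
TBG_checkpoints="--checkpoint.period 500 --checkpoint.backend openPMD --checkpoint.openPMD.json !TBG_openPMD_json"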
For compression: running cmake . in the build folder of ADIOS2 should give you the available options. The documentation for the openPMD plugin in PIConGPU linked above includes an example of how to activate compression. I'm a bit confused why you would receive warnings that blosc is not available; nowhere in your config do you request it?
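For instance, a rough way to check, assuming you still have the ADIOS2 build directory around (the path is a placeholder and the exact wording of the configuration summary may differ between ADIOS2 versions):

cd /path/to/ADIOS2-build   # hypothetical ADIOS2 build directory
cmake . 2>&1 | grep -i -E 'blosc|bzip2|zfp|sz'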
We should add that stuff to our .cfg's.
Once BP5 stabilizes, we might want to make BP5 the default engine and keep BP4 as a high-performance output engine used only if:
- The user knows what they're doing and knows how to configure BP4 to actually be performant
- The user knows that the system has the memory that BP4 needs
For compression: running cmake . in the build folder of ADIOS2 should give you the available options. The documentation for the openPMD plugin in PIConGPU linked above includes an example of how to activate compression. I'm a bit confused why you would receive warnings that blosc is not available; nowhere in your config do you request it?
Actually, I did request it in the json definition:
TBG_ADIOS2_configuration="'{ \
\"adios2\": { \
\"dataset\": { \
\"operators\": [ { \
\"type\": \"blosc\" \
, \"parameters\": { \
\"clevel\": \"1\" \
, \"compressor\": \"zstd\" \
, \"doshuffle\": \"BLOSC_BITSHUFFLE\" \
} \
} ] \
} \
, \"engine\": { \
\"type\": \"file\" \
, \"parameters\": { \
\"BufferGrowthFactor\": \"1.1\" \
, \"InitialBufferSize\": \"32GB\" \
, \"AggregatorRatio\" : \"1\" \
} \
} \
} \
}'"
I can just remove the line with blosc, right?
@denisbertini I think so. Just to reiterate, for your previous run it would just silence the warning; the compression was not used there anyway, as the openPMD + ADIOS2 build you used then didn't support it.
OK, so removing the blosc part now creates other warnings.
New config:
TBG_ADIOS2_configuration2="'{ \
\"adios2\": { \
\"dataset\": { \
\"engine\": { \
\"type\": \"file\" \
, \"parameters\": { \
\"BufferGrowthFactor\": \"1.1\" \
, \"InitialBufferSize\": \"32GB\" \
, \"AggregatorRatio\" : \"1\" \
} \
} \
} \
} \
}'"
Warnings:
Warning: parts of the JSON configuration for ADIOS2 remain unused:
{"dataset":{"engine":{"parameters":{"AggregatorRatio":"1","BufferGrowthFactor":"1.1","InitialBufferSize":"32GB"},"type":"file"}}}
Oh, I guess one should remove the whole hierarchy if it becomes empty. So in your case, also remove the line \"dataset\": { \ (and its matching closing brace). Sorry, I'm not really experienced with that.
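For reference, a sketch of the corrected variable with the empty dataset level dropped, so that engine sits directly under adios2 (values taken from your configs above):

TBG_ADIOS2_configuration2="'{ \
  \"adios2\": { \
    \"engine\": { \
      \"type\": \"file\" \
      , \"parameters\": { \
        \"BufferGrowthFactor\": \"1.1\" \
        , \"InitialBufferSize\": \"32GB\" \
        , \"AggregatorRatio\" : \"1\" \
      } \
    } \
  } \
}'"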
Right! It works, thanks! No more warnings!
@denisbertini Do you have any further questions/problems? If not, please remember to close the issue ;)
No, thanks for the info!