picongpu
Erroneous checkpoint restart of radiation plug-in
I have recently observed that a restart from a checkpoint appears to zero-out the amplitudes accumulated in the radiation simulations (or the restore is incorrect for the plug-in).
Specifically:
Over the weekend I ran a simulation (electron bunch impacting a uniform background) using commit 59e9b53605f9a5c1bf271eeb055bc74370a99052 with the radiation module enabled and with checkpoint restarts at steps 36k, 48k, 60k.
In the following figure you see, on the left, the squares of the amplitude vectors.
At the given time steps one observes clear "cuts" in the spectrum. I did not expect this, since older data evaluated with the same routines did not display such cuts.
If one integrates over the frequency, the following sawtooth-like curve is obtained:
I'm guessing that the loading of the checkpointed radiation data is erroneous.
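The frequency integration described above can be checked with a small script. This is only a sketch: it assumes the spectral intensity is already available as one list of values per output step, and the names `integrate_over_frequency`, `find_sudden_drops` and the toy numbers are hypothetical, not part of PIConGPU or its evaluation tools. The drop criterion (falling below half of the previous value) is a heuristic for spotting a reset, not a physical law.

```python
from typing import List, Sequence


def integrate_over_frequency(omega: Sequence[float], intensity: Sequence[float]) -> float:
    """Trapezoidal integral of intensity(omega) over the sampled frequencies."""
    if len(omega) != len(intensity):
        raise ValueError("omega and intensity must have the same length")
    total = 0.0
    for i in range(1, len(omega)):
        total += 0.5 * (intensity[i] + intensity[i - 1]) * (omega[i] - omega[i - 1])
    return total


def find_sudden_drops(energies: Sequence[float], keep_fraction: float = 0.5) -> List[int]:
    """Indices i where energies[i] fell below keep_fraction * energies[i - 1]."""
    return [
        i
        for i in range(1, len(energies))
        if energies[i - 1] > 0.0 and energies[i] < keep_fraction * energies[i - 1]
    ]


# Hypothetical toy data: three frequency samples per output step.
omega = [1.0, 2.0, 3.0]
spectra = [
    [1.0, 1.0, 1.0],
    [2.0, 2.0, 2.0],
    [3.0, 3.0, 3.0],
    [0.1, 0.1, 0.1],  # accumulated amplitudes lost at this step
    [1.0, 1.0, 1.0],
]
energies = [integrate_over_frequency(omega, s) for s in spectra]
drops = find_sudden_drops(energies)
```

If the indices in `drops` coincide with the restart steps, the sawtooth is very likely caused by the restart rather than by the physics.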
@Anton-Le Which version of PIConGPU are you using and what did you adjust in the code?
A restart bug could have been introduced with the switch to the openPMD-api briefly before the 0.6.0 release.
Could you also please paste your `stdout` and `stderr` into this issue? The radiation plugin is quite verbose regarding restarts and tells you what it found or did not find as a restart file. Could you also list all files in your `simOutput/checkpoints/` directory?
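One way to produce such a listing, including file sizes so that empty files stand out, is sketched below. The helper name `list_checkpoint_files` is hypothetical, and the path `simOutput/checkpoints` is taken from the request above; nothing here assumes a particular naming scheme for the checkpoint files.

```python
from pathlib import Path
from typing import List, Tuple


def list_checkpoint_files(checkpoint_dir: str) -> List[Tuple[str, int]]:
    """Return (relative path, size in bytes) for every file below checkpoint_dir."""
    root = Path(checkpoint_dir)
    entries = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            entries.append((path.relative_to(root).as_posix(), path.stat().st_size))
    return entries


if __name__ == "__main__":
    # Run from the simulation's output directory.
    for name, size in list_checkpoint_files("simOutput/checkpoints"):
        print(f"{size:>14d}  {name}")
```

Redirecting the printed output into a text file gives something that can be attached to the issue directly.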
The PIC version is the commit noted in the opening post, i.e. the 0.6.0 release as it is in master (https://github.com/ComputationalRadiationPhysics/picongpu/commit/59e9b53605f9a5c1bf271eeb055bc74370a99052).
No further changes to the code were made and, since I did not change the verbosity level of the radiation plug-in from its default settings, there is nothing in `stdout`/`stderr` that looks out of the ordinary:
```
new grid size (global|local|offset): {560,2304,552}|{280,144,276}|{0,1584,276}
PIConGPUVerbose PHYSICS(1) | used Random Number Generator: RNGProvider3XorMin seed: 42
PIConGPUVerbose PHYSICS(1) | Field solver condition: c * dt <= 1.1285 ? (c * dt = 1)
PIConGPUVerbose PHYSICS(1) | Resolving plasma oscillations?
Estimates are based on DensityRatio to BASE_DENSITY of each species
(see: density.param, speciesDefinition.param).
It and does not cover other forms of initialization
PIConGPUVerbose PHYSICS(1) | species b: omega_p * dt <= 0.1 ? (omega_p * dt = 0.0438178)
PIConGPUVerbose PHYSICS(1) | species e1: omega_p * dt <= 0.1 ? (omega_p * dt = 0.004)
PIConGPUVerbose PHYSICS(1) | species e2: omega_p * dt <= 0.1 ? (omega_p * dt = 0.004)
PIConGPUVerbose PHYSICS(1) | species e3: omega_p * dt <= 0.1 ? (omega_p * dt = 0.004)
PIConGPUVerbose PHYSICS(1) | species e4: omega_p * dt <= 0.1 ? (omega_p * dt = 0.004)
PIConGPUVerbose PHYSICS(1) | macro particles per device: 111283200
PIConGPUVerbose PHYSICS(1) | typical macro particle weighting: 211.69
PIConGPUVerbose PHYSICS(1) | UNIT_SPEED 2.99792e+08
PIConGPUVerbose PHYSICS(1) | UNIT_TIME 7.09036e-17
PIConGPUVerbose PHYSICS(1) | UNIT_LENGTH 2.12564e-08
PIConGPUVerbose PHYSICS(1) | UNIT_MASS 1.92837e-28
PIConGPUVerbose PHYSICS(1) | UNIT_CHARGE 3.39165e-17
PIConGPUVerbose PHYSICS(1) | UNIT_EFIELD 2.40398e+13
PIConGPUVerbose PHYSICS(1) | UNIT_BFIELD 80188.2
PIConGPUVerbose PHYSICS(1) | UNIT_ENERGY 1.73313e-11
PIConGPUVerboseRadiation SIMULATION_STATE(2) | Radiation (b): restart finished
PIConGPUVerboseRadiation SIMULATION_STATE(2) | Radiation (e1): restart finished
initialization time: 1min 4sec 291msec = 64.291 sec
```
[.. further standard output of a normal rad-module iteration ..]
As for the contents of the folder, here you go: Bugreport_ContentsOfCheckpointFolder.txt
I have requeued the simulation, since I messed up the detector distribution in the first one. Once it is done, I can check the continuation to see whether the problem is reproducible.
I can reproduce the problem using the current master, PIConGPU 0.6.0.
Based on the Bunch example case 5 (single particle case), I ran the 32.cfg with a restart at iteration 2900.
The radiation energy evolution of the openPMD output and of the text-based output agree, and both show the loss of data after the restart. Without a restart, the expected energy evolution (without a drop to zero) is observed.
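The comparison between the run with a restart and the run without one can be expressed as a search for the first output step at which the two energy curves disagree. The sketch below assumes both curves are available as plain lists sampled at the same iterations; the function name `first_divergence` and the numbers are hypothetical and only illustrate the idea. A relative tolerance is used because two runs need not agree bit for bit.

```python
from typing import Optional, Sequence


def first_divergence(
    reference: Sequence[float],
    restarted: Sequence[float],
    rel_tol: float = 1e-6,
) -> Optional[int]:
    """Index of the first sample where the two curves differ beyond rel_tol, else None."""
    for i, (ref, res) in enumerate(zip(reference, restarted)):
        scale = max(abs(ref), abs(res))
        if scale > 0.0 and abs(ref - res) > rel_tol * scale:
            return i
    return None


# Hypothetical energy curves, one value per output step.
no_restart = [0.0, 1.0, 2.0, 3.0, 4.0]
with_restart = [0.0, 1.0, 2.0, 0.5, 1.5]  # accumulation starts over at step 3

idx = first_divergence(no_restart, with_restart)
```

If `idx` corresponds to the first output after the restart iteration, the data loss is tied to the restart itself.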
This is a (now confirmed) bug of the restart capability of the radiation plugin. A back-port will be needed as soon as a fix is out.
The checkpoint is valid and contains all needed data. ~~(Todo for myself: check z-amplitude with different case)~~ EDIT: z-polarization is stored fine in the openPMD output. The error most likely occurs while reading the checkpoint.
@Anton-Le Please see the pull request above. There I explain how you can still use your data despite PIConGPU not finding your restart files.