Deuterium_Tritium_Fusion_3D test fails

Open RTSandberg opened this issue 2 years ago • 7 comments

Running on macOS, this test fails with the following messages:

ERROR: Benchmark and plotfile checksum have different value for key [helium1,particle_cpu]
Benchmark: [helium1,particle_cpu] 2.056400000000000e+04
Plotfile : [helium1,particle_cpu] 2.029200000000000e+04
Absolute error: 2.72e+02
Relative error: 1.32e-02
...
ERROR: Benchmark and plotfile checksum have different value for key [helium1,particle_momentum_x]
Benchmark: [helium1,particle_momentum_x] 1.751971649183954e-15
Plotfile : [helium1,particle_momentum_x] 1.742140437206331e-15
Absolute error: 9.83e-18
Relative error: 5.61e-03

There are more than 20 such instances
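
For reference, the error figures in these messages can be reproduced from the two checksum values. The sketch below is not the WarpX checksum code; the function name `checksum_errors` is made up for illustration. It assumes the relative error is normalized by the benchmark value, which is consistent with the numbers printed above (normalizing by the plotfile value would give 1.34e-02 rather than 1.32e-02 for the first key).

```python
def checksum_errors(benchmark, plotfile):
    """Return (absolute error, relative error) between two checksum values.

    Assumption: the relative error is taken with respect to the benchmark.
    """
    abs_err = abs(plotfile - benchmark)
    rel_err = abs_err / abs(benchmark)
    return abs_err, rel_err


# Values copied from the error messages in this report.
reported = [
    ("helium1,particle_cpu", 2.056400000000000e+04, 2.029200000000000e+04),
    ("helium1,particle_momentum_x", 1.751971649183954e-15, 1.742140437206331e-15),
]

for key, bench, plot in reported:
    abs_err, rel_err = checksum_errors(bench, plot)
    print(f"[{key}] absolute error: {abs_err:.2e}, relative error: {rel_err:.2e}")
```

Both keys differ from the benchmark at the percent or sub-percent level, far more than floating-point round-off would explain.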

@NeilZaim could you take a look?

RTSandberg avatar Aug 04 '22 18:08 RTSandberg

I'll exclude cpu/id in #2924 for different reasons.

The differences in particle_momentum_x et al. look like a physics reproducibility bug in the test, @NeilZaim.

ax3l avatar Aug 04 '22 19:08 ax3l

Thanks for reporting this, I'll try to have a look in the coming days.

Do you see this issue only with the Deuterium Tritium test, or do you see something similar with the Proton Boron test?

NeilZaim avatar Aug 04 '22 19:08 NeilZaim

Note that when I run the CI tests on my Mac, almost all of them now fail with the same kind of issue: small differences in the cpu, id, and other particle parameters. I traced it down to differing random number seeds. Something on the Mac causes the seeds to get wrong values when running the test cases.

dpgrote avatar Aug 05 '22 17:08 dpgrote

Could be a difference in the seed, or in the details of std::mt19937 (which AMReX uses) across the different stdlibs on various platforms.

  • https://stackoverflow.com/questions/45766536/platform-dependent-state-of-mt19937-in-c

Maybe @atmyers and @WeiqunZhang can chime in?

ax3l avatar Aug 06 '22 23:08 ax3l

If the seeds are the same, std::mt19937 will produce the same random number sequences. The stackoverflow question was about the internal state of mt19937, not the output.
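
To illustrate why the output is the same everywhere: the MT19937 recurrence is fully specified, so any conforming implementation must produce the same 32-bit outputs for a given seed. The sketch below is a plain-Python transcription of the 32-bit algorithm, not AMReX or standard-library code; the class name `MT19937` and method `next_u32` are chosen here for illustration. The C++ standard requires that the 10000th consecutive output of a default-constructed `std::mt19937` (seed 5489) be 4123659995, which gives a check that does not depend on the platform.

```python
class MT19937:
    """Plain-Python sketch of the 32-bit Mersenne Twister (illustrative only)."""

    N, M = 624, 397

    def __init__(self, seed=5489):  # 5489 is the default seed of std::mt19937
        self.mt = [0] * self.N
        self.mt[0] = seed & 0xFFFFFFFF
        for i in range(1, self.N):
            prev = self.mt[i - 1]
            self.mt[i] = (1812433253 * (prev ^ (prev >> 30)) + i) & 0xFFFFFFFF
        self.index = self.N  # force a twist before the first output

    def _twist(self):
        # Regenerate the whole state array in place.
        for i in range(self.N):
            y = (self.mt[i] & 0x80000000) | (self.mt[(i + 1) % self.N] & 0x7FFFFFFF)
            self.mt[i] = self.mt[(i + self.M) % self.N] ^ (y >> 1)
            if y & 1:
                self.mt[i] ^= 0x9908B0DF
        self.index = 0

    def next_u32(self):
        if self.index >= self.N:
            self._twist()
        y = self.mt[self.index]
        self.index += 1
        # Tempering step.
        y ^= y >> 11
        y ^= (y << 7) & 0x9D2C5680
        y ^= (y << 15) & 0xEFC60000
        y ^= y >> 18
        return y & 0xFFFFFFFF


gen = MT19937()
outputs = [gen.next_u32() for _ in range(10000)]
print(outputs[-1])  # value mandated by the C++ standard: 4123659995
```

This covers only the raw 32-bit engine output. It says nothing about how a given code path turns those integers into floating-point samples.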

WeiqunZhang avatar Aug 08 '22 16:08 WeiqunZhang

Note that I don't think we set the seed in the nuclear fusion tests. I don't know whether setting it would fix the issue.

NeilZaim avatar Aug 08 '22 16:08 NeilZaim

If the seed is not explicitly set, AMReX will use a deterministic seed.

WeiqunZhang avatar Aug 08 '22 16:08 WeiqunZhang