Results differ after restarting from output
To reproduce this issue:
$PHANTOM_DIR/scripts/writemake.sh shock > Makefile
make; make setup; make diffdumps
Set up with default options:
./phantomsetup shock
Reduce the resolution for quicker testing by changing nx = 32 in shock.setup.
Finish setup:
./phantomsetup shock
Run from the start:
./phantom shock
Copy last output for reference:
cp shock_000020 shock_000020.ref
Restart from an intermediate full output by setting dumpfile = shock_00010 in shock.in.
./phantom shock
Compare the two final outputs:
./diffdumps shock_000020 shock_000020.ref
Files differ significantly:
particle IDs differ 0 times
positions differ 16416 times
smoothing lengths differ 7704 times
velocities differ 16416 times
thermal energies differ 16416 times
MAX RMS ERROR: 2.8942E-06
FILES DIFFER
How similar do you expect the results to be given machine rounding, etc...?
The same problem also occurs for HDF5 outputs, so it is not specific to the native output format.
It should be possible for results to be bitwise identical after restarting from an output. Indeed, when restarting twice from the same output, the results are identical.
When running with MPI, where the order of operations differs and is non-deterministic, the results differ only by about 1.e-15. So an error of 1.e-06 for a simple test problem like this, run without MPI, seems concerning.
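To illustrate why reordered operations only move results near machine epsilon, here is a small standalone Python example (my own, not Phantom code; the array size and seed are arbitrary):

# Summing the same values in a different order changes the result only at
# roughly the 1e-16 to 1e-14 relative level, which is why non-deterministic
# reduction order (as under MPI) is expected to give differences near
# machine precision rather than 1e-6.
import random

random.seed(1)
values = [random.uniform(0.0, 1.0) for _ in range(100000)]

total = sum(values)
reordered = values[:]
random.shuffle(reordered)
total_reordered = sum(reordered)

print(f"relative difference from reordering: {abs(total - total_reordered)/total:.2e}")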
About 10^-15 is the typical tolerance for diffdumps; this sounds like something not being done right in the MPI code.
Just to clarify, this has nothing to do with MPI. The reproducer above is compiled completely without MPI.
I only mentioned MPI to illustrate that even MPI's non-deterministic order of operations preserves results better than restarting. But of course this isn't really a meaningful comparison.
Could this be due to the "extra" derivs call that is performed by initial?
i.e. when you restart from a dumpfile, the acceleration/force at the beginning of the step is NOT the same as the force that was calculated during the previous step (since the shock terms make the force velocity-dependent).
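To see the mechanism, here is a toy Python sketch (my own construction, not Phantom's actual integrator or force terms): a damped oscillator whose acceleration depends on velocity, stepped with a predictor-corrector kick. The acceleration carried in memory at the end of a step was evaluated with a predicted velocity, so recomputing it at restart from the synchronised (x, v) that a dump would store, i.e. the "extra" derivs call, nudges the subsequent trajectory:

def accel(x, v):
    return -x - 0.1 * v                  # velocity-dependent "viscous" term

def step(x, v, a, dt):
    v_half = v + 0.5 * dt * a            # first kick
    x_new = x + dt * v_half              # drift
    v_pred = v_half + 0.5 * dt * a       # predict end-of-step velocity
    a_new = accel(x_new, v_pred)         # force evaluated with predicted velocity
    v_new = v_half + 0.5 * dt * a_new    # corrected second kick
    return x_new, v_new, a_new           # a_new is carried into the next step

dt, nsteps, nrestart = 0.01, 20, 10

# uninterrupted run: (x, v, a) stay in memory for all 20 steps
x, v = 1.0, 0.0
a = accel(x, v)
for _ in range(nsteps):
    x, v, a = step(x, v, a, dt)
x_ref = x

# restarted run: after step 10 keep only (x, v), as a dump would, and
# recompute the acceleration from them before continuing
x, v = 1.0, 0.0
a = accel(x, v)
for _ in range(nrestart):
    x, v, a = step(x, v, a, dt)
a = accel(x, v)                          # the "extra" derivs call on restart
for _ in range(nrestart, nsteps):
    x, v, a = step(x, v, a, dt)

print("difference in x after restart:", abs(x - x_ref))

The magnitude here is not meant to match Phantom's; the point is only that a recomputed velocity-dependent force differs from the one that was in memory, so the restarted run cannot stay bitwise identical.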
I guess this is plausible; the RMS error is small, after all...
One contribution to this error is that h is stored in single precision, so the starting point for the density iterations is different after a restart. But this contribution is probably small, and even after changing h to double precision, there is still an error of roughly 1.e-6.
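For reference, a quick standalone check of the size of that perturbation (assuming NumPy; not Phantom code, and the value of h is arbitrary):

# Round-tripping a double-precision smoothing length through single precision
# perturbs it at roughly the 1e-7 relative level, so the h-rho iteration after
# a restart starts from a slightly different initial guess than the value
# that was in memory.
import numpy as np

h = 0.012345678901234567             # smoothing length held in double precision
h_restart = float(np.float32(h))     # value after being written to and read from a dump

print(f"relative change from single-precision storage: {abs(h - h_restart)/h:.2e}")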
I think it is likely the extra derivs call. We have never really quantified what the acceptable tolerance is for results to be identical; 10^-6 is certainly lower than various tolerances (e.g. tolv and the h-rho iteration tolerance of 1e-4).
It would be painful to do without this, as it would require storing accelerations and other things in the dump file. I would be extremely reluctant to do that when the biggest limitation we have at the moment is disk space.
Would you consider infrequently writing "restartable" files that contain h in double precision plus the forces? E.g. once every 24h of wall time? And perhaps it could be optional, so that if reproducibility is not important but disk space is limited, it can be disabled?
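To make the scheduling idea concrete, a rough Python sketch (every name here is hypothetical; nothing corresponds to an existing Phantom option or routine):

import time

CHECKPOINT_INTERVAL = 24 * 3600      # wall-clock seconds between restartable dumps
write_checkpoints = True             # optional: switch off when disk space is the priority

_last_checkpoint = time.monotonic()

def maybe_write_checkpoint(write_full_precision_dump):
    """Call once per step; writes at most one restartable dump per interval."""
    global _last_checkpoint
    if not write_checkpoints:
        return
    now = time.monotonic()
    if now - _last_checkpoint >= CHECKPOINT_INTERVAL:
        write_full_precision_dump()  # hypothetical callback: stores h and the forces in full precision
        _last_checkpoint = now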