[WIP] Diagnostics: Fix Restart with Start/Stop Moving Window
After restarts, all diagnostics lo/hi bounds were not properly restored. This was specifically the case if the moving window had a non-zero start step and/or a stop step before the checkpoint step. After a restart, this causes new checkpoints and diagnostics to become corrupted, as the wrong spatial data gets filtered.
This fixes it. Existing checkpoints (from simulations that started from 0) are still readable with this fix.
Fix #6392
To Do
- [x] debug
- [x] semi-vibe debug (Cursor)
- [x] semi-vibe fix
- [ ] review & clean up
Cleanup Notice
This bug shows how risky it is to duplicate moving-window logic at multiple places. The moving/shift logic should go into functions and they should be re-used. This PR makes this anti-pattern even worse by doubling down.
A follow-up or additional commit should deduplicate the logic to make it safer: #6400
@titoiride can you potentially test this PR with your data, too? :) Please restart from a checkpoint that was written from a simulation that ran from the beginning (not a checkpoint created by a restarted simulation).
@RemiLehe
In principle, warpx.getmoving_window_x() should be able to work with starting/stopping moving window. To avoid duplicating code (as you pointed out), should we remove the function warpx.getmoving_window_x() (since it is not used anymore) or should we try to fix it and introduce it again?
I was a bit puzzled as well why I could not find a solution with warpx.getmoving_window_x. Looking at the moving window implementation in WarpX as of today, I think it has a general flaw that the diags prob domain can shift over time compared to the simulation geometry (stored among others in getmoving_window_x) because of the step-wise (accumulative) integer rounding to align with cells:
https://github.com/BLAST-WarpX/warpx/blob/25.11/Source/Diagnostics/FullDiagnostics.cpp#L1007-L1008
I think this can be fixed by fully removing the double book-keeping in FullDiagnostics::MovingWindowAndGalileanDomainShift but that will render existing restart points unusable.
In other words, WarpX::MoveWindow should be the only source of truth, but the FullDiags does its own tracking and actually is not doing it well: compared to
https://github.com/BLAST-WarpX/warpx/blob/25.11/Source/Utils/WarpXMovingWindow.cpp#L356-L376
it cumulative looses rounds on the order of a cells size, every time it updates.
While the over all moving window does the same: https://github.com/BLAST-WarpX/warpx/blob/25.11/Source/Utils/WarpXMovingWindow.cpp#L392 I think this causes an issue in the situation where the moving window is not active 100% of the sim steps.