marian-dev
Validation scores in *.progress.yml get saved before validation
Bug description
model.npz.progress.yml contains the validation results from the previous validation step. We run validation after saving, so whenever the saving and validation frequencies coincide, the saved scores are always one validation step behind. If they do not coincide, this matters less, but usually they do.
How to reproduce
Any training
Context
- Marian version: all current versions
- CMake command: Doesn't matter
- Log file: N/A
The easiest solution is to validate first and save the model and checkpoint afterwards. Just swap the order in the code for all graph groups.
I vaguely recall that we changed the order to the current one, but I do not remember why. Is the only problem with the save-then-validate order that model.npz.progress.yml contains outdated scores, or are there other drawbacks? This should already work even if valid-freq == save-freq.
@emjotde Memo to self: check what is going on here after the graph-group refactors.