marian-dev

Validation scores in *.progress.yml get saved before validation

Open emjotde opened this issue 5 years ago • 2 comments

Bug description

model.npz.progress.yml contains the validation results from the last validation step. However, we run validation after saving, so whenever a save and a validation fall on the same update, the saved scores are one validation step behind. If the save and validation frequencies don't coincide, that matters a bit less, but usually they do.

How to reproduce

Any training

Context

  • Marian version: all current versions
  • CMake command: Doesn't matter
  • Log file: N/A

The easiest solution is to validate first and save the model and checkpoint afterwards: just swap the order in the code for all graph groups.
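To illustrate the ordering problem, here is a minimal toy sketch in Python. It is not Marian code: `run_training`, `validate`, and `save` are hypothetical names, and the loop only tracks which update's validation result would be visible at each save, assuming both steps are triggered by simple modulo checks on the update counter.

```python
def run_training(num_updates, save_freq, valid_freq, validate_first):
    """Toy loop (not Marian code).

    Returns a list of (update, validated_at) pairs, one per save, where
    validated_at is the update whose validation result was available
    when the progress file was written (None if no validation ran yet).
    """
    last_validated = None  # update at which the most recent validation ran
    saved = []

    def validate(update):
        nonlocal last_validated
        last_validated = update

    def save(update):
        # Stand-in for writing model.npz.progress.yml
        saved.append((update, last_validated))

    for update in range(1, num_updates + 1):
        do_save = update % save_freq == 0
        do_valid = update % valid_freq == 0
        if validate_first:
            if do_valid:
                validate(update)
            if do_save:
                save(update)
        else:
            if do_save:
                save(update)
            if do_valid:
                validate(update)
    return saved


# Save-then-validate: each save records the previous validation.
print(run_training(15, 5, 5, validate_first=False))
# Validate-then-save: each save records the validation from the same update.
print(run_training(15, 5, 5, validate_first=True))
```

With equal frequencies, the save-first order always writes scores from the previous validation (and nothing at all on the first save), while the validate-first order writes scores from the same update.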

emjotde, Apr 13 '20 01:04

I vaguely recall that we deliberately changed the order to the current one, but I do not remember the reason. Is the issue with the save-then-validate order only that model.npz.progress.yml has outdated scores, or are there other drawbacks? This should already work even if valid-freq == save-freq.

snukky, Apr 13 '20 10:04

@emjotde memo to self: check what is going on here after the graph-group refactors.

emjotde, Nov 10 '20 02:11