fix wandb lagging at end of ddp training

Open mnoukhov opened this issue 3 years ago • 0 comments

Before submitting

  • [x] Was this discussed/approved via a GitHub issue? (no need for typos, doc improvements)
  • [x] Did you read the contributor guideline?
  • [x] Did you make sure to update the docs? n/a
  • [x] Did you write any new necessary tests? n/a

What does this PR do?

Fixes #4619, though not in a great way.

We need to call wandb.finish() at the end of training so that wandb knows the multiprocessing job is over. This is non-trivial with the current setup of progress_bar.py: I tried registering wandb.finish() with atexit, similar to how the tensorboard writers are closed (see here), but it doesn't work.

The current solution adds the call in fairseq_cli.train, but if there is a more elegant solution that uses progress_bar.py, I would be happy to change it.
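
For illustration, here is a minimal sketch of the general idea, not the actual diff in this PR. It assumes the training entry point can be wrapped in a try/finally; `finish_wandb_run` and `run_training` are hypothetical helper names, and only `wandb.run` and `wandb.finish()` are real wandb API.

```python
try:
    import wandb
except ImportError:  # wandb is an optional dependency
    wandb = None


def finish_wandb_run(wandb_module=wandb):
    """Close the active wandb run, if any. Returns True if finish() was called.

    Hypothetical helper: the module is passed in so it can be stubbed out.
    """
    if wandb_module is None or getattr(wandb_module, "run", None) is None:
        return False
    wandb_module.finish()
    return True


def run_training(train_fn, wandb_module=wandb):
    """Run train_fn and always signal wandb that the job is over.

    Hypothetical wrapper standing in for the end of the training entry point.
    """
    try:
        return train_fn()
    finally:
        # Explicit call at the end of training instead of relying on atexit.
        finish_wandb_run(wandb_module)
```

The explicit call in `finally` runs even if training raises, which is the main reason to prefer it over a bare call on the last line of the entry point.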

mnoukhov, Jul 29 '22 23:07