fairseq
fix wandb lagging at end of ddp training
Before submitting
- [x] Was this discussed/approved via a GitHub issue? (no need for typos, doc improvements)
- [x] Did you read the contributor guideline?
- [x] Did you make sure to update the docs? n/a
- [x] Did you write any new necessary tests? n/a
What does this PR do?
Fixes #4619, though not in an ideal way.
We need to call wandb.finish() at the end of training so that wandb knows the multiprocessing job is over. This is non-trivial with the current setup of progress_bar.py: I tried registering wandb.finish() with atexit, similar to how the tensorboard writers are closed (see here), but that doesn't work.
The current solution adds the call in fairseq_cli.train. If there is a more elegant solution that goes through progress_bar.py, I would be happy to change it.
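
For reviewers, here is a minimal sketch of the idea rather than the exact diff. It assumes the training entry point can be wrapped so that wandb.finish() runs once training returns or raises. The helper names finish_wandb_run and run_and_finish are hypothetical and exist only for illustration; wandb.run and wandb.finish() are the only wandb calls used.

```python
def finish_wandb_run(wandb_module=None):
    """Call wandb.finish() if wandb is available and a run is active.

    `wandb_module` is a hypothetical injection point so the helper can be
    exercised without wandb installed; by default the real package is used.
    Returns True if finish() was called, False otherwise.
    """
    if wandb_module is None:
        try:
            import wandb as wandb_module
        except ImportError:
            # wandb is optional, so there is nothing to close.
            return False
    # wandb.run is None when no run has been started in this process.
    if getattr(wandb_module, "run", None) is None:
        return False
    wandb_module.finish()
    return True


def run_and_finish(train_fn, wandb_module=None):
    """Run `train_fn`, then close the wandb run even if training raises."""
    try:
        return train_fn()
    finally:
        finish_wandb_run(wandb_module)
```

Calling the cleanup explicitly from the entry point sidesteps the atexit route that did not work here, at the cost of keeping logger-specific code outside progress_bar.py.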