fix wandb lagging at end of ddp training

Open mnoukhov opened this issue 3 years ago • 0 comments

Before submitting

[x] Was this discussed/approved via a GitHub issue? (no need for typos, doc improvements)
[x] Did you read the contributor guideline?
[x] Did you make sure to update the docs? n/a
[x] Did you write any new necessary tests? n/a

What does this PR do?

Fixes #4619, though not in an ideal way.

We need to call wandb.finish() at the end of training so that wandb knows the multiprocessing job is over. This is non-trivial with the current setup of progress_bar.py: I tried registering wandb.finish() with atexit, the same way the tensorboard writers are closed (see here), but it doesn't work.

The current solution adds the call in fairseq_cli.train, but if there is a more elegant solution that uses progress_bar.py I would be happy to change it.
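
For illustration, the idea of calling wandb.finish() explicitly at the end of training might look roughly like the sketch below. This is not the diff in this PR: the helper names finish_wandb_if_active and run_training are hypothetical, and the only wandb calls assumed are wandb.run and wandb.finish().

```python
try:
    import wandb  # optional dependency; may not be installed
except ImportError:
    wandb = None


def finish_wandb_if_active():
    """Close the active wandb run, if any. Returns True when a run was closed.

    Hypothetical helper, not part of fairseq.
    """
    if wandb is None or getattr(wandb, "run", None) is None:
        return False
    wandb.finish()
    return True


def run_training(train_fn):
    """Hypothetical wrapper: run train_fn, then always finalize wandb."""
    try:
        return train_fn()
    finally:
        # Called explicitly rather than through atexit, since the atexit
        # approach is reported above not to work.
        finish_wandb_if_active()
```

The try/finally means the run is also closed when training raises, and the check on wandb.run makes the call a no-op in processes that never started a run.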

Jul 29 '22 23:07 mnoukhov
