Fix TensorBoard logger

Open TParcollet opened this issue 3 years ago • 5 comments

Our current TensorBoard logger is not DDP compliant, and it cannot be resumed if a run crashes or if we restart from a different epoch. This PR fixes all of this.
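To make the two requirements concrete, here is a minimal sketch of what "DDP compliant and resumable" could look like. It is not the code from this PR: `ResumableScalarLogger` and `InMemoryWriter` are hypothetical names, the `RANK` environment variable is assumed to be set by the distributed launcher, and the in-memory writer only stands in for a real TensorBoard writer exposing `add_scalar(tag, value, global_step)`.

```python
import os


class InMemoryWriter:
    """Hypothetical stand-in for a TensorBoard writer (records calls in a list)."""

    def __init__(self):
        self.records = []

    def add_scalar(self, tag, value, global_step):
        self.records.append((tag, value, global_step))


class ResumableScalarLogger:
    """Hypothetical logger: writes on rank 0 only and can restore its step."""

    def __init__(self, writer_factory, rank=None):
        # Assumption: the launcher exports RANK; default to 0 for single-process runs.
        if rank is None:
            rank = int(os.environ.get("RANK", "0"))
        self.is_main = rank == 0
        # Only the main process creates a writer, so no duplicate event files.
        self.writer = writer_factory() if self.is_main else None
        self.global_step = 0

    def log(self, tag, value):
        # Every rank advances the counter so checkpointed state stays identical.
        self.global_step += 1
        if self.is_main:
            self.writer.add_scalar(tag, value, self.global_step)

    def state_dict(self):
        # Saved alongside the model checkpoint.
        return {"global_step": self.global_step}

    def load_state_dict(self, state):
        # Restored on resume so new points continue from the saved step.
        self.global_step = int(state["global_step"])


if __name__ == "__main__":
    first = ResumableScalarLogger(InMemoryWriter, rank=0)
    first.log("train/loss", 1.0)
    first.log("train/loss", 0.5)
    saved = first.state_dict()

    resumed = ResumableScalarLogger(InMemoryWriter, rank=0)
    resumed.load_state_dict(saved)
    resumed.log("train/loss", 0.25)
    print(resumed.writer.records)
```

The design choice here is that the step counter, not the writer, is what gets checkpointed: after a crash or a restart from an earlier epoch, the restored counter decides where the next point lands on the curve.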

TParcollet avatar Mar 30 '22 15:03 TParcollet

@Gastron may I ask you to review this PR? It is linked to the checkpointing of one of the loggers. However, I am not sure how to test all the impacted recipes, as I don't have the data. I think @mravanelli has some? Could you please try it on UrbanSound or Voicebank only? The changes are the same for all recipes, so if it fails on one, it will fail on all of them.

TParcollet avatar Mar 30 '22 15:03 TParcollet

@TParcollet can we merge it, or is it better to do another review here?

mravanelli avatar May 22 '22 03:05 mravanelli

This changes quite a lot of recipes. It needs to be tested again on all of them (I tested mine, but it must be reviewed).

TParcollet avatar May 22 '22 07:05 TParcollet

@anautsch, could you do it when you have time? It would be great to include this fix in the new minor version.

mravanelli avatar May 22 '22 14:05 mravanelli

Works with an extra dependency: protobuf==3.20.1

protobuf serializes and deserializes data structures (open source, from Google since 2008). The logs of the runs that crashed explicitly asked to move to a 3.20.x protobuf version or earlier.
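For anyone who wants to check their environment before running the recipes, a small sketch of that version constraint follows. The helper names `major_minor` and `is_supported` and the `(3, 20)` ceiling are illustrative assumptions derived from the pin above; they are not part of SpeechBrain or protobuf.

```python
import re
from importlib import metadata

# Assumption: anything up to and including the 3.20 series is acceptable.
MAX_SUPPORTED = (3, 20)


def major_minor(version):
    """Return (major, minor) from a version string such as '3.20.1'."""
    numbers = []
    for piece in version.split(".")[:2]:
        match = re.match(r"\d+", piece)
        numbers.append(int(match.group()) if match else 0)
    while len(numbers) < 2:
        numbers.append(0)
    return tuple(numbers)


def is_supported(version):
    """True if the version is in the 3.20 series or older."""
    return major_minor(version) <= MAX_SUPPORTED


if __name__ == "__main__":
    try:
        installed = metadata.version("protobuf")
    except metadata.PackageNotFoundError:
        print("protobuf is not installed")
    else:
        status = "ok" if is_supported(installed) else "too new, pin protobuf==3.20.1"
        print(f"protobuf {installed}: {status}")
```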

Note: when heading for v0.6, we have a dependency and extra_dependency 'theme park' ahead of us...

anautsch avatar May 31 '22 16:05 anautsch