snowfall
Multi-GPU training
Hi, these examples are excellent. Can we train on multiple GPUs with k2/lhotse, just like DDP in PyTorch?
I don't believe we have examples of multi-GPU training yet, but AFAIK standard PyTorch mechanisms for multi-GPU training should work. If you try, let us know, and make a PR about it!
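For reference, here is a minimal sketch of the standard PyTorch mechanism mentioned above (`DistributedDataParallel`). It is not taken from snowfall: the toy `nn.Linear` model, the random batches, and the `run_worker` function with its `init_file` argument are all stand-ins made up for illustration. It runs on CPU with the `gloo` backend so it can be tried without GPUs; on GPUs one would typically use the `nccl` backend, move the model to `cuda:{rank}`, and pass `device_ids=[rank]` to DDP.

```python
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP


def run_worker(rank: int, world_size: int, init_file: str, num_steps: int = 3) -> float:
    """Train a toy model under DDP and return the last loss seen on this rank.

    `init_file` is a hypothetical path to a not-yet-existing file in an
    existing directory; all ranks use it to find each other.
    """
    dist.init_process_group(
        "gloo",  # CPU-friendly backend; "nccl" is the usual choice on GPUs
        init_method=f"file://{init_file}",
        rank=rank,
        world_size=world_size,
    )
    try:
        # Different seed per rank so each process sees different toy data.
        # DDP broadcasts rank 0's parameters at construction, so the model
        # replicas still start out identical.
        torch.manual_seed(rank)
        model = nn.Linear(8, 1)  # stand-in for the real acoustic model
        ddp_model = DDP(model)
        optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.01)

        loss = torch.zeros(())
        for _ in range(num_steps):
            x = torch.randn(4, 8)  # stand-in for a batch of features
            y = torch.randn(4, 1)  # stand-in for the targets
            optimizer.zero_grad()
            loss = nn.functional.mse_loss(ddp_model(x), y)
            loss.backward()  # gradients are averaged across ranks here
            optimizer.step()
        return loss.item()
    finally:
        dist.destroy_process_group()
```

To start one process per device, something like `torch.multiprocessing.spawn(run_worker, args=(world_size, init_file), nprocs=world_size)` should work, since `spawn` passes the process index as the first argument. A real recipe would also need each rank to read a different shard of the data.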
There is a WIP pull-request about multi-GPU training: https://github.com/k2-fsa/snowfall/pull/71
Thanks, I'll try it. @danpovey @csukuangfj
BTW, when I find some spare time I intend to finish that PR so that we can switch between single-GPU and multi-GPU training. I will also need to make sure that it does the right thing when aggregating things like validation loss between the GPUs (I saw both processes return slightly different values) and when storing/loading checkpoints. If you have more time and can pick it up before I do, you're welcome to do it :)
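To illustrate the aggregation issue, here is a small sketch of the arithmetic only, with no distributed calls. It assumes each process evaluates its own shard of the validation set, which is one plausible reason for the processes reporting different values; the numbers and the function names `per_rank_means` and `global_mean` are made up for this example.

```python
from typing import List, Tuple

# One (summed loss, number of frames) pair per process. Made-up numbers.
RankStats = Tuple[float, int]


def per_rank_means(stats: List[RankStats]) -> List[float]:
    """What each process would report if it only looked at its own shard."""
    return [loss_sum / num_frames for loss_sum, num_frames in stats]


def global_mean(stats: List[RankStats]) -> float:
    """Combine the sums first, then divide once by the total frame count."""
    total_loss = sum(loss_sum for loss_sum, _ in stats)
    total_frames = sum(num_frames for _, num_frames in stats)
    return total_loss / total_frames


stats = [(120.0, 100), (90.0, 60)]
print(per_rank_means(stats))  # [1.2, 1.5]
print(global_mean(stats))     # 1.3125
```

In an actual DDP run, the two sums could be combined across processes with `torch.distributed.all_reduce` before dividing, so that every rank ends up with the same value. For checkpoints, a common convention is to write them from rank 0 only; whether the PR does either of these is not stated here.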
Fantastic!!