FloWaveNet PyTorch v1.0.0 multi-GPU compatibility issue

Open L0SG opened this issue 5 years ago • 5 comments

Currently, we cannot run multi-GPU training on PyTorch v1.0.0 due to a strange null gradient issue.

Dec 21 '18 03:12 L0SG

Oh my God. I have been training the multi-GPU version for one week on all four of my GPUs. In the params/flowavenet/ dir, only one checkpoint was generated.

Thanks for pointing this out.

Dec 21 '18 08:12 candlewill

Oops, sorry about the delayed issue post in this repo. I filed a report with the PyTorch repo about two weeks ago, so please stick to v0.4.1 until the issue is resolved.

Dec 21 '18 08:12 L0SG

Update: the issue still persists in the latest 1.0.1 release.

Feb 12 '19 07:02 L0SG

Note: the DistributedDataParallel implementation from @1ytic circumvents the multi-GPU issue, so please use train_apex.py on the master branch until the DataParallel issue (in train.py) is resolved.

Apr 23 '19 14:04 L0SG

Update: the issue was fixed in the 1.2.0 release. We'll keep this issue open for a while for future reference.

Oct 10 '19 04:10 L0SG
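
For anyone who wants to guard a training script against the affected releases, here is a minimal sketch of a version check based on the timeline in this thread: v0.4.1 works, 1.0.0 and 1.0.1 are affected, and 1.2.0 contains the fix. The helper names parse_version and data_parallel_is_safe are hypothetical and not part of this repo. The thread does not say anything about 1.1.0, so treating it as affected is a conservative assumption.

```python
import re


def parse_version(version):
    """Extract (major, minor, patch) from strings such as '1.0.1' or '1.2.0+cu92'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def data_parallel_is_safe(version):
    """Return True if this PyTorch version is outside the range reported as affected.

    Assumption: every release from 1.0.0 up to (but not including) 1.2.0 is
    treated as affected. Only 1.0.0 and 1.0.1 are confirmed in this thread.
    """
    parsed = parse_version(version)
    return parsed < (1, 0, 0) or parsed >= (1, 2, 0)
```

In a training script this could be called as data_parallel_is_safe(torch.__version__) before wrapping the model in DataParallel, falling back to train_apex.py when it returns False.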
