ResNeXt.pytorch

Sublinear speed-up with DataParallel

Open · grey-area opened this issue 6 years ago · 1 comment

With default arguments apart from cardinality (set to 16), I get:

On one 1080 Ti with minibatch size 20: ~9 minutes per epoch. Using DataParallel across four 1080 Tis with minibatch size 128: ~4.5 minutes per epoch.

Perfect linear scaling would give 9/4 = 2.25 minutes per epoch. Any idea what's going on here, or how to get better scaling?
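For reference, a synthetic-data micro-benchmark along these lines (a hypothetical sketch; `resnet18` is a stand-in for the repo's ResNeXt model) would isolate GPU scaling from data loading:

```python
# Hypothetical micro-benchmark: synthetic data, so any gap from linear
# scaling here comes from the model/GPU side, not the dataloader.
import time
import torch
import torch.nn as nn
import torchvision.models as models

def samples_per_sec(model, batch_size, n_batches=50):
    model = model.cuda()
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    loss_fn = nn.CrossEntropyLoss()
    # Fixed random batch of CIFAR-sized images; no I/O involved.
    x = torch.randn(batch_size, 3, 32, 32, device="cuda")
    y = torch.randint(0, 10, (batch_size,), device="cuda")
    torch.cuda.synchronize()
    start = time.time()
    for _ in range(n_batches):
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()
    torch.cuda.synchronize()
    return n_batches * batch_size / (time.time() - start)

# resnet18 is a stand-in; swap in the repo's ResNeXt for a real test.
print("1 GPU, batch 20: %.0f img/s"
      % samples_per_sec(models.resnet18(num_classes=10), 20))
multi = nn.DataParallel(models.resnet18(num_classes=10))
print("%d GPUs, batch 128: %.0f img/s"
      % (torch.cuda.device_count(), samples_per_sec(multi, 128)))
```

If the synthetic-data numbers scale well, the dataloader is the likely suspect. Note also that DataParallel replicates the model and scatters/gathers tensors on every step, so a small per-GPU batch (128/4 = 32 here) can leave the GPUs underutilized.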

— grey-area, Feb 18 '19

Hi!

Have you tried with a vanilla ResNet, to check whether the problem is in the model or the dataloader?
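One quick way to isolate the dataloader is to time a pass over it with no GPU work at all (a minimal sketch, assuming the CIFAR-10 setup this repo uses; the `num_workers` sweep is illustrative):

```python
# Hypothetical check: time one pass over the dataloader alone (no GPU work).
import time
import torchvision
import torchvision.transforms as T
from torch.utils.data import DataLoader

ds = torchvision.datasets.CIFAR10(root="./data", train=True,
                                  download=True, transform=T.ToTensor())
for workers in (0, 2, 4, 8):
    loader = DataLoader(ds, batch_size=128, shuffle=True,
                        num_workers=workers, pin_memory=True)
    start = time.time()
    for _ in loader:
        pass
    print(f"num_workers={workers}: {time.time() - start:.1f}s per pass")
```

If that pass alone takes minutes, the GPUs are being starved regardless of which model you use.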

Pau

— prlz77, Feb 19 '19