Sync Batchnorm in inference

AlanStark opened this issue 6 years ago

Hi @jfzhang95,

When I enable sync batchnorm, training works fine, but inference is quite weird: the loss at inference is way too large. When you ran your benchmark, did you use Sync Batchnorm?

Thanks.

AlanStark avatar Jan 28 '19 02:01 AlanStark
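For context, a minimal sketch of the eval-time behaviour in question, using PyTorch's built-in SyncBatchNorm (the repo ships its own SynchronizedBatchNorm implementation, but evaluation works the same way): in eval() every BatchNorm variant uses the running statistics accumulated during training, so no cross-GPU synchronization happens at inference, and a bad set of running statistics shows up exactly as an inflated inference loss. The tiny model below is a placeholder, not the repo's DeepLab network.

```python
import torch
import torch.nn as nn

# Placeholder model for illustration only.
model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1),
                      nn.BatchNorm2d(16),
                      nn.ReLU())

# Built-in conversion; training with SyncBatchNorm additionally requires
# DistributedDataParallel, but eval-mode inference does not.
model = nn.SyncBatchNorm.convert_sync_batchnorm(model)

# In eval mode BN uses running_mean/running_var instead of batch statistics,
# so synchronization is irrelevant at inference time. A large eval loss
# usually means the running statistics themselves are off.
model.eval()
with torch.no_grad():
    out = model(torch.randn(1, 3, 64, 64))
```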

Recently I had a similar issue (though I used BatchNorm2d, since I only have 1 GPU). A possible reason is a too-small batch size. Reference: https://discuss.pytorch.org/t/performance-highly-degraded-when-eval-is-activated-in-the-test-phase/3323

tsing90 avatar May 11 '19 15:05 tsing90
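A minimal sketch of the workaround discussed in that linked thread (the momentum value here is an assumption, not a tested setting): with very small batches the per-batch statistics are noisy, so lowering the BatchNorm momentum makes the running mean/var average over more batches before evaluation.

```python
import torch.nn as nn

def lower_bn_momentum(model, momentum=0.01):
    # Smaller momentum -> running statistics update more slowly, which
    # smooths out the noise from tiny batches (PyTorch's default is 0.1).
    for m in model.modules():
        if isinstance(m, (nn.BatchNorm2d, nn.SyncBatchNorm)):
            m.momentum = momentum
    return model
```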

Hi @AlanStark, did you ever fix this problem? When I use 1 GPU, the validation IoU is fine and keeps increasing. But when I use multiple GPUs with sync batchnorm, it drops dramatically after the first few epochs. This only happens when I switch on use-sbd; with use-sbd off it does not happen. It looks like validation with multiple GPUs breaks on the larger dataset that includes SBD. The training loss looks fine, though.

zwxu064 avatar Feb 08 '20 03:02 zwxu064
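A minimal diagnostic sketch for the symptom described above (the checkpoint path and the 'state_dict' key are assumptions about how the run was saved): if the IoU collapse really comes from the synchronized running statistics, it should be visible directly in the saved BatchNorm buffers, since eval() switches from batch statistics to these buffers.

```python
import torch

# Hypothetical check: inspect the BatchNorm running statistics in a saved
# checkpoint. Exploding running_var values would be consistent with an
# eval-mode IoU collapse while the training loss still looks fine.
state = torch.load('checkpoint.pth', map_location='cpu')  # path is an assumption
for name, tensor in state['state_dict'].items():
    if name.endswith('running_var'):
        print(f'{name}: mean={tensor.mean():.4f}, max={tensor.max():.4f}')
```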