pytorch-deeplab-xception
Sync Batchnorm in inference
Hi @jfzhang95,
When I enable sync batchnorm, training works fine, but inference is quite weird: the validation loss is way too large. When you ran your benchmark, did you use Sync Batchnorm?
Thanks.
I recently had a similar issue (though I used BatchNorm2d since I only had 1 GPU). The likely cause is a small batch size, which leaves BatchNorm with poor running statistics at eval time; a diagnostic sketch is below. Reference: https://discuss.pytorch.org/t/performance-highly-degraded-when-eval-is-activated-in-the-test-phase/3323
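In case it helps with debugging, here is a minimal sketch of the diagnostic suggested in that thread, written against plain `torch.nn` BatchNorm rather than this repo's SynchronizedBatchNorm module: flip only the BatchNorm layers back into train mode during validation so they normalize each batch with its own statistics instead of the running estimates. `model` and `val_loader` are placeholder names, not this repo's API.

```python
import torch
import torch.nn as nn

def bn_use_batch_stats(model: nn.Module) -> None:
    # Put only the BatchNorm layers back into train mode so they normalize
    # each validation batch with its own statistics instead of the running
    # mean/var accumulated during training.
    for m in model.modules():
        # _BatchNorm is the common base of BatchNorm1d/2d/3d; most sync-BN
        # implementations also subclass it, but check yours.
        if isinstance(m, nn.modules.batchnorm._BatchNorm):
            m.train()

# Hypothetical usage -- `model` and `val_loader` are placeholders:
# model.eval()                 # everything in eval mode first
# bn_use_batch_stats(model)    # then flip only the BN layers
# with torch.no_grad():
#     for images, targets in val_loader:
#         preds = model(images)
```

Note that in train mode the BN running buffers still get updated during these forward passes (no_grad does not stop buffer updates), so this is only a diagnostic. If the validation loss looks sane afterwards, the running statistics (not the weights) are the problem, e.g. because the per-GPU batch is too small.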
Hi, have you fixed this problem yet? When I use 1 GPU, the validation IoU is fine and keeps increasing. But when I use multiple GPUs with sync batchnorm, it drops dramatically after the first few epochs. This only happens when I switch on use-sbd; it does not happen when I switch it off. So for the larger dataset with SBD, multi-GPU validation seems not to work. The training loss looks fine though.
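Not the author's code, but one workaround worth trying when the weights look fine and only validation collapses: re-estimate the BatchNorm running statistics with the current weights right before evaluating. A hedged sketch below, using plain `torch.nn` BatchNorm; `model` and `train_loader` are placeholders, and the loader is assumed to yield `(image, target)` pairs, so adapt it to your sample format.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def reestimate_bn_stats(model: nn.Module, loader, num_batches: int = 50) -> None:
    # Reset the running mean/var of every BatchNorm layer, then run a few
    # training batches forward (no backward pass) so the statistics are
    # re-accumulated with the current weights.
    for m in model.modules():
        if isinstance(m, nn.modules.batchnorm._BatchNorm):
            m.reset_running_stats()
            # momentum=None makes torch.nn BatchNorm keep an exact cumulative
            # average over the batches below (custom sync-BN layers may not
            # support this, so check your implementation).
            m.momentum = None
    model.train()
    for i, (images, _) in enumerate(loader):
        if i >= num_batches:
            break
        model(images)  # forward pass only; updates the BN buffers
    model.eval()

# Hypothetical usage -- names are placeholders, not this repo's API:
# reestimate_bn_stats(model, train_loader, num_batches=50)
# then run your usual validation loop with model in eval mode
```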