DRN-WSOD-pytorch
What is the meaning of MEAN_LOSS = False | True?
What's the effective loss scaling? Does it sum or average over classes? Over the batch size?
How does it interact with distributed training? Is there any scaling over the world size anywhere?
Basically I'd like to understand batch size and averaging for https://github.com/shenyunhang/DRN-WSOD-pytorch/blob/DRN-WSOD/projects/WSL/configs/PascalVOC-Detection/wsddn_WSR_18_DC5_1x.yaml and the 4-GPU setup mentioned in the README:
From what I understand, it uses 4 GPUs with 4 images per batch, that is 1 image per GPU, it sums the loss over classes, and uses an iter size of 32. That means the effective batch size is 4*32 = 128 (the loss is summed over the classes and averaged over the batch size). Is that correct?
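For concreteness, here is a small worked sketch of the arithmetic in that question, assuming 4 GPUs, 1 image per GPU, and gradient accumulation over an iter size of 32 (the variable names and dummy tensors are illustrative, not the project's actual code; DDP itself averages gradients across GPUs):

```python
import torch

num_gpus = 4       # world size
ims_per_gpu = 1    # images per GPU per forward pass
iter_size = 32     # gradient-accumulation steps per optimizer step

# One optimizer step sees 4 * 1 * 32 = 128 images in total.
effective_batch = num_gpus * ims_per_gpu * iter_size
print(effective_batch)  # 128

# Gradient accumulation on a single GPU: each mini-step's loss is divided by
# iter_size so the summed gradients behave like one averaged 128-image batch.
model = torch.nn.Linear(10, 20)                       # stand-in for the detector
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
optimizer.zero_grad()
for _ in range(iter_size):
    x = torch.randn(ims_per_gpu, 10)                  # dummy image features
    target = torch.rand(ims_per_gpu, 20)              # dummy class targets
    loss = torch.nn.functional.binary_cross_entropy_with_logits(model(x), target)
    (loss / iter_size).backward()                     # accumulate scaled gradients
optimizer.step()
```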
What is the logic/motivation for using MEAN_LOSS or not?
E.g. https://github.com/shenyunhang/DRN-WSOD-pytorch/blob/ff6168effcff68a77fbd6576ce108726ff14034c/projects/WSL/configs/PascalVOC-Detection/oicr_WSR_50_DC5_1x.yaml uses MEAN_LOSS=True (the opposite of wsddn_WSR_18_DC5_1x.yaml).
Thanks!
In the MEAN_LOSS = True setting, it seems that the loss will be divided by the batch size twice.
I guess for batch_size equal to 1 (which was your practical setting), there is no difference.
MEAN_LOSS is designed to normalize the loss over classes. For wsddn and contextlocnet, we use a batch size of 128 and do not normalize the loss over classes, i.e., MEAN_LOSS = False. For the other methods, we use a batch size of 4 and normalize the loss over classes, i.e., MEAN_LOSS = True.
In fact, we find that using a large batch size without normalization over classes gives better performance, but it requires very long training (we use 160 epochs for wsddn and contextlocnet). To save time, for the other methods we use a small batch size with normalization over classes to speed up convergence. As a result, the wsddn module in the other methods has worse performance than standalone wsddn.
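A minimal sketch of the two reductions described above, under the assumption that the per-image loss is summed over classes and that MEAN_LOSS = True simply divides that sum by the number of classes (the function and tensor shapes are illustrative, not the repository's exact code):

```python
import torch
import torch.nn.functional as F

def reduce_loss(scores, labels, mean_loss: bool):
    # scores, labels: (num_images, num_classes), scores in (0, 1)
    per_elem = F.binary_cross_entropy(scores, labels, reduction="none")
    per_image = per_elem.sum(dim=1)             # sum over classes
    if mean_loss:
        per_image = per_image / scores.size(1)  # MEAN_LOSS = True: normalize over classes
    return per_image.mean()                     # average over the images on this GPU

# Dummy image-level class scores for 4 images and the 20 VOC classes.
scores = torch.rand(4, 20).clamp(1e-6, 1 - 1e-6)
labels = torch.randint(0, 2, (4, 20)).float()
print(reduce_loss(scores, labels, mean_loss=False))
print(reduce_loss(scores, labels, mean_loss=True))
```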
> What's the effective loss scaling? Does it sum or average over classes? Over the batch size?
> How does it interact with distributed training? Is there any scaling over the world size anywhere?
This project needs the config file to be modified manually to scale up training. But the UWSOD project implements auto scaling here. It can be activated by setting REFERENCE_WORLD_SIZE as in this config file.
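For reference, a hedged sketch of what such world-size auto scaling typically does in detectron2-style trainers: when the actual world size differs from REFERENCE_WORLD_SIZE, the batch size, learning rate, and schedule are scaled linearly (the dictionary keys and the example numbers below are illustrative, not the exact UWSOD implementation):

```python
def auto_scale(cfg: dict, new_world_size: int) -> dict:
    """Linearly rescale solver settings when the world size changes."""
    ref = cfg["REFERENCE_WORLD_SIZE"]
    if ref == 0 or ref == new_world_size:
        return cfg                                   # 0 disables auto scaling
    scale = new_world_size / ref
    cfg = dict(cfg)
    cfg["IMS_PER_BATCH"] = int(round(cfg["IMS_PER_BATCH"] * scale))
    cfg["BASE_LR"] *= scale                          # linear LR scaling rule
    cfg["MAX_ITER"] = int(round(cfg["MAX_ITER"] / scale))
    cfg["STEPS"] = tuple(int(round(s / scale)) for s in cfg["STEPS"])
    cfg["REFERENCE_WORLD_SIZE"] = new_world_size
    return cfg

# E.g. a config tuned for 4 GPUs, launched on 8 GPUs:
cfg = {"REFERENCE_WORLD_SIZE": 4, "IMS_PER_BATCH": 4, "BASE_LR": 0.01,
       "MAX_ITER": 35000, "STEPS": (25000, 32500)}
print(auto_scale(cfg, new_world_size=8))
```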
> In the MEAN_LOSS = True setting, it seems that the loss will be divided by the batch size twice.
This is a bug. When each GPU has one image, there is no difference, which is why we did not find this bug before.
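A toy illustration of the assumed mechanism: if the per-image losses are averaged (which already divides by the per-GPU image count) and then divided by the image count again, the result is unchanged when each GPU holds one image, but is scaled down by an extra factor otherwise (the tensors below are made up for the example):

```python
import torch

per_image_loss = torch.tensor([1.0, 2.0, 3.0, 4.0])  # 4 images on one GPU
n = per_image_loss.numel()

correct = per_image_loss.mean()        # 2.5  (divided by batch size once)
buggy = per_image_loss.mean() / n      # 0.625 (divided by batch size twice)
print(correct.item(), buggy.item())

one_image = torch.tensor([1.0])        # 1 image per GPU: the two agree
print(one_image.mean().item(), (one_image.mean() / 1).item())
```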