DRN-WSOD-pytorch
What is the meaning of MEAN_LOSS = False | True?
What's the effective loss scaling? Does it sum or average over classes? Over the batch size?
How does it interact with distributed training? Is there any scaling over the world size anywhere?
Basically I'd like to understand batch size and averaging for https://github.com/shenyunhang/DRN-WSOD-pytorch/blob/DRN-WSOD/projects/WSL/configs/PascalVOC-Detection/wsddn_WSR_18_DC5_1x.yaml and the 4-GPU setup mentioned in the README:
From what I understand, it uses 4 GPUs with 4 images per batch, that is 1 image per GPU, it sums the loss over classes, and uses an iter size of 32. That means the effective batch size is 4*32 = 128 (the loss is summed over the classes and averaged over the batch size). Is that correct?
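For concreteness, here is a small worked sketch of the arithmetic in that question, assuming 4 GPUs, 1 image per GPU, and gradient accumulation over an iter size of 32 (the variable names and dummy tensors are illustrative, not the project's actual code; DDP itself averages gradients across GPUs):

```python
import torch

num_gpus = 4       # world size
ims_per_gpu = 1    # images per GPU per forward pass
iter_size = 32     # gradient-accumulation steps per optimizer step

# One optimizer step sees 4 * 1 * 32 = 128 images in total.
effective_batch = num_gpus * ims_per_gpu * iter_size
print(effective_batch)  # 128

# Gradient accumulation on a single GPU: each mini-step's loss is divided by
# iter_size so the summed gradients behave like one averaged 128-image batch.
model = torch.nn.Linear(10, 20)                       # stand-in for the detector
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
optimizer.zero_grad()
for _ in range(iter_size):
    x = torch.randn(ims_per_gpu, 10)                  # dummy image features
    target = torch.rand(ims_per_gpu, 20)              # dummy class targets
    loss = torch.nn.functional.binary_cross_entropy_with_logits(model(x), target)
    (loss / iter_size).backward()                     # accumulate scaled gradients
optimizer.step()
```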
What is the logic/motivation for using MEAN_LOSS or not?
E.g. https://github.com/shenyunhang/DRN-WSOD-pytorch/blob/ff6168effcff68a77fbd6576ce108726ff14034c/projects/WSL/configs/PascalVOC-Detection/oicr_WSR_50_DC5_1x.yaml uses MEAN_LOSS=True (the opposite of wsddn_WSR_18_DC5_1x.yaml).
Thanks!
In the MEAN_LOSS = True setting, it seems that the loss will be divided by the batch size twice.
I guess for batch_size equal to 1 (which was your practical setting), there is no difference.
MEAN_LOSS is designed to normalize the loss over classes. For wsddn and contextlocnet, we use a batch size of 128 and do not normalize the loss over classes, i.e., MEAN_LOSS = False. For the other methods, we use a batch size of 4 and normalize the loss over classes, i.e., MEAN_LOSS = True.
In fact, we find that using a large batch size without normalization over classes gives better performance, but it requires very long training (we use 160 epochs for wsddn and contextlocnet). To save time, for the other methods we use a small batch size with normalization over classes to speed up convergence. As a result, the wsddn module in the other methods has worse performance than standalone wsddn.
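A minimal sketch of the two reductions described above, under the assumption that the per-image loss is summed over classes and that MEAN_LOSS = True simply divides that sum by the number of classes (the function and tensor shapes are illustrative, not the repository's exact code):

```python
import torch
import torch.nn.functional as F

def reduce_loss(scores, labels, mean_loss: bool):
    # scores, labels: (num_images, num_classes), scores in (0, 1)
    per_elem = F.binary_cross_entropy(scores, labels, reduction="none")
    per_image = per_elem.sum(dim=1)             # sum over classes
    if mean_loss:
        per_image = per_image / scores.size(1)  # MEAN_LOSS = True: normalize over classes
    return per_image.mean()                     # average over the images on this GPU

# Dummy image-level class scores for 4 images and the 20 VOC classes.
scores = torch.rand(4, 20).clamp(1e-6, 1 - 1e-6)
labels = torch.randint(0, 2, (4, 20)).float()
print(reduce_loss(scores, labels, mean_loss=False))
print(reduce_loss(scores, labels, mean_loss=True))
```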
> What's the effective loss scaling? Does it sum or average over classes? Over the batch size?
> How does it interact with distributed training? Is there any scaling over the world size anywhere?
This project needs the config file to be modified manually to scale up training. But the UWSOD project implements auto scaling here. It can be activated by setting REFERENCE_WORLD_SIZE as in this config file.
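For reference, a hedged sketch of what such world-size auto scaling typically does in detectron2-style trainers: when the actual world size differs from REFERENCE_WORLD_SIZE, the batch size, learning rate, and schedule are scaled linearly (the dictionary keys and the example numbers below are illustrative, not the exact UWSOD implementation):

```python
def auto_scale(cfg: dict, new_world_size: int) -> dict:
    """Linearly rescale solver settings when the world size changes."""
    ref = cfg["REFERENCE_WORLD_SIZE"]
    if ref == 0 or ref == new_world_size:
        return cfg                                   # 0 disables auto scaling
    scale = new_world_size / ref
    cfg = dict(cfg)
    cfg["IMS_PER_BATCH"] = int(round(cfg["IMS_PER_BATCH"] * scale))
    cfg["BASE_LR"] *= scale                          # linear LR scaling rule
    cfg["MAX_ITER"] = int(round(cfg["MAX_ITER"] / scale))
    cfg["STEPS"] = tuple(int(round(s / scale)) for s in cfg["STEPS"])
    cfg["REFERENCE_WORLD_SIZE"] = new_world_size
    return cfg

# E.g. a config tuned for 4 GPUs, launched on 8 GPUs:
cfg = {"REFERENCE_WORLD_SIZE": 4, "IMS_PER_BATCH": 4, "BASE_LR": 0.01,
       "MAX_ITER": 35000, "STEPS": (25000, 32500)}
print(auto_scale(cfg, new_world_size=8))
```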
> In the MEAN_LOSS = True setting, it seems that the loss will be divided by the batch size twice.
This is a bug. When each GPU has one image, there is no difference, which is why we did not find this bug before.
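A toy illustration of the assumed mechanism: if the per-image losses are averaged (which already divides by the per-GPU image count) and then divided by the image count again, the result is unchanged when each GPU holds one image, but is scaled down by an extra factor otherwise (the tensors below are made up for the example):

```python
import torch

per_image_loss = torch.tensor([1.0, 2.0, 3.0, 4.0])  # 4 images on one GPU
n = per_image_loss.numel()

correct = per_image_loss.mean()        # 2.5  (divided by batch size once)
buggy = per_image_loss.mean() / n      # 0.625 (divided by batch size twice)
print(correct.item(), buggy.item())

one_image = torch.tensor([1.0])        # 1 image per GPU: the two agree
print(one_image.mean().item(), (one_image.mean() / 1).item())
```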