
Learning rate adjustment when training with one GPU or two GPUs

Young0111 opened this issue 3 years ago · 5 comments

Thanks for your code.

When I train this code with two GPUs (Tesla P4), changing IMS_PER_BATCH to 4, by running:

    python projects/SparseRCNN/train_net.py \
        --config-file projects/SparseRCNN/configs/sparsercnn.res50.100pro.3x.yaml \
        --num-gpus 2 SOLVER.IMS_PER_BATCH 4

at iteration 7319 the saved model has an AP of 3.915, which differs from the 11.440 in your log.

Referring to Detectron2, I adjusted the learning rate to 0.0025 by running:

    python projects/SparseRCNN/train_net.py \
        --config-file projects/SparseRCNN/configs/sparsercnn.res50.100pro.3x.yaml \
        --num-gpus 2 SOLVER.IMS_PER_BATCH 4 SOLVER.BASE_LR 0.0025

and changed to a single GPU by running:

    python projects/SparseRCNN/train_net.py \
        --config-file projects/SparseRCNN/configs/sparsercnn.res50.100pro.3x.yaml \
        --num-gpus 1 SOLVER.IMS_PER_BATCH 2 SOLVER.BASE_LR 0.0025

but the result was worse.

Later I found that Detectron2 uses SGD while you use AdamW, so maybe 0.0025 is not applicable. I would like to know whether I need to adjust some parameters, and if so, how. Thanks.

Young0111 avatar Apr 01 '21 02:04 Young0111

You should set BASE_LR to 0.000025 instead of 0.0025.

iFighting avatar Apr 01 '21 02:04 iFighting

On the first run, the learning rate was the default, i.e. 0.000025, and I got a lower AP than in the author's log; that is why I changed the learning rate.

Young0111 avatar Apr 01 '21 02:04 Young0111

> On the first run, the learning rate was the default, i.e. 0.000025, and I got a lower AP than in the author's log; that is why I changed the learning rate.

Maybe you need a smaller lr than 0.000025.

iFighting avatar Apr 01 '21 02:04 iFighting
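
For what it's worth, the usual linear scaling rule gives a concrete candidate for "smaller". Here is a minimal sketch, assuming the reference setup is IMS_PER_BATCH=16 with BASE_LR=2.5e-5 (the default cited in this thread), and assuming linear scaling carries over to AdamW, which is not guaranteed:

```python
# Linear scaling rule sketch: shrink the learning rate in proportion
# to the batch size. Reference values are assumed from the default
# SparseR-CNN setup (IMS_PER_BATCH=16, BASE_LR=2.5e-5); verify against
# your yaml before relying on them.
REFERENCE_BATCH = 16
REFERENCE_LR = 2.5e-5

def scaled_lr(ims_per_batch: int) -> float:
    """Linearly scaled learning rate for a reduced batch size."""
    return REFERENCE_LR * ims_per_batch / REFERENCE_BATCH

print(scaled_lr(4))  # 6.25e-06  -> 2 GPUs, IMS_PER_BATCH 4
print(scaled_lr(2))  # 3.125e-06 -> 1 GPU,  IMS_PER_BATCH 2
```

Whether AdamW obeys linear scaling as cleanly as SGD is debatable, so treat these values as starting points rather than guarantees.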

OK, thanks for your reply. I will try it now.

Young0111 avatar Apr 01 '21 02:04 Young0111

So, what changes did you make? I guess I need to set the lr to 0.000025 * 1/8 and multiply the total number of iterations by 8? However, as the number of iterations increases, the total training time becomes extremely long.

lingl-space avatar Mar 15 '22 03:03 lingl-space
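
To make the arithmetic in the last comment concrete: following the linear scaling recipe all the way means dividing the lr by the same factor you shrink the batch by and stretching the iteration schedule by that factor, which is exactly the training-time blowup described above. A hedged sketch, assuming the standard Detectron2 3x schedule values (MAX_ITER=270000, STEPS=(210000, 250000)); verify against sparsercnn.res50.100pro.3x.yaml:

```python
# Sketch: scale the lr down and the schedule up when the batch shrinks,
# so the total number of images seen stays constant. Reference values
# assume a standard Detectron2 3x schedule at batch 16; check the yaml.
REF = {
    "ims_per_batch": 16,
    "base_lr": 2.5e-5,
    "max_iter": 270_000,
    "steps": (210_000, 250_000),
}

def scale_schedule(new_batch: int) -> dict:
    """Keep total images seen (and lr per image) roughly constant."""
    factor = REF["ims_per_batch"] / new_batch  # e.g. 8 for batch size 2
    return {
        "ims_per_batch": new_batch,
        "base_lr": REF["base_lr"] / factor,
        "max_iter": int(REF["max_iter"] * factor),
        "steps": tuple(int(s * factor) for s in REF["steps"]),
    }

print(scale_schedule(2))
# {'ims_per_batch': 2, 'base_lr': 3.125e-06, 'max_iter': 2160000,
#  'steps': (1680000, 2000000)}
```

An 8x longer schedule is often impractical; in practice people compromise with a partially scaled lr and a shorter schedule, at some cost in final AP.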