Mask R-CNN: change the opt_base_learning_rate rule from 0.02 * K to 0.01 * K
In the training rules, opt_base_learning_rate for Mask R-CNN is currently defined as 0.02 * K:
| Benchmark | Optimizer | Parameter | Constraint | Definition |
|---|---|---|---|---|
| maskrcnn | sgd | opt_base_learning_rate | 0.02 * K for any integer K | base learning rate; this should be the learning rate after warm-up and before decay |
This works well for systems with 4, 8, or 16 GPUs. However, it does not converge well on systems with other numbers of GPUs, e.g. 10 GPUs.
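As an illustration, here is a minimal sketch (assuming the per-image learning rate of 0.00125 implied by the RCPs quoted below, and a local batch size of 12 per GPU, as in our runs) of which GPU counts produce a base LR that satisfies the current 0.02 * K constraint:

```python
# Sketch: check which GPU counts yield a base LR allowed by the current rule
# (opt_base_learning_rate = 0.02 * K for integer K).
# Assumption (not stated in the rule itself): the LR scales linearly with the
# global batch size at 0.00125 per image, matching the RCPs quoted below
# (0.12 / 96 = 0.16 / 128 = 0.00125), and the local batch size is 12 per GPU.
LR_PER_IMAGE = 0.00125
LOCAL_BATCH = 12

def allowed(lr, step):
    """True if lr is an integer multiple of `step` (within float tolerance)."""
    return abs(lr - round(lr / step) * step) < 1e-9

for num_gpus in (4, 8, 10, 16):
    global_bs = num_gpus * LOCAL_BATCH
    lr = LR_PER_IMAGE * global_bs
    print(f"{num_gpus:>2} GPUs, BS={global_bs:>3}: lr={lr:.2f}, "
          f"0.02*K allowed={allowed(lr, 0.02)}")
# 10 GPUs -> BS=120 -> lr=0.15, which is not an integer multiple of 0.02.
```

Only the 10-GPU configuration fails the check, even though its per-image learning rate is identical to the 4-, 8-, and 16-GPU cases.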
Another example is the RCP file itself, https://github.com/mlcommons/logging/blob/master/mlperf_logging/rcp_checker/training_1.1.0/rcps_maskrcnn.json:

```json
"maskrcnn_ref_96": {
  "Benchmark": "maskrcnn",
  "Creator": "NVIDIA",
  "When": "Prior to 1.0 submission",
  "Platform": "TBD",
  "BS": 96,
  "Hyperparams": {
    "opt_learning_decay_steps": [12000, 16000],
    "opt_base_learning_rate": 0.12,
    "num_image_candidates": 6000,
    "opt_learning_rate_warmup_factor": 0.000192,
    "opt_learning_rate_warmup_steps": 625
  },
  "Epochs to converge": [14, 15, 14, 14, 14, 14, 14, 14, 14, 13, 14, 14, 15, 14, 14, 14, 14, 14, 14, 14]
},
"maskrcnn_ref_128": {
  "Benchmark": "maskrcnn",
  "Creator": "NVIDIA",
  "When": "Prior to 1.0 submission",
  "Platform": "TBD",
  "BS": 128,
  "Hyperparams": {
    "opt_learning_decay_steps": [9000, 12000],
    "opt_base_learning_rate": 0.16,
    "num_image_candidates": 6000,
    "opt_learning_rate_warmup_factor": 0.000256,
    "opt_learning_rate_warmup_steps": 625
  },
  "Epochs to converge": [14, 14, 14, 14, 14, 14, 14, 14, 14, 14]
},
```
The LR needs to be scaled with the global batch size, which is not friendly to 10-GPU systems under the 0.02 * K constraint.
We ran with BS=120 and LR=0.15 on a 10-GPU system, and it converged in the same number of epochs (14) as BS=96 with LR=0.12 did; both use the same local batch size of 12. In addition, the BS=128 case defined in the same RCP file also converges in 14 epochs.
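As a quick arithmetic check (a sketch only; the 0.01 * K granularity is the proposed rule, not the current one), all three configurations share the same per-image LR of 0.00125, but only the proposed rule admits the BS=120 point:

```python
def allowed(lr, step):
    """True if lr is an integer multiple of `step` (within float tolerance)."""
    return abs(lr - round(lr / step) * step) < 1e-9

# global batch size -> base LR; 96 and 128 come from the RCPs above,
# 120 is our 10-GPU run.
configs = {96: 0.12, 120: 0.15, 128: 0.16}
for bs, lr in configs.items():
    print(f"BS={bs:>3}: lr/image={lr / bs:.5f}, "
          f"0.02*K: {allowed(lr, 0.02)}, 0.01*K: {allowed(lr, 0.01)}")
# BS= 96: lr/image=0.00125, 0.02*K: True,  0.01*K: True
# BS=120: lr/image=0.00125, 0.02*K: False, 0.01*K: True
# BS=128: lr/image=0.00125, 0.02*K: True,  0.01*K: True
```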
So, we propose adjusting the rule on opt_base_learning_rate for Mask R-CNN from 0.02 * K to 0.01 * K. This is fairer to systems with different numbers of GPUs.
Link back to https://github.com/mlcommons/submission_training_1.1/issues/24