Batch size ablation results
Hello, thanks for your great work. Could you provide additional ablations with different batch sizes (e.g., a smaller batch size of 512 or 256 instead of the 1024 reported in the paper)? When I vary the training batch size, I find that the final results vary a lot.
Hi @Alxead. In our experience, a "sqrt" scaling rule should be used to adjust the learning rate. In our default setting, the effective learning rate for batch size 1024 is 1024 / 256 * 1 = 4. With sqrt scaling, the effective learning rate for batch size 512 should be 4 * sqrt(512 / 1024) = 2.828. Since the train script scales the base learning rate linearly by batch_size / 256, this corresponds to 2.828 * 256 / 512 = 1.414, so we can run the train script with '--base-lr 1.414' to achieve this.
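For anyone adapting this to other batch sizes, here is a minimal sketch of that calculation, assuming the train script scales the effective learning rate linearly as base_lr * batch_size / 256 (as implied by the default setting above); the helper name `base_lr_for` is just for illustration:

```python
import math

REF_BATCH = 256          # reference batch size the script scales from
DEFAULT_BATCH = 1024     # default batch size reported in the paper
DEFAULT_BASE_LR = 1.0    # default --base-lr

def base_lr_for(new_batch):
    """Return the --base-lr giving a sqrt-scaled effective lr for new_batch."""
    # Effective lr in the default setting: 1024 / 256 * 1 = 4
    default_eff = DEFAULT_BASE_LR * DEFAULT_BATCH / REF_BATCH
    # Sqrt scaling: shrink the effective lr by sqrt(new_batch / default_batch)
    target_eff = default_eff * math.sqrt(new_batch / DEFAULT_BATCH)
    # Invert the script's linear rule (effective lr = base_lr * batch / 256)
    return target_eff * REF_BATCH / new_batch

print(base_lr_for(512))  # ~1.414 -> pass '--base-lr 1.414'
print(base_lr_for(256))  # ~2.0   -> pass '--base-lr 2.0'
```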
Hi, thank you for your contribution. I was wondering whether you used learning rate decay, since the learning rate is quite high and should decrease as the network converges. Thanks, Ram