
About the batch size and optimizer used for training

Open · wsy-yjys opened this issue · 5 comments

Question 1: Does the batch size during training affect the final performance of the model? Have you tried small batch sizes, such as 16 or 32?

Question 2: Why do you use the LAMB optimizer instead of SGD, AdamW, or other more common optimizers? I use VanillaNet as the backbone for other downstream tasks with the SGD optimizer, but I find it difficult to train the model from scratch. As you can see below, the model shows essentially no improvement over the first 50 epochs, so I'd be grateful if you could give me some advice.

[image: training log showing essentially no improvement over the first 50 epochs]

wsy-yjys (May 31, 2023)

Another question about the batch size: when I reduce the batch size, does the learning rate need to be reduced correspondingly? Thank you~

wsy-yjys (May 31, 2023)

Thank you for your interest in our work. For Q1 & Q3, we have only experimented with batch sizes of 1024 and 2048, and their results were similar. As the batch size varies, the learning rate needs to be adjusted accordingly, approximately as effective_lr = sqrt(effective_bs / base_bs) * base_lr. We have not tried smaller batch sizes at this time.
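As a minimal sketch of the square-root scaling rule above (the base learning rate and batch sizes below are illustrative placeholders, not the repository's exact defaults):

```python
import math

def scale_lr(base_lr: float, base_bs: int, effective_bs: int) -> float:
    """Square-root learning-rate scaling: effective_lr = sqrt(effective_bs / base_bs) * base_lr."""
    return math.sqrt(effective_bs / base_bs) * base_lr

# Example: suppose the reference setting is base_lr = 3.5e-3 at batch size 1024
# (illustrative numbers only) and the batch size is reduced to 128.
print(scale_lr(base_lr=3.5e-3, base_bs=1024, effective_bs=128))  # ~1.24e-3
```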

As for Q2, on ImageNet the results achieved by AdamW/LAMB are much better than those of SGD. The reason we finally chose LAMB is similar to this paper: LAMB is slightly better than AdamW. For downstream tasks, I think AdamW can help improve the convergence speed of VanillaNet. You can also try lowering the learning rate.
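A minimal sketch of switching a downstream training step from SGD to AdamW with a reduced learning rate; the backbone, learning rate, and weight decay below are illustrative placeholders, not values from the VanillaNet repository:

```python
import torch

# Placeholder backbone standing in for VanillaNet used as a downstream backbone.
model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 64, kernel_size=3, padding=1),
    torch.nn.ReLU(),
    torch.nn.AdaptiveAvgPool2d(1),
    torch.nn.Flatten(),
    torch.nn.Linear(64, 10),
)

# AdamW instead of SGD, with a relatively low learning rate; the exact
# lr/weight_decay values are illustrative only.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.05)

# One dummy training step to show the update loop.
images = torch.randn(8, 3, 224, 224)
targets = torch.randint(0, 10, (8,))
optimizer.zero_grad()
loss = torch.nn.functional.cross_entropy(model(images), targets)
loss.backward()
optimizer.step()
```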

ggjy (Jun 02, 2023)

Hi OP, regarding Q2, when training downstream tasks from scratch, did you use the deep training strategy?

DirtyBit64 (Jun 02, 2023)

@PGthree3 Yes.
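For readers unfamiliar with the deep training strategy discussed here, the sketch below shows the idea from the VanillaNet paper: the activation is blended with the identity by a coefficient lambda that ramps from 0 to 1 over training, so the nonlinearity fades out and adjacent layers can later be merged. The class and method names are illustrative placeholders, not the repository's exact API.

```python
import torch
import torch.nn as nn

class DeepTrainAct(nn.Module):
    """Deep-training activation: (1 - lam) * act(x) + lam * x.

    lam ramps from 0 (fully nonlinear) to 1 (identity) over training, after
    which the surrounding layers can be merged. Names are illustrative, not
    the VanillaNet repository's exact API.
    """

    def __init__(self, act: nn.Module = None):
        super().__init__()
        self.act = act if act is not None else nn.ReLU()
        self.lam = 0.0  # updated from the training loop

    def set_lambda(self, lam: float) -> None:
        # Clamp to [0, 1] so the blend stays a convex combination.
        self.lam = float(min(max(lam, 0.0), 1.0))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return (1.0 - self.lam) * self.act(x) + self.lam * x

# In the training loop, lambda is typically scheduled linearly, e.g.:
# for epoch in range(num_epochs):
#     for m in model.modules():
#         if isinstance(m, DeepTrainAct):
#             m.set_lambda(epoch / decay_epochs)
```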

wsy-yjys (Jun 03, 2023)

@ggjy Thank you for your reply~

wsy-yjys (Jun 06, 2023)