[Feature] Improve documentation for auto_scale_lr
Describe the feature
There are over 300 usages of `auto_scale_lr` in the mmpretrain configs and docs, but there is no explicit documentation on how to use it. If my interpretation of the code is correct (below), many of the example configs are setting the optimizer `lr` incorrectly. Also, searching the issues for `auto_scale_lr` shows that many users are misconfiguring this setting.
The upstream mmengine repo does provide some API docs here, but it does not state how to set `base_batch_size`.
I read the code for auto-scaling the LR and Goyal et al., and it appears that the correct usage is as follows:
- `auto_scale_lr.base_batch_size` and the dataloader `batch_size` should be set to the same value, i.e. the mini-batch size
- set the optimizer `lr` to a constant (not scaled by the mini-batch size or the effective batch size)

Then, internally, mmpretrain will scale the LR by the ratio of the effective batch size (i.e. mini-batch size * num replicas) to the mini-batch size.
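If that reading is right, a minimal config sketch of the intended usage would look something like the snippet below. The per-GPU batch size of 32, the SGD settings, and the base LR of 0.1 are placeholder values I picked for illustration, not numbers taken from any existing config:

```python
# Hypothetical per-GPU (mini-batch) dataloader batch size.
train_dataloader = dict(batch_size=32)

# base_batch_size set equal to the per-GPU batch_size above.
auto_scale_lr = dict(enable=True, base_batch_size=32)

# Constant base LR, NOT pre-multiplied by the number of replicas
# or the effective batch size.
optim_wrapper = dict(
    optimizer=dict(type='SGD', lr=0.1, momentum=0.9, weight_decay=1e-4))
```

With, say, 8 replicas the effective batch size would be 8 * 32 = 256, so at runtime the LR would be scaled by 256 / 32 = 8, giving 0.8.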
Will you implement it?
- [ ] I would like to implement this feature and create a PR!
If I am reading it correctly, I think `auto_scale_lr` is also not accounting for gradient accumulation with `accumulative_counts`. A rough sketch of the discrepancy follows below.
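To illustrate the concern, here is a back-of-the-envelope sketch with hypothetical numbers (none of these values come from an actual config):

```python
num_gpus = 8               # number of replicas (hypothetical)
batch_size = 32            # per-GPU dataloader batch_size (hypothetical)
accumulative_counts = 4    # gradient accumulation steps (hypothetical)
base_batch_size = 32       # auto_scale_lr.base_batch_size

# Ratio auto_scale_lr appears to compute: replicas * per-GPU batch size
# over base_batch_size, ignoring accumulation.
scale_used = (num_gpus * batch_size) / base_batch_size  # 8.0

# Ratio that would account for accumulation, since the optimizer only
# steps once every accumulative_counts iterations.
scale_expected = (num_gpus * batch_size * accumulative_counts) / base_batch_size  # 32.0
```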
Any progress? Did you solve the problem?