[Feature] Improve documentation for auto_scale_lr
Describe the feature
There are over 300 usages of `auto_scale_lr` in the mmpretrain configs and docs, but there is no explicit documentation on how to use it. If my interpretation of the code is correct (below), many of the example configs are setting the optimizer `lr` incorrectly. Also, searching the issues for `auto_scale_lr` shows that many users are misconfiguring this setting.
The upstream mmengine repo does provide some API docs here, but it does not state how to set `base_batch_size`.
I read the code for auto-scaling the LR and Goyal et al., and it appears that the correct usage is as follows:
- `auto_scale_lr.base_batch_size` and the dataloader `batch_size` should be set to the same value, i.e. the mini-batch size
- set the optimizer `lr` to a constant (not scaled by the mini-batch size or the effective batch size)

Then, internally, mmpretrain will scale the LR by the ratio of the effective batch size (i.e. mini-batch size * num replicas) to the mini-batch size.
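If that reading is right, a minimal config sketch of the intended usage would look something like the snippet below. The per-GPU batch size of 32, the SGD settings, and the base LR of 0.1 are placeholder values I picked for illustration, not numbers taken from any existing config:

```python
# Hypothetical per-GPU (mini-batch) dataloader batch size.
train_dataloader = dict(batch_size=32)

# base_batch_size set equal to the per-GPU batch_size above.
auto_scale_lr = dict(enable=True, base_batch_size=32)

# Constant base LR, NOT pre-multiplied by the number of replicas
# or the effective batch size.
optim_wrapper = dict(
    optimizer=dict(type='SGD', lr=0.1, momentum=0.9, weight_decay=1e-4))
```

With, say, 8 replicas the effective batch size would be 8 * 32 = 256, so at runtime the LR would be scaled by 256 / 32 = 8, giving 0.8.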
Will you implement it?
- [ ] I would like to implement this feature and create a PR!
If I am reading it correctly, I think `auto_scale_lr` is also not accounting for gradient accumulation with `accumulative_counts`. A rough sketch of the discrepancy follows below.
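To illustrate the concern, here is a back-of-the-envelope sketch with hypothetical numbers (none of these values come from an actual config):

```python
num_gpus = 8               # number of replicas (hypothetical)
batch_size = 32            # per-GPU dataloader batch_size (hypothetical)
accumulative_counts = 4    # gradient accumulation steps (hypothetical)
base_batch_size = 32       # auto_scale_lr.base_batch_size

# Ratio auto_scale_lr appears to compute: replicas * per-GPU batch size
# over base_batch_size, ignoring accumulation.
scale_used = (num_gpus * batch_size) / base_batch_size  # 8.0

# Ratio that would account for accumulation, since the optimizer only
# steps once every accumulative_counts iterations.
scale_expected = (num_gpus * batch_size * accumulative_counts) / base_batch_size  # 32.0
```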
Any progress? Did you solve the problem?