
Auto scale learning rate based on batch size

Open · vreis opened this issue 6 years ago · 1 comment

🚀 Feature

Auto scale learning rate based on batch size

Motivation

Changing the number of workers in distributed training requires adjusting hyperparameters. https://arxiv.org/abs/1706.02677 proposed a linear scaling rule that adjusts the learning rate based on the batch size.

Pitch

ClassificationTask should have a flag (default True), that would rescale the learning rate based on the batch size. The task is a natural place to put this since we don't want all parameter schedulers to reimplement the same logic. We could consider having the same in the optimizer instead, but I have a sense it'll require more boilerplate.
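A minimal sketch of what such a flag could do, applying the linear scaling rule from the paper above. The names `scale_lr` and `BASE_BATCH_SIZE` are hypothetical, not part of the ClassyVision API:

```python
# Hypothetical helper: rescale the learning rate linearly with batch size,
# relative to an assumed reference batch size (256 in the paper).
BASE_BATCH_SIZE = 256

def scale_lr(base_lr: float, batch_size: int, auto_scale: bool = True) -> float:
    """Apply the linear scaling rule: lr grows proportionally to batch size."""
    if not auto_scale:
        return base_lr
    return base_lr * batch_size / BASE_BATCH_SIZE
```

With the flag on, doubling the batch size from 256 to 512 doubles the learning rate; with it off, the configured value is used unchanged.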

Alternatives

Hydra (http://hydra.cc) would enable a different solution for this problem: the config file could have a "rescale" parameter for the learning rate, and we could use the "interpolation" feature to rescale by "1/{batch_size}", where batch_size is defined elsewhere in the config.

vreis avatar Dec 05 '19 16:12 vreis

Interpolation does not support arithmetic operations (there is an enhancement request in OmegaConf that I will consider in the future).

For now, you could use interpolation to get the batch size into the model, and do the auto scaling in code:

```yaml
model:
   params:
      ...
      batch_size: ${batch_size}
```

and do the division in the code.

omry avatar Jan 09 '20 22:01 omry