
decouple the lr scheduler and optimizer?

Open · hiyyg opened this issue 2 years ago · 5 comments

Hi @lessw2020, thanks for the very nice work! I noticed that in Ranger21 the optimizer is tightly coupled with the lr scheduler. Could you guide me on how to decouple them?

hiyyg avatar Nov 01 '21 03:11 hiyyg

I would like to second this. A split into a Ranger optimizer and a Ranger scheduler would be really cool.

neuronflow avatar Nov 02 '21 10:11 neuronflow

Hi @hiyyg and @neuronflow, right now you can turn off the built-in lr scheduling by disabling both warmup and warmdown: use_warmup=False and warmdown_active=False. That should simply pass the input lr through untouched. Is that what you mean by decouple? Or do you mean having the scheduler separately programmable (i.e. cosine decay vs. the linear schedule we use, etc.)?
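For illustration, a minimal sketch of that configuration. The two flags come from the comment above; num_epochs and num_batches_per_epoch are assumed constructor arguments and may differ in your installed version, and model / train_loader are placeholders:

from ranger21 import Ranger21

# Disable the built-in warmup/warmdown so the input lr passes through untouched.
optimizer = Ranger21(
    model.parameters(),
    lr=1e-3,
    num_epochs=60,                            # assumed arg: used by the internal schedule
    num_batches_per_epoch=len(train_loader),  # assumed arg: used by the internal schedule
    use_warmup=False,                         # no built-in warmup
    warmdown_active=False,                    # no built-in warmdown
)
# With both off, an external scheduler (e.g. torch.optim.lr_scheduler.*) can
# be stepped alongside optimizer.step() as usual.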

lessw2020 avatar Nov 02 '21 19:11 lessw2020

Or do you mean having the scheduler separately programmable (i.e. cosine decay vs. the linear schedule we use, etc.)?

This is what I initially had in mind. Maybe, just maybe, the Ranger optimizer should go hand in hand with a Ranger scheduler, following standard PyTorch conventions?
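For context, a minimal sketch of the standard PyTorch convention being referred to, using a stock optimizer and scheduler (model, train_loader, compute_loss, and num_epochs are placeholders):

import torch

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=num_epochs)

for epoch in range(num_epochs):
    for batch in train_loader:
        optimizer.zero_grad()
        loss = compute_loss(model, batch)
        loss.backward()
        optimizer.step()
    scheduler.step()  # the lr schedule is driven outside the optimizer

A decoupled Ranger scheduler would slot into the scheduler role here, while the Ranger optimizer would handle only the update step.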

neuronflow avatar Nov 02 '21 19:11 neuronflow

Hi @lessw2020, apparently in the current implementation there is no way to assign different learning rates to different parameter groups. Did I get that right?

If this were available, I would love to use it. Two use cases:

  1. Fine-tuning a network where layers closer to the head have a higher lr;
  2. My case: I train a graph neural network, and I need the embeddings to have 100x the learning rate of the rest of the model, but with the current script I can't use the standard PyTorch way of doing it:
# Split parameters into embedding and non-embedding groups.
model_params = [params for name, params in self.model.named_parameters() if not name.startswith('emb.')]
emb_params = [params for name, params in self.model.named_parameters() if name.startswith('emb.')]
# One parameter group per lr, passed as a list of dicts.
optimizer_model = madgrad_wd(
    [{'params': emb_params, 'lr': self.model_config['emb_max_lr']},
     {'params': model_params, 'lr': self.model_config['model_max_lr']}],
    weight_decay=self.model_config['wd'])

felipemello1 avatar Mar 10 '22 18:03 felipemello1

Hi @fmellomascarenhas, @neuronflow and @hiyyg - I fully agree with all the points above (decoupled scheduler and parameter groups). This split between scheduler and optimizer will happen for Ranger22 (the 2022 edition, lol).
Should have more info and updates shortly, as we just agreed last night to go ahead with the Ranger22 version.

lessw2020 avatar Mar 11 '22 21:03 lessw2020