
Add LearningRateMultiplier wrapper for optimizers

Open stante opened this issue 6 years ago • 6 comments

Summary

Optimizers have a single, model-global learning rate. This PR adds a wrapper that can be used with existing optimizers to specify different learning rates per layer in a network. The per-layer learning rate is given as a factor that is multiplied with the learning rate of the wrapped optimizer. The wrapper can be used in the following way:

multipliers = {'dense_1': 0.5, 'dense_2': 0.4}
opt = LearningRateMultiplier(SGD, lr_multipliers=multipliers, lr=0.001, momentum=0.9)

The example wraps SGD and specifies lr and momentum for it. Layers whose name contains the string 'dense_1' get a multiplier of 0.5 and layers whose name contains the string 'dense_2' get a multiplier of 0.4.

Different multipliers for kernel and bias can be specified with:

multipliers = {'dense_1/kernel': 0.5, 'dense_1/bias': 0.1}
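
For context (not part of the original description), a minimal end-to-end usage sketch could look like the following, assuming the wrapper is importable from keras_contrib.optimizers and that the layer names match the multiplier keys:

from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import SGD
# Assumed import path; the wrapper is what this PR proposes to add.
from keras_contrib.optimizers import LearningRateMultiplier

model = Sequential([
    Dense(64, activation='relu', input_shape=(20,), name='dense_1'),
    Dense(1, activation='sigmoid', name='dense_2'),
])

# dense_1 trains at 0.5 * lr, dense_2 at 0.4 * lr.
multipliers = {'dense_1': 0.5, 'dense_2': 0.4}
opt = LearningRateMultiplier(SGD, lr_multipliers=multipliers, lr=0.001, momentum=0.9)

model.compile(optimizer=opt, loss='binary_crossentropy')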

Related Issues

There are related issues in Keras: https://github.com/keras-team/keras/issues/11934, https://github.com/keras-team/keras/issues/7912 and, partially, https://github.com/keras-team/keras/issues/5920.

stante avatar Jan 07 '19 18:01 stante

It seems there are some PEP 8 errors and that the code isn't compatible with Python 2 because of super(). super() takes two arguments in Python 2, usually the class and self.
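
For reference (not in the original comment), the two-argument form works on both Python 2 and 3; the class name here is assumed for illustration:

# Python 3 only:
#     super().__init__(**kwargs)
# Works on both Python 2 and 3 (class name assumed):
super(LearningRateMultiplier, self).__init__(**kwargs)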

gabrieldemarmiesse avatar Jan 07 '19 19:01 gabrieldemarmiesse

You can find out more about the errors by looking at the travis logs.

gabrieldemarmiesse avatar Jan 07 '19 19:01 gabrieldemarmiesse

Will there be updates on this? If not, can I make a new PR that adds this class to keras-contrib? @gabrieldemarmiesse @stante, I would be enabling DiscriminativeLearningRate in general, not specifically only a learning rate multiplier.

I propose three settings: automatic (cosine) learning rate decay from the base learning rate of the wrapped optimizer per layer, automatic (cosine) learning rate decay from the base learning rate of the wrapped optimizer per convolutional block/group, and this learning rate multiplier. A rough sketch of the per-layer variant is below.
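
Not part of the original comment, but a minimal sketch of how per-layer cosine-decayed multipliers could be generated and fed to the wrapper from this PR; the helper name, min_factor, and layer names are all assumptions for illustration:

import math

def cosine_layer_multipliers(layer_names, min_factor=0.1):
    """Assign each layer a multiplier that follows a cosine curve from
    min_factor (earliest layer) up to 1.0 (last layer)."""
    n = len(layer_names)
    multipliers = {}
    for i, name in enumerate(layer_names):
        # t goes from 1.0 for the first layer down to 0.0 for the last one.
        t = 1.0 - i / max(n - 1, 1)
        factor = min_factor + (1.0 - min_factor) * 0.5 * (1.0 + math.cos(math.pi * t))
        multipliers[name] = factor
    return multipliers

# Hypothetical usage with the wrapper proposed in this PR:
# multipliers = cosine_layer_multipliers(['conv2d_1', 'conv2d_2', 'dense_1'])
# opt = LearningRateMultiplier(SGD, lr_multipliers=multipliers, lr=0.001, momentum=0.9)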

Dicksonchin93 avatar Jan 09 '20 17:01 Dicksonchin93

Keras-contrib is currently deprecated. Please redirect the PRs to tensorflow/addons. It would be really nice if you could add that, @Dicksonchin93; a lot of people are asking for this feature :)

gabrieldemarmiesse avatar Jan 09 '20 17:01 gabrieldemarmiesse

@gabrieldemarmiesse is there a reason why we shouldn't add this into keras directly?

Dicksonchin93 avatar Jan 09 '20 18:01 Dicksonchin93

This was proposed a while back and rejected. The reason is that not enough people use it to justify a Keras API change. It's also not clear that it's a best practice. TensorFlow Addons was made exactly for this kind of feature.


gabrieldemarmiesse avatar Jan 09 '20 19:01 gabrieldemarmiesse