
Use set_value to set the learning rate

Open appleby opened this issue 8 years ago • 7 comments

If lr is set via normal assignment, the notebook later raises the following error when it attempts to call set_value on it:

AttributeError    Traceback (most recent call last)
<ipython-input-68-c065d419a30d> in <module>()
----> 1 model.optimizer.lr.set_value(0.000001)

AttributeError: 'float' object has no attribute 'set_value'

Alternatively, I guess any calls to set_value could be converted to normal assignment.

appleby avatar Jun 05 '17 05:06 appleby

I found it was better to use normal assignment. I don't know why it matters at all, but for some reason I got better results.

jph00 avatar Jun 06 '17 16:06 jph00

Ok. Maybe I'll submit another pull request converting all the set_value calls to normal assignment, then? The point of this pull request was that, in this notebook, calls to set_value are interleaved with setting the learning rate via normal assignment, which results in an AttributeError when a user runs one of the cells that contains a set_value call. The reason is that once the learning rate is set via normal assignment, model.optimizer.lr is no longer a TensorSharedVariable but a regular float, which has no set_value method.
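The clobbering described above can be sketched with a minimal mock. FakeShared and FakeOptimizer below are illustrative stand-ins, not the actual Theano/Keras classes, but the attribute mechanics are the same:

```python
class FakeShared:
    """Stand-in for a Theano TensorSharedVariable holding the learning rate."""
    def __init__(self, value):
        self._value = value

    def set_value(self, value):
        self._value = value

    def get_value(self):
        return self._value


class FakeOptimizer:
    """lr is a plain attribute with no @property setter, as in Keras 1."""
    def __init__(self, lr=0.01):
        self.lr = FakeShared(lr)


opt = FakeOptimizer()
opt.lr.set_value(0.001)   # updates the shared variable in place
opt.lr = 0.0001           # rebinds lr to a bare float; the shared variable is gone
try:
    opt.lr.set_value(1e-6)
except AttributeError as e:
    print(e)              # 'float' object has no attribute 'set_value'
```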

It's funny that you mention getting better results with normal assignment. Anecdotally, for this particular notebook, I got better results with set_value. That is, using normal assignment to lower the initial learning rate consistently caused my model to get "stuck" in a state where it always predicted a space as the next character for any input sequence (space is by far the most common character in the input). As soon as I switched to lowering the initial learning rate via set_value, the model's predictions improved and I started getting results similar to those in the notebook. I didn't mention it in the initial pull request because I assumed it was just a lucky weight initialization.

appleby avatar Jun 06 '17 17:06 appleby

Huh. Well that's worth looking into before we pick an approach. On the forums we had a discussion and I found the opposite in some (very non-rigorous) experiments. Are you using Keras 1 and Theano? If so, it would be great to try the two approaches on a few of the models used in the course...

jph00 avatar Jun 06 '17 19:06 jph00

I'm using the AMI from the course on a p2.xlarge instance and haven't run conda update or anything. conda list says I have Keras 1.1.0 and Theano 0.8.2.

Any thoughts on how to test this? Would it be sufficient to create two identical models with the same initial weights and biases, set the learning rate in Model A via lr=rate and in Model B via lr.set_value(rate), then fit both on the same data and look for differences in loss between the two? Maybe rinse and repeat with different learning rates, optimization algorithms, and, say, one CNN model and one RNN model? Does that sound reasonable?
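A toy version of that A/B comparison might look like the sketch below. This is pure Python, not the actual course models; SharedLR, the quadratic loss, and the training loop are all illustrative. When the learning rate is read fresh on every step (as here), setting it before training via either method should produce identical runs:

```python
import random


class SharedLR:
    """Illustrative stand-in for a Theano shared learning-rate variable."""
    def __init__(self, v):
        self._v = v

    def set_value(self, v):
        self._v = v

    def get_value(self):
        return self._v


def train(w0, lr, data):
    """One toy SGD epoch minimizing (w - x)^2 over the data."""
    w = w0
    for x in data:
        grad = 2 * (w - x)
        w -= lr.get_value() * grad
    return w


random.seed(0)
data = [random.gauss(1.0, 0.1) for _ in range(100)]

# Model B style: start at one rate, then lower it via set_value
lr_b = SharedLR(0.1)
lr_b.set_value(0.01)
# Model A style: the learning rate is just set directly
lr_a = SharedLR(0.01)

w_a = train(0.0, lr_a, data)
w_b = train(0.0, lr_b, data)
print(abs(w_a - w_b))  # 0.0: identical runs from identical inits
```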

appleby avatar Jun 06 '17 20:06 appleby

Probably easiest is just to try it in 2-3 of the course's notebooks, using whatever settings are used there. That way we know how it's impacting the actual notebooks in the course. We wouldn't want the students to see worse results than I show in the videos!

jph00 avatar Jun 06 '17 21:06 jph00

Ok, I'll give that a try and circle back later.

appleby avatar Jun 06 '17 23:06 appleby

Took me a while, but I finally have some results to share.

Full details can be found in a Jupyter notebook posted here: ma-learning-rate-test.ipynb. That notebook is quite large (~17 MB), so I also posted just the overview section, which contains most of the relevant information.

tl;dr

  1. It looks like Keras/Theano silently ignores any lr=value assignments made after the first training pass.
  2. For learning rate changes made before the first training pass, lr=value and lr.set_value(value) appear to be equivalent.
  3. Keras optimizer classes don't provide any @property setter for the lr attribute, so that lr=value assignments clobber the Theano shared variable that the lr attribute normally holds and replace it with a bare float, which means those changes aren't picked up by the compiled training function.
  4. Keras lazily compiles the training function the first time you call fit, so any lr=value changes made before the first training pass do make it into the compiled function.
  5. Most of the models I tested from the fastai course notebooks got better loss/accuracy scores when using lr=value assignments, as opposed to lr.set_value. These tests all use the same sequence of learning rate changes found in the course notebooks. Many of the notebooks only change the learning rate once, prior to training, in which case the two methods seem to give equivalent results (at least for loss/accuracy metrics). The one exception I found (I didn't test all models) was the Nietzsche "3-char" model, i.e. the model that started this thread, which got better results using lr.set_value.
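The capture behavior behind points 1-4 can be illustrated with a mock. Shared, Optimizer, and Model below are simplified stand-ins, not the actual Keras/Theano classes; the key detail mimicked here is that the "compiled" training function holds a reference to whatever object optimizer.lr pointed at when fit was first called:

```python
class Shared:
    """Stand-in for a Theano shared variable."""
    def __init__(self, v):
        self._v = v

    def set_value(self, v):
        self._v = v

    def get_value(self):
        return self._v


class Optimizer:
    def __init__(self):
        self.lr = Shared(0.1)


class Model:
    """Mimics Keras's lazy compilation of the training function on first fit."""
    def __init__(self):
        self.optimizer = Optimizer()
        self._train_fn = None

    def fit(self):
        if self._train_fn is None:
            lr_var = self.optimizer.lr              # captured at "compile" time
            self._train_fn = lambda: lr_var.get_value()
        return self._train_fn()


m = Model()
m.optimizer.lr = Shared(0.01)     # assignment before the first fit is picked up
assert m.fit() == 0.01            # "compiled" with lr = 0.01
m.optimizer.lr = Shared(0.001)    # assignment after compiling: silently ignored
assert m.fit() == 0.01

m2 = Model()
assert m2.fit() == 0.1            # compiled with the original shared variable
m2.optimizer.lr.set_value(0.001)  # mutates the captured variable in place
assert m2.fit() == 0.001          # so the change is picked up
```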

appleby avatar Jun 29 '17 02:06 appleby