Use set_value to set the learning rate
If `lr` is set via normal assignment, then we get the following error later in the notebook when attempting to call `set_value` on it:
```
AttributeError                            Traceback (most recent call last)
<ipython-input-68-c065d419a30d> in <module>()
----> 1 model.optimizer.lr.set_value(0.000001)

AttributeError: 'float' object has no attribute 'set_value'
```
Alternatively, I guess any calls to `set_value` could be converted to normal assignment.
I found it was better to use normal assignment. I don't know why it matters at all, but for some reason I got better results.
Ok. Maybe I'll submit another pull request converting all the `set_value` calls to normal assignment then? The point of this pull request was that in this notebook, calls to `set_value` are interleaved with setting the learning rate via normal assignment, which results in an `AttributeError` when a user tries to run one of the cells that contains a `set_value` call. The reason is that after setting the learning rate via normal assignment, the type of `model.optimizer.lr` is no longer `TensorSharedVariable` but just a regular `float`, which does not have a `set_value` method.
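Stripped of the Keras and Theano specifics, the mechanism can be sketched in plain Python. The `SharedVariable` and `Optimizer` classes below are minimal illustrative stand-ins, not the real Theano or Keras APIs:

```python
class SharedVariable:
    """Minimal stand-in for a Theano TensorSharedVariable."""
    def __init__(self, value):
        self.value = value

    def set_value(self, value):
        self.value = value   # updates the existing object in place

    def get_value(self):
        return self.value


class Optimizer:
    """Minimal stand-in for a Keras 1 optimizer: `lr` is a plain attribute."""
    def __init__(self, lr=0.01):
        self.lr = SharedVariable(lr)


opt = Optimizer()
opt.lr.set_value(0.001)       # fine: `lr` is still a SharedVariable
opt.lr = 0.000001             # normal assignment replaces it with a bare float

try:
    opt.lr.set_value(0.0001)  # now fails, just like in the notebook
except AttributeError as e:
    print(e)                  # 'float' object has no attribute 'set_value'
```

The order matters: once any cell does a plain `lr = value` assignment, every later `set_value` cell in the notebook raises.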
It's funny that you mention getting better results with normal assignment. Anecdotally, for this particular notebook, I got better results with `set_value`. That is, using normal assignment to lower the initial learning rate consistently caused my model to get "stuck" in a state where it always predicted a space as the next character for any sequence of characters (space is by far the most common char in the input). As soon as I switched to lowering the initial learning rate via a call to `set_value`, the model predictions improved and I started getting results similar to what appears in the notebook. I didn't mention it in the initial pull request as I assumed it was just due to a lucky weight initialization.
Huh. Well, that's worth looking into before we pick an approach. On the forums we had a discussion and I found the opposite in some (very non-rigorous) experiments. Are you using Keras 1 and Theano? If so, it would be great to try the two approaches on a few of the models used in the course...
I'm using the AMI from the course on a p2.xlarge instance and haven't run `conda update` or anything. `conda list` says I have keras 1.1.0 and theano 0.8.2.
Any thoughts on how to test this? Would it be sufficient to create two identical models and use the same initial weights & biases for both? Then set the learning rate in Model A via `lr=rate` and in Model B via `lr.set_value(rate)`, then fit both on the same data and look for differences in the loss between the two models? Maybe rinse and repeat with different values for learning rate, optimization algorithm, and, say, one CNN model and one RNN model? Does that sound reasonable?
Probably easiest is just to try it in 2-3 of the course's notebooks, using whatever settings happen to be used there. That way we know how it's impacting the actual notebooks in the course. We wouldn't want the students to see worse results than I show in the videos!
Ok, I'll give that a try and circle back later.
Took me a while, but I finally have some results to share.
Full details can be found in a jupyter notebook posted here: ma-learning-rate-test.ipynb. That notebook is quite large (~17 MB), so I also posted just the overview section, which contains most of the relevant information.
tl;dr
- It looks like Keras/Theano silently ignores any `lr=value` assignments made after the first training pass.
- For learning rate changes made before the first training pass, `lr=value` and `lr.set_value(value)` appear to be equivalent.
- Keras optimizer classes don't provide any `@property` setter for the `lr` attribute, so `lr=value` assignments clobber the Theano shared variable that the `lr` attribute normally holds and replace it with a bare `float`, which means those changes aren't picked up by the compiled training function.
- Keras lazily compiles the training function the first time you call `fit`, so any `lr=value` changes made before the first training pass do make it into the compiled function.
- Most of the models I tested from the fastai course notebooks got better loss/accuracy scores when using `lr=value` assignments, as opposed to `lr.set_value`. These tests all use the same sequence of learning rate changes found in the course notebooks. Many of the notebooks only change the learning rate once, prior to training, in which case the two methods seem to give equivalent results (at least for loss/accuracy metrics). The one exception I found (I didn't test all models) was the Nietzsche "3-char" model, i.e. the model that started this thread, which got better results using `lr.set_value`.
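The interaction between lazy compilation and the clobbered shared variable can be sketched with plain Python closures. The `Shared` and `LazyOptimizer` classes here are illustrative stand-ins for Theano/Keras internals, not their real APIs:

```python
class Shared:
    """Stand-in for a Theano shared variable."""
    def __init__(self, value):
        self.value = value

    def set_value(self, value):
        self.value = value


class LazyOptimizer:
    """Stand-in for a Keras 1 optimizer with lazy compilation."""
    def __init__(self, lr):
        self.lr = Shared(lr)
        self._train_fn = None

    def _compile(self):
        lr_ref = self.lr  # capture whatever `lr` is right now
        def current_lr():
            # a Shared is read live on each call; a bare float was
            # baked in as a constant at compile time
            return lr_ref.value if isinstance(lr_ref, Shared) else lr_ref
        return current_lr

    def fit(self):
        if self._train_fn is None:   # compiled on the first fit() only
            self._train_fn = self._compile()
        return self._train_fn()


opt = LazyOptimizer(lr=0.1)
opt.lr = 0.01                 # assignment BEFORE the first fit: compiled in
print(opt.fit())              # 0.01
opt.lr = 0.001                # assignment AFTER the first fit: silently ignored
print(opt.fit())              # still 0.01

opt2 = LazyOptimizer(lr=0.1)
opt2.fit()
opt2.lr.set_value(0.001)      # in-place change: the compiled function sees it
print(opt2.fit())             # 0.001
```

This matches the observed behavior: assignments before the first `fit` take effect, assignments after it are silently ignored, and `set_value` works at any point as long as nothing has clobbered the shared variable.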