Optimization
AttributeError: 'Variable' object has no attribute 'ref'
I am seeing:
File "optimizer.py", line 59, in minimize
colocate_gradients_with_ops=colocate_gradients_with_ops)
File "optimizer.py", line 82, in compute_gradients
var_refs = [x_tm1.ref() for x_tm1 in var_list]
AttributeError: 'Variable' object has no attribute 'ref'
Hmm, that's strange: the tensorflow docs suggest that Variable objects should have a ref attribute, yet they don't actually list it. The code here was originally developed for tensorflow v0.8, which is fairly out of date, so it may need to be restructured to be compatible with v0.12.
What happens when you try to get a variable reference on the tensorflow distribution you're using? For example:
Python 2.7.6 (default, Oct 26 2016, 20:30:19)
[GCC 4.8.4] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> import tensorflow as tf
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcuda.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcurand.so locally
>>> var = tf.Variable(np.random.randn(5))
>>> var.ref()
<tf.Tensor 'Variable:0' shape=(5,) dtype=float64_ref>
>>>
Does it raise an error or does it work?
Does not work. I get:
AttributeError: 'Variable' object has no attribute 'ref'
see: http://stackoverflow.com/questions/40901391/what-is-the-alternative-of-tf-variable-ref-in-tensorflow-version-0-12
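Going by that thread, ref() seems to have just become the private _ref() in 0.12, so maybe a small shim in optimizer.py's compute_gradients would be enough (untested sketch; _var_ref is just a name I made up):

def _var_ref(v):
    # tf.Variable.ref() existed up to v0.11; in v0.12 the equivalent
    # appears to be the private _ref() method
    return v.ref() if hasattr(v, 'ref') else v._ref()

var_refs = [_var_ref(x_tm1) for x_tm1 in var_list]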
Any reason not to extend the existing tensorflow optimizers?
At the time when I developed this, there were a couple things I wanted to build into the optimizer for the sake of convenience and so I could implement them the way I wanted to (temporal averaging, gradient clipping, and stochastic noise are what come to mind). Those things haven't changed final performance in my projects very much, so for less nit-picky people it might be easier to just extend the existing optimizers.
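For example, gradient clipping on top of one of the stock optimizers only takes a few extra lines with compute_gradients/apply_gradients; here's a rough, untested sketch where loss stands in for whatever objective you're training:

import tensorflow as tf

# clip gradients globally before applying them with a stock optimizer
opt = tf.train.AdamOptimizer(learning_rate=1e-3)
grads, tvars = zip(*opt.compute_gradients(loss))      # loss is a placeholder
clipped, _ = tf.clip_by_global_norm(grads, clip_norm=5.0)
train_op = opt.apply_gradients(list(zip(clipped, tvars)))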
However, I don't think the existing optimizers do sparse updates correctly. If you're using Adam, for example, you have a couple of accumulators that maintain moving averages of the parameter-wise mean and variance. When you're updating the word embedding matrix in an NLP project, there are two ways to implement the moving average: you can either decay the mean/variance averages for all the embeddings, or only decay them for the words that actually received nonzero gradient signal. That is, if you see the sentence "I like dogs", should you treat "cats" as having zero gradient, or should you leave the accumulator for that word alone? The tensorflow implementation treats the embeddings for unused words as having zero gradient, so the mean and variance accumulators for rare words shrink toward zero between updates. To me it makes more sense to only update the accumulators for words that were actually used in the sentence, which required modifying the source code.
So, the short answer is, it's probably easier to just extend the existing optimizers since this codebase is a little out of date, but you might want to be careful about how you implement sparse updates.
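To make the difference concrete, here's a tiny numpy sketch (not the actual code from this repo) of the two policies for Adam's second-moment accumulator:

import numpy as np

beta2 = 0.999
v = np.ones(5)                   # second-moment accumulator, one slot per word
rows = np.array([1, 3])          # words that actually appeared in the batch
g = np.array([0.2, -0.1])        # their gradients

# dense policy (stock Adam): every slot decays, even for unseen words,
# so the accumulators for rare words shrink toward zero over time
v_dense = beta2 * v
v_dense[rows] += (1 - beta2) * g ** 2

# lazy policy: only the slots for words that were actually used get updated
v_lazy = v.copy()
v_lazy[rows] = beta2 * v[rows] + (1 - beta2) * g ** 2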
@tdozat Any interest in submitting a PR to add Nadam to TF proper as a flag on the Adam optimizer? It seems like there may be some interest in this: https://github.com/tensorflow/tensorflow/issues/7715
Looks like 1.1.0 is getting this as LazyAdamOptimizer with the sparse behavior as described? https://github.com/tensorflow/tensorflow/blob/v1.1.0-rc1/tensorflow/contrib/opt/python/training/lazy_adam_optimizer.py
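If so, it should presumably be a drop-in replacement for AdamOptimizer; an untested sketch against the linked 1.1 module path, with loss standing in for whatever objective you're training:

from tensorflow.contrib.opt.python.training.lazy_adam_optimizer import LazyAdamOptimizer

opt = LazyAdamOptimizer(learning_rate=1e-3)
train_op = opt.minimize(loss)   # moment accumulators only touched for rows that got gradients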