NTM-tensorflow
Loss sometimes goes to nan even with gradient clipping
I haven't figured out why yet, and any advice on this is welcome!
Not sure if it's related, but try softmax(xxxx + eps).
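In case a concrete example helps, here is a minimal NumPy sketch of the kind of epsilon trick suggested above (not this repo's actual code; the eps value and variable names are illustrative): keep the values going through the softmax, and especially the probabilities fed into a log() in the loss, away from exact zero.

```python
import numpy as np

EPS = 1e-6  # illustrative value, not from this repo

def softmax(x):
    z = np.exp(x - np.max(x))        # subtract the max for numerical stability
    return z / np.sum(z)

scores = np.array([0.0, -800.0, 0.0])

# Literal reading of the suggestion: softmax(xxxx + eps)
weights = softmax(scores + EPS)

# A related guard: keep the softmax output away from exact zero before it
# reaches a log() in the loss -- softmax(scores)[1] underflows to 0.0 here,
# and -log(0) would be inf.
safe_probs = softmax(scores) + EPS
loss_term = -np.log(safe_probs[1])   # ~13.8 instead of inf
print(weights, loss_term)
```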
@jli05 Thanks! I'll try it. So far I've only been able to train the NTM without nan loss with max_length=10. If max_length goes above 10, I think we need more than 100000 epochs, which differs from the referenced code.
@carpedm20 In my NTM implementation (and in a couple of others I saw out there), nans were usually caused by one of the following:
- Initializing the memory to zero. The memory appears in the denominator of the cosine distance, and that makes it nan. Check whether that is your case; add a small constant to the denominator and avoid initializing the memory to all zeros (make it a small constant instead). (A sketch of both fixes follows this comment.)
- A negative sharpening value. That creates a complex number and also makes the cost function go nan.

I think there was a third case but I don't remember right now. Good luck debugging! :D
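For reference, a minimal NumPy sketch of the two fixes described in the first bullet (the eps and init values are illustrative, not taken from this repo): add a small constant to the denominator of the cosine similarity, and initialize the memory to a small constant instead of zeros.

```python
import numpy as np

EPS = 1e-6  # small constant in the denominator; illustrative value

def cosine_similarity(key, memory):
    # memory: (N, M), key: (M,). With an all-zero memory row the
    # denominator is 0 and the result is nan without the EPS term.
    dot = memory @ key
    norms = np.linalg.norm(memory, axis=1) * np.linalg.norm(key)
    return dot / (norms + EPS)

# Initialize memory to a small constant rather than all zeros,
# so no row ever has exactly zero norm.
N, M = 128, 20
memory = np.full((N, M), 1e-3)

key = np.random.randn(M)
print(cosine_similarity(key, memory))
```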
@EderSantana Could you explain what a negative sharpening value means? Thanks
The sharpening value is used as pow(input, sharpening), so it can't be negative. Use a nonlinearity like softplus to avoid getting negative values: sharpening = tf.nn.softplus(sharpening).
Having a negative sharpening value wouldn't make a real number become imaginary. But in the paper Graves explicitly states that the sharpening value is >= 1, so softplus(gamma) + 1 would work fine.
a^(-b) = 1/(a^b)
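To tie the last two comments together, here is a small NumPy sketch (illustrative, not this repo's TensorFlow code) of the softplus(gamma) + 1 parameterization: the exponent stays >= 1 as Graves specifies, so negative or reciprocal exponents never come up.

```python
import numpy as np

def softplus(x):
    return np.log1p(np.exp(x))

def sharpen(w, raw_gamma):
    # softplus keeps the exponent positive; +1 enforces gamma >= 1
    # as stated in the paper, regardless of the raw network output.
    gamma = softplus(raw_gamma) + 1.0
    w_pow = np.power(w, gamma)
    return w_pow / np.sum(w_pow)

w = np.array([0.1, 0.7, 0.2])        # an example weighting vector
print(sharpen(w, raw_gamma=-2.0))    # still well-defined: gamma ~ 1.13
```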