tangent
tangent.grad_dot fails with (3,) (3,) arguments
Once I produce my gradient function with tangent.grad, calling it fails with the following error:
/shared/sdoerr/Software/miniconda3/lib/python3.6/site-packages/tangent/utils.py in grad_dot(dy, x1, x2)
773 numpy.sum(x2, axis=tuple(numpy.arange(numpy.ndim(x2) - 2)))))
774 dy_x2 = numpy.sum(dy, axis=tuple(-numpy.arange(numpy.ndim(x2) - 2) - 2))
--> 775 return numpy.reshape(numpy.dot(dy_x2, x2_t), numpy.shape(x1))
776
777
ValueError: shapes (1,1) and (3,1) not aligned: 1 (dim 1) != 3 (dim 0)
ipdb> x1
array([ 0.63199997, -0.01399994, 1.66399956])
ipdb> x2
array([1.32600021, 1.09599972, 0.45800018])
ipdb> dy
array([[0.00041678]])
I had to use np.dot(x[jj], np.reshape(x[kk], (-1, 1))) to work around it. Not a huge issue, but it could confuse users.
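For reference, here is a minimal NumPy sketch of the gradient one would expect for two 1-D inputs. `grad_dot_1d` is a hypothetical helper written for illustration only; it is not tangent's `grad_dot`, and it assumes the seed `dy` holds a single value (a scalar or a (1, 1) array as in the debugger output above). A central finite difference is used as a sanity check.

```python
import numpy as np


def grad_dot_1d(dy, x1, x2):
    """Hypothetical helper: gradient of np.dot(x1, x2) w.r.t. x1 for 1-D inputs."""
    # For 1-D vectors, np.dot(x1, x2) == sum(x1 * x2), so d/dx1 is x2,
    # scaled by the incoming seed dy.
    dy = np.asarray(dy).reshape(())  # accept a scalar or a (1, 1) seed
    return dy * x2


x1 = np.array([0.632, -0.014, 1.664])
x2 = np.array([1.326, 1.096, 0.458])
dy = np.array([[0.00041678]])

analytic = grad_dot_1d(dy, x1, x2)

# Central finite difference along each coordinate of x1, scaled by the seed.
eps = 1e-6
numeric = dy.item() * np.array([
    (np.dot(x1 + eps * e, x2) - np.dot(x1 - eps * e, x2)) / (2 * eps)
    for e in np.eye(3)
])

print(analytic.shape, np.allclose(analytic, numeric))
```

The result has the same shape as x1, which is what the final reshape in the traceback is trying to produce.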
Do you have the original code and arguments you used for gradients?
A possible cause is that tangent.grad assumes a scalar output. Can you check whether tangent.autodiff works instead? It requires you to supply an initial gradient value if the output is not a scalar.
import tangent
import numpy as np
def test(x, y):
    return np.dot(x, y)
xxx = tangent.grad(test)
xxx(np.random.rand(1, 3), np.random.rand(1, 3))
I see - the error originates from np.dot: the matrix shapes don't align. For a valid matrix multiplication, the call should be xxx(np.random.rand(1, 3), np.random.rand(3, 1)), in line with the reshape you mentioned.
For example, the following code fails with the same error:
import numpy as np
def test(x, y):
    return np.dot(x, y)
test(np.random.rand(1, 3), np.random.rand(1, 3))
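To make the shape rules concrete, here is a small NumPy-only sketch (no tangent involved) of the three cases discussed in this thread. The variable names are illustrative.

```python
import numpy as np

a = np.random.rand(1, 3)
b = np.random.rand(1, 3)

# (1, 3) x (3, 1): valid matrix product, result is a 1 x 1 matrix.
shape_2d = np.dot(a, b.T).shape

# (3,) . (3,): inner product of 1-D vectors, result is a scalar.
shape_1d = np.dot(a[0], b[0]).shape

# (1, 3) x (1, 3): inner dimensions differ, so NumPy raises ValueError.
try:
    np.dot(a, b)
    aligned = True
except ValueError:
    aligned = False

print(shape_2d, shape_1d, aligned)
```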
That said, I think it would be useful to wrap errors and more clearly indicate when an error originates in the forward code.
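Until something like that exists, a user-side sketch of the idea could look as follows. `check_forward` is a hypothetical helper, not part of tangent: it just runs the undifferentiated function first, so a shape error in the forward code is reported as such before any gradient code is involved. `f` mirrors the `test` function from the snippets above.

```python
import numpy as np


def check_forward(fn, *args):
    """Hypothetical helper: run the plain (forward) function before differentiating."""
    try:
        return fn(*args)
    except Exception as exc:
        # Re-raise with a message that points at the forward code.
        raise RuntimeError(
            "forward pass of %r failed before any gradient code ran: %s"
            % (fn.__name__, exc)
        ) from exc


def f(x, y):
    return np.dot(x, y)


# Misaligned shapes: the error is attributed to the forward pass.
try:
    check_forward(f, np.random.rand(1, 3), np.random.rand(1, 3))
except RuntimeError as err:
    message = str(err)
    print(message)

# Aligned shapes: the forward result is returned unchanged.
ok = check_forward(f, np.random.rand(1, 3), np.random.rand(3, 1))
```

If the forward check passes and the gradient call still fails, the problem is more likely in the generated gradient code.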
Oh right, sorry. You can still write a version that works with plain numpy but fails with tangent, although the error is now a different one:
import tangent
import numpy as np
def test(x, y):
    return np.dot(x[0, :], y[0, :])
test(np.random.rand(1, 3), np.random.rand(1, 3))  # runs
xxx = tangent.grad(test)  # doesn't
xxx(np.random.rand(1, 3), np.random.rand(1, 3))
Yes, it seems that we have a bug in the handling of slice operators. This might be insufficient for your immediate needs, but it should run:
import tangent
import numpy as np
def test(x, y):
    return np.dot(x, y)[0, 0]  # Use matrix multiply instead of inner product
test(np.random.rand(1, 3), np.random.rand(3, 1))
xxx = tangent.grad(test)
xxx(np.random.rand(1, 3), np.random.rand(3, 1))
Alternatively, you could use tangent.autodiff, which does not assume the result is a scalar. The two are very similar in implementation, but tangent.autodiff is technically more correct in this case:
import tangent
import numpy as np
def test(x, y):
    return np.dot(x, y)  # Result is a 1 x 1 matrix
test(np.random.rand(1, 3), np.random.rand(3, 1))
xxx = tangent.autodiff(test)
# Add a third parameter for the gradient seed, which matches the shape of test's result.
xxx(np.random.rand(1, 3), np.random.rand(3, 1), np.ones((1, 1)))
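If the output shape is not known in advance, one way to build a matching seed is to take it from the forward result. This is a NumPy-only sketch; `f` stands in for the `test` function above, and the tangent call itself is left out.

```python
import numpy as np


def f(x, y):
    return np.dot(x, y)  # Result is a 1 x 1 matrix


x = np.random.rand(1, 3)
y = np.random.rand(3, 1)

# A seed of ones with the same shape and dtype as the forward result.
seed = np.ones_like(f(x, y))
print(seed.shape)
```

The seed could then be passed as the third argument in place of np.ones((1, 1)).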
Opened #62 for the slice issue.