tangent.grad_dot fails with (3,) (3,) arguments

After I create a gradient function with tangent.grad, calling it fails with the following error:

/shared/sdoerr/Software/miniconda3/lib/python3.6/site-packages/tangent/utils.py in grad_dot(dy, x1, x2)
    773       numpy.sum(x2, axis=tuple(numpy.arange(numpy.ndim(x2) - 2)))))
    774   dy_x2 = numpy.sum(dy, axis=tuple(-numpy.arange(numpy.ndim(x2) - 2) - 2))
--> 775   return numpy.reshape(numpy.dot(dy_x2, x2_t), numpy.shape(x1))
    776 
    777 

ValueError: shapes (1,1) and (3,1) not aligned: 1 (dim 1) != 3 (dim 0)

ipdb> x1
array([ 0.63199997, -0.01399994,  1.66399956])
ipdb> x2
array([1.32600021, 1.09599972, 0.45800018])
ipdb> dy
array([[0.00041678]])

I had to use np.dot(x[jj], np.reshape(x[kk], (-1, 1))) to work around it. Not a huge issue, but it could confuse users.
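
For reference, a minimal NumPy-only sketch of what the workaround changes in the forward computation. It assumes 1-D inputs of length 3 like the ones in the debugger output; a and b are placeholder names standing in for x[jj] and x[kk], and the values are rounded for illustration.

```python
import numpy as np

# Placeholder 1-D inputs with the same shape as x1 and x2 in the debugger output.
a = np.array([0.632, -0.014, 1.664])
b = np.array([1.326, 1.096, 0.458])

# Inner product of two (3,) vectors: the result is a 0-d scalar.
inner = np.dot(a, b)

# Workaround: turn the second argument into a (3, 1) column, so np.dot
# contracts a (3,) vector with a (3, 1) matrix and returns shape (1,).
col = np.dot(a, np.reshape(b, (-1, 1)))

print(np.shape(inner), np.shape(col))
```

Both calls compute the same number; only the shape of the result differs.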

Feb 23 '18 11:02 stefdoerr

Do you have the original code and arguments you used for gradients?

A possible cause is that tangent.grad assumes a scalar output. Can you check whether tangent.autodiff works instead? It will require you to supply an initial gradient value if the output is not a scalar.

Feb 24 '18 13:02 mdanatg

import tangent
import numpy as np

def test(x, y):
    return np.dot(x, y)

xxx = tangent.grad(test)
xxx(np.random.rand(1, 3), np.random.rand(1, 3))

Feb 26 '18 08:02 stefdoerr

I see. The error originates from np.dot: the matrix shapes don't align. For a valid matrix multiplication, the call should be xxx(np.random.rand(1, 3), np.random.rand(3, 1)), as you mentioned.

For example, the following plain NumPy code, with no tangent involved, fails with the same error:

import numpy as np

def test(x, y):
  return np.dot(x, y)

test(np.random.rand(1, 3), np.random.rand(1, 3))

That said, I think it would be useful to wrap errors and more clearly indicate when an error originates in the forward code.

Feb 26 '18 14:02 mdanatg

Oh right, sorry. You can still write a version that works with NumPy but fails with tangent, although it is a different error now:

import tangent
import numpy as np

def test(x, y):
    return np.dot(x[0, :], y[0, :])

test(np.random.rand(1, 3), np.random.rand(1, 3))  # runs
xxx = tangent.grad(test)  # doesn't
xxx(np.random.rand(1, 3), np.random.rand(1, 3))

Feb 26 '18 16:02 stefdoerr

Yes, it seems that we have a bug in the handling of slice operators. This might be insufficient for your immediate needs, but it should run:

import tangent
import numpy as np

def test(x, y):
    return np.dot(x, y)[0, 0]  # Use matrix multiply instead of inner product

test(np.random.rand(1, 3), np.random.rand(3, 1))
xxx = tangent.grad(test)
xxx(np.random.rand(1, 3), np.random.rand(3, 1))

Alternatively, you could use tangent.autodiff which does not assume the result is a scalar. Implementation-wise they are very similar, but tangent.autodiff is more technically correct in that case:

import tangent
import numpy as np

def test(x, y):
    return np.dot(x, y)  # Result is a 1 x 1 matrix

test(np.random.rand(1, 3), np.random.rand(3, 1))
xxx = tangent.autodiff(test)
# Add a third parameter for the gradient seed, which matches the shape of test's result.
xxx(np.random.rand(1, 3), np.random.rand(3, 1), np.ones((1, 1)))
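
To illustrate what the seed does, here is a hand-written sketch of the reverse-mode rule for a 2-D np.dot. This is not the code tangent generates; dot_vjp is a hypothetical name, and the formulas are the standard matrix-product rule (the gradient with respect to x is the seed times y transposed, and the gradient with respect to y is x transposed times the seed).

```python
import numpy as np

def dot_vjp(x, y, seed):
    """Hypothetical hand-written rule for z = np.dot(x, y) with 2-D x and y."""
    dx = np.dot(seed, y.T)  # same shape as x
    dy = np.dot(x.T, seed)  # same shape as y
    return dx, dy

x = np.random.rand(1, 3)
y = np.random.rand(3, 1)
seed = np.ones((1, 1))  # matches the shape of np.dot(x, y)

dx, dy = dot_vjp(x, y, seed)
print(dx.shape, dy.shape)
```

With a seed of ones, dx equals y.T and dy equals x.T, which is the gradient of the single entry of the 1 x 1 result.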

Feb 26 '18 17:02 mdanatg

Opened #62 for the slice issue.

Feb 26 '18 17:02 mdanatg
