
tangent.grad_dot fails with (3,) (3,) arguments

Open stefdoerr opened this issue 7 years ago • 6 comments

Once I produce my gradient function with tangent.grad, calling the function fails with the following error:

/shared/sdoerr/Software/miniconda3/lib/python3.6/site-packages/tangent/utils.py in grad_dot(dy, x1, x2)
    773       numpy.sum(x2, axis=tuple(numpy.arange(numpy.ndim(x2) - 2)))))
    774   dy_x2 = numpy.sum(dy, axis=tuple(-numpy.arange(numpy.ndim(x2) - 2) - 2))
--> 775   return numpy.reshape(numpy.dot(dy_x2, x2_t), numpy.shape(x1))
    776 
    777 

ValueError: shapes (1,1) and (3,1) not aligned: 1 (dim 1) != 3 (dim 0)

ipdb> x1
array([ 0.63199997, -0.01399994,  1.66399956])
ipdb> x2
array([1.32600021, 1.09599972, 0.45800018])
ipdb> dy
array([[0.00041678]])

I had to use np.dot(x[jj], np.reshape(x[kk], (-1, 1))) to work around it. Not a huge issue, but it could confuse users.

stefdoerr, Feb 23 '18 11:02

Do you have the original code and arguments you used for gradients?

A possible cause is that tangent.grad assumes a scalar output. Can you check whether tangent.autodiff works instead? It requires you to supply an initial gradient value if the output is not scalar.

mdanatg, Feb 24 '18 13:02

import tangent
import numpy as np

def test(x, y):
    return np.dot(x, y)

xxx = tangent.grad(test)
xxx(np.random.rand(1, 3), np.random.rand(1, 3))

stefdoerr, Feb 26 '18 08:02

I see. The error originates from np.dot: the matrix sizes don't align properly. For correct matrix multiplication, the call should be xxx(np.random.rand(1, 3), np.random.rand(3, 1)), in line with the reshape you mentioned.

For example, the following code fails with the same error:

import numpy as np

def test(x, y):
  return np.dot(x, y)

test(np.random.rand(1, 3), np.random.rand(1, 3))
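
The alignment rule behind this failure can be sketched in plain Python. The helper `dot_result_shape` is a hypothetical name introduced for illustration only; it is not part of NumPy or tangent, and it mirrors the shape behaviour of np.dot for 1-D and 2-D inputs only.

```python
def dot_result_shape(a_shape, b_shape):
    """Return the shape np.dot would produce for 1-D/2-D inputs, or raise ValueError.

    Hypothetical helper for illustration; covers only 1-D and 2-D operands.
    """
    if not (1 <= len(a_shape) <= 2 and 1 <= len(b_shape) <= 2):
        raise NotImplementedError("sketch covers 1-D and 2-D shapes only")
    # np.dot contracts the last axis of `a` with the first axis of `b`
    # (for a 2-D `b` that is its second-to-last axis).
    if a_shape[-1] != b_shape[0]:
        raise ValueError(f"shapes {a_shape} and {b_shape} not aligned")
    # The contracted axes disappear; whatever is left forms the result.
    return tuple(a_shape[:-1]) + tuple(b_shape[1:])


print(dot_result_shape((1, 3), (3, 1)))  # (1, 1): a 1 x 1 matrix
print(dot_result_shape((3,), (3,)))      # (): inner product, a scalar
print(dot_result_shape((3,), (3, 1)))    # (1,): the reshape workaround from the report
try:
    dot_result_shape((1, 3), (1, 3))
except ValueError as err:
    print(err)                           # shapes (1, 3) and (1, 3) not aligned
```

Two (1, 3) inputs fail because the last axis of the first (length 3) is paired with the first axis of the second (length 1).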

That said, I think it would be useful to wrap errors and more clearly indicate when an error originates in the forward code.
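
One way such wrapping could look is to run the plain function once before calling the generated gradient. This is only a sketch under assumptions, not something tangent provides: `checked_grad` and `ForwardError` are hypothetical names, `grad_fn` stands for whatever tangent.grad returns, and a stand-in gradient is used so the example runs without tangent installed.

```python
import numpy as np


class ForwardError(RuntimeError):
    """Raised when the original (forward) function itself fails."""


def checked_grad(fn, grad_fn):
    """Wrap grad_fn so that failures of fn are reported as forward errors.

    Hypothetical helper: fn is the original function, grad_fn the gradient
    function generated for it (for example by tangent.grad).
    """
    def wrapper(*args):
        try:
            fn(*args)  # run the forward code on its own first
        except Exception as err:
            shapes = tuple(np.shape(a) for a in args)
            raise ForwardError(
                f"forward call {fn.__name__}{shapes} failed: {err}"
            ) from err
        return grad_fn(*args)
    return wrapper


def forward(x, y):
    return np.dot(x, y)


# Stand-in for the generated gradient so the sketch runs without tangent.
def fake_grad(x, y):
    return np.asarray(y).T


safe = checked_grad(forward, fake_grad)
try:
    safe(np.random.rand(1, 3), np.random.rand(1, 3))
except ForwardError as err:
    print(err)  # the message names the function and the argument shapes
```

The cost of this approach is one extra forward evaluation per call, which may or may not be acceptable depending on the function.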

mdanatg, Feb 26 '18 14:02

Oh right, sorry. Still, you can write it so that it works with numpy but fails with tangent, although the error is now a different one:

import tangent
import numpy as np

def test(x, y):
    return np.dot(x[0, :], y[0, :])

test(np.random.rand(1, 3), np.random.rand(1, 3))  # runs
xxx = tangent.grad(test)  # doesn't
xxx(np.random.rand(1, 3), np.random.rand(1, 3))

stefdoerr, Feb 26 '18 16:02

Yes, it seems that we have a bug in the handling of slice operators. This might be insufficient for your immediate needs, but it should run:

import tangent
import numpy as np

def test(x, y):
    return np.dot(x, y)[0, 0]  # Use matrix multiply instead of inner product

test(np.random.rand(1, 3), np.random.rand(3, 1))
xxx = tangent.grad(test)
xxx(np.random.rand(1, 3), np.random.rand(3, 1))

Alternatively, you could use tangent.autodiff, which does not assume the result is a scalar. The two are implemented very similarly, but tangent.autodiff is the more technically correct choice in that case:

def test(x, y):
    return np.dot(x, y)  # Result is a 1 x 1 matrix

test(np.random.rand(1, 3), np.random.rand(3, 1))
xxx = tangent.autodiff(test)
# Add a third parameter for the gradient seed, which matches the shape of test's result.
xxx(np.random.rand(1, 3), np.random.rand(3, 1), np.ones((1, 1)))
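
For reference, the role of the seed can be checked by hand. The sketch below does not use tangent and makes no claim about what tangent returns; it writes out the standard reverse-mode rule for a 2-D matrix product (`dot_vjp` is a hypothetical name) and compares it with finite differences, assuming both inputs are 2-D.

```python
import numpy as np


def dot_vjp(x, y, seed):
    """Gradients of np.dot(x, y) for 2-D x and y, weighted by `seed`.

    `seed` has the shape of np.dot(x, y); it plays the role of the extra
    argument passed in the autodiff call above.
    """
    dx = np.dot(seed, y.T)  # same shape as x
    dy = np.dot(x.T, seed)  # same shape as y
    return dx, dy


rng = np.random.default_rng(0)
x, y = rng.random((1, 3)), rng.random((3, 1))
seed = np.ones((1, 1))
dx, dy = dot_vjp(x, y, seed)

# Finite-difference check of dx: perturb one entry of x at a time.
eps = 1e-6
numeric = np.zeros_like(x)
for j in range(x.shape[1]):
    bumped = x.copy()
    bumped[0, j] += eps
    numeric[0, j] = np.sum(seed * (np.dot(bumped, y) - np.dot(x, y))) / eps

print(np.allclose(dx, numeric, atol=1e-4))  # True
print(dx.shape, dy.shape)                   # (1, 3) (3, 1)
```

With a seed of ones, dx is simply y transposed and dy is x transposed, which is what one would expect for the gradient of a 1 x 1 product.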

mdanatg, Feb 26 '18 17:02

Opened #62 for the slice issue.

mdanatg, Feb 26 '18 17:02