Tinytorch
Backward pass for matmul gives an error
I was toying around with the tiny xor example and I changed the training loop from the current loop:
for idx in range(ITER):
    pred = model(x)
    loss = tt.mse_loss(pred, y)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    print(loss.item())
to:
for idx in range(ITER):
    loss = Tensor([0.0])
    for x1, y1 in zip(x, y):
        pred = model(x1)
        loss += tt.mse_loss(pred, y1)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    print(loss.item())
Semantically it's pretty much the same code, but running the backward pass gives the following error:
Traceback (most recent call last):
File "/Users/hedwig/Tinytorch/tiny_xor_net.py", line 52, in <module>
loss.backward()
File "/Users/hedwig/Tinytorch/tinytorch.py", line 262, in backward
grads = node._ctx.op.backward(node._ctx, node.grad)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/hedwig/Tinytorch/tinytorch.py", line 382, in backward
grad_y = transpose_last_axis(x.data) @ grad.data
~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~
ValueError: matmul: Input operand 1 has a mismatch in its core dimension 0, with gufunc signature (n?,k),(k,m?)->(n?,m?) (size 1 is different from 2)
The error, I can hazard a guess, is due to not broadcasting the grad and the parent data arrays before the backward pass of the MatMul Function. The shapes obtained in this case (just before the matmul fails) are as follows:
Shape of grad.data: (1,)
Shape of x.data.T: (2,)
Shape of y.data.T: (1, 2)
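For what it's worth, here is a minimal NumPy-only repro of the same mismatch, with placeholder values in the shapes reported above (just a sketch of what I think line 382 of tinytorch.py ends up doing, not the actual code):

import numpy as np

x = np.zeros(2)          # x1 sample, shape (2,)
y = np.zeros((2, 1))     # layer weight, shape (2, 1)
grad = np.zeros(1)       # upstream gradient of the (1,) output

out = x @ y              # forward: (2,) @ (2, 1) -> (1,)

try:
    # x.T is still (2,) because transposing a 1-D array is a no-op,
    # so this is (2,) @ (1,)
    grad_y = x.T @ grad
except ValueError as e:
    print(e)             # same "mismatch in its core dimension 0" error as above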
Can you give me x1.shape, y1.shape and pred.shape? I'll debug it tonight.
(This is a detailed, nicely raised issue, one of the nicest I have seen in a while, nice one.)
Sure, these are the shapes of x1, y1 and pred:
x1 shape: (2,)
y1 shape: (1,)
pred shape: (4, 1)
This probably happens because the matmul backward doesn't handle the 1-D (vector) case correctly. In this example, the forward matmul does (2,) @ (2, 1) -> (1,) in shapes. Here x behaves like a row vector, so the op is effectively (1, 2) @ (2, 1) -> (1, 1), which then reduces to (1,). For the y grad you want a matmul of shapes x.T @ grad, which, treating x as a (2, 1) column and grad as a (1, 1) matrix, gives (2, 1) @ (1, 1) -> (2, 1), that is, the shape of y. But since transposing a 1-D array is a no-op ((2,).T -> (2,), because of NumPy semantics), you end up trying (2,) @ (1,) instead, which fails. Basically, be careful with vector/matrix multiplication and how it behaves, since there is no concept of a 'row' or 'column' vector in 1-D hehe. So the solution is to handle those cases manually, or generalize the backward to cover them (rough sketch below).
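Something like this, as a rough NumPy-only sketch of an unbatched matmul backward that promotes 1-D operands to 2-D and squeezes the fake axes back afterwards (the function name and signature are made up for illustration, this is not the Tinytorch API):

import numpy as np

def matmul_backward(x, y, grad):
    # Sketch only: no batching / broadcasting over leading dims.
    x2 = x if x.ndim > 1 else x[None, :]   # 1-D x acts like a row vector
    y2 = y if y.ndim > 1 else y[:, None]   # 1-D y acts like a column vector
    g2 = grad.reshape(x2.shape[0], y2.shape[1])  # restore the squeezed axes
    grad_x = (g2 @ y2.T).reshape(x.shape)  # same shape as x
    grad_y = (x2.T @ g2).reshape(y.shape)  # same shape as y
    return grad_x, grad_y

# The failing case from above:
x = np.array([1.0, 2.0])                   # (2,)
y = np.ones((2, 1))                        # (2, 1)
grad = np.array([0.5])                     # (1,)
gx, gy = matmul_backward(x, y, grad)
print(gx.shape, gy.shape)                  # (2,) (2, 1)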
If you don't want to be expanding arrays and stuff, grad_y = x.outer_product(grad) works for the vector @ matrix case.
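That is the same computation as this, with np.outer standing in for Tinytorch's outer_product (values are placeholders):

import numpy as np

x = np.array([1.0, 2.0])     # (2,) input sample
grad = np.array([0.5])       # (1,) upstream gradient

print(x.T.shape)             # (2,) -- transposing a 1-D array changes nothing

grad_y = np.outer(x, grad)   # (2, 1), same shape as the weight y
print(grad_y.shape)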
Actually @davidgonmar, this makes total sense. Just tried outer_product and that works.
Hey @RS2007, if the change is minimal, can you raise a PR?
Thanks @davidgonmar @RS2007 (sorry, I'm kind of busy these few days).
Yup sure