
Backward pass for matmul gives an error

Open RS2007 opened this issue 1 year ago • 7 comments

I was toying around with the tiny xor example and I changed the training loop from the current loop:

for idx in range(ITER):
    pred = model(x)
    loss = tt.mse_loss(pred, y)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    print(loss.item())

to:

for idx in range(ITER):
    loss = Tensor([0.0])
    for x1, y1 in zip(x, y):
        pred = model(x1)
        loss += tt.mse_loss(pred, y1)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    print(loss.item())

Semantically it's pretty much the same code, but running the backward pass gives the following error:

Traceback (most recent call last):
  File "/Users/hedwig/Tinytorch/tiny_xor_net.py", line 52, in <module>
    loss.backward()
  File "/Users/hedwig/Tinytorch/tinytorch.py", line 262, in backward
    grads = node._ctx.op.backward(node._ctx, node.grad)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/hedwig/Tinytorch/tinytorch.py", line 382, in backward
    grad_y = transpose_last_axis(x.data) @ grad.data
             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~
ValueError: matmul: Input operand 1 has a mismatch in its core dimension 0, with gufunc signature (n?,k),(k,m?)->(n?,m?) (size 1 is different from 2)

The error, I can hazard a guess, is due to not broadcasting the grad and the parent data arrays before the backward pass of the MatMul Function. The shapes obtained in this case (just before the matmul fails) are as follows:

Shape of grad.data: (1,)
Shape of x.data.T: (2,)
Shape of y.data.T: (1, 2)
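
For reference, a minimal plain-numpy repro of that mismatch, using just the shapes printed above (dummy arrays, not Tinytorch itself):

import numpy as np

grad = np.zeros((1,))   # stands in for grad.data, shape (1,)
x = np.zeros((2,))      # stands in for x.data; x.T is still (2,) because .T on a 1-D array is a no-op

try:
    x.T @ grad          # (2,) @ (1,) -> core dimension mismatch
except ValueError as e:
    print(e)            # same "core dimension 0 ... size 1 is different from 2" error as in the traceback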

RS2007 avatar Apr 10 '24 07:04 RS2007

Can you give me x1.shape, y1.shape and pred.shape? I'll debug it tonight.

(This is a detailed, nicely raised issue, haven't seen one like this in a while, nice one)

joey00072 avatar Apr 10 '24 07:04 joey00072

Sure, these are the shapes of x1, y1 and pred:

x1 shape: (2,)
y1 shape: (1,)
pred shape: (4, 1)

RS2007 avatar Apr 10 '24 11:04 RS2007

Probably happens because you are not handling the matmul backward correctly. In this example, the forward matmul does (2,) @ (2, 1) -> (1,) in shapes. In this situation x behaves like a row vector, so the op is like doing (1, 2) @ (2, 1) -> (1, 1) and then reducing to (1,). What you want to do for the y grad is a matmul of shapes x.T @ grad, which should be (according to the actual behaviour of vector @ matrix in numpy) (2, 1) @ (1,) -> (2, 1), that is, the shape of y; but since transposing x is (2,).T -> (2,) (because of numpy semantics, I guess) you are trying to do (2,) @ (1,) instead. Basically, be careful with vector/matrix multiplication and how it behaves (since there is no concept of 'row' or 'column' vector in 1D hehe). So the solution is to handle those cases manually, or generalize it, or something.
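
As a rough sketch of the "generalize it" route (plain numpy with the shapes from this thread, not actual Tinytorch code), promoting the 1-D arrays to row matrices before the backward matmul makes the shapes line up:

import numpy as np

grad = np.ones((1,))   # incoming grad, shape (1,)
x = np.ones((2,))      # forward input, shape (2,)

x2d = x.reshape(1, -1) if x.ndim == 1 else x            # treat the 1-D input as a row matrix (1, 2)
grad2d = grad.reshape(1, -1) if grad.ndim == 1 else grad  # and the grad as (1, 1)

grad_y = x2d.T @ grad2d   # (2, 1) @ (1, 1) -> (2, 1), the shape of y
print(grad_y.shape)       # (2, 1)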

davidgonmar avatar Apr 11 '24 22:04 davidgonmar

If you don't want to be expanding arrays and stuff, grad_y = x.outer_product(y) in the vector @ matrix case.
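
In plain numpy terms (reading the suggestion as the outer product of the input with the incoming grad), that would be something like:

import numpy as np

grad = np.ones((1,))        # incoming grad, shape (1,)
x = np.ones((2,))           # forward input, shape (2,)

grad_y = np.outer(x, grad)  # (2,) outer (1,) -> (2, 1), matches y.data's shape
print(grad_y.shape)         # (2, 1)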

davidgonmar avatar Apr 11 '24 22:04 davidgonmar

Actually @davidgonmar, this makes total sense. Just tried outer_product and that works.

RS2007 avatar Apr 12 '24 03:04 RS2007

Hey @RS2007, if the change is minimal can you raise a PR?

Thanks @davidgonmar @RS2007 (sorry, I'm kind of busy these few days)

joey00072 avatar Apr 12 '24 06:04 joey00072

Yup sure

RS2007 avatar Apr 12 '24 06:04 RS2007