
Backward pass for matmul gives an error

Open RS2007 opened this issue 1 year ago • 7 comments

I was toying around with the tiny xor example and I changed the training loop from the current loop:

for idx in range(ITER):
    pred = model(x)
    loss = tt.mse_loss(pred, y)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    print(loss.item())

to:

for idx in range(ITER):
    loss = Tensor([0.0])
    for x1, y1 in zip(x, y):
        pred = model(x1)
        loss += tt.mse_loss(pred, y1)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    print(loss.item())

Semantically it's pretty much the same code, but running the backward pass gives the following error:

Traceback (most recent call last):
  File "/Users/hedwig/Tinytorch/tiny_xor_net.py", line 52, in <module>
    loss.backward()
  File "/Users/hedwig/Tinytorch/tinytorch.py", line 262, in backward
    grads = node._ctx.op.backward(node._ctx, node.grad)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/hedwig/Tinytorch/tinytorch.py", line 382, in backward
    grad_y = transpose_last_axis(x.data) @ grad.data
             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~
ValueError: matmul: Input operand 1 has a mismatch in its core dimension 0, with gufunc signature (n?,k),(k,m?)->(n?,m?) (size 1 is different from 2)

The error, I can hazard a guess, is due to not broadcasting the grad and the parent data arrays before the backward pass of the MatMul Function. The shapes obtained in this case (just before the matmul fails) are as follows:

Shape of grad.data: (1,)
Shape of x.data.T: (2,)
Shape of y.data.T: (1, 2)
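
For reference, a minimal plain-numpy repro of that mismatch, using just the shapes printed above (dummy arrays, not Tinytorch itself):

import numpy as np

grad = np.zeros((1,))   # stands in for grad.data, shape (1,)
x = np.zeros((2,))      # stands in for x.data; x.T is still (2,) because .T on a 1-D array is a no-op

try:
    x.T @ grad          # (2,) @ (1,) -> core dimension mismatch
except ValueError as e:
    print(e)            # same "core dimension 0 ... size 1 is different from 2" error as in the traceback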

RS2007 avatar Apr 10 '24 07:04 RS2007

Can you give me x1.shape, y1.shape and pred.shape? I'll debug it tonight.

(This is a detailed, nicely raised issue, haven't seen one like this in a while, nice one)

joey00072 avatar Apr 10 '24 07:04 joey00072

Sure, these are the shapes of x1, y1 and pred:

x1 shape: (2,)
y1 shape: (1,)
pred shape: (4, 1)

RS2007 avatar Apr 10 '24 11:04 RS2007

Probably happens because you are not handling the matmul backward correctly. In this example, the forward matmul does (2,) @ (2, 1) -> (1,) in shapes. In this situation x behaves like a row vector, so the op is like doing (1, 2) @ (2, 1) -> (1, 1) and then reducing to (1,). What you want to do for the y grad is a matmul of shapes x.T @ grad, which should be (according to the actual behaviour of vector @ matrix in numpy) (2, 1) @ (1,) -> (2, 1), that is, the shape of y; but since transposing x is (2,).T -> (2,) (because of numpy semantics, I guess) you are trying to do (2,) @ (1,) instead. Basically, be careful with vector/matrix multiplication and how it behaves (since there is no concept of 'row' or 'column' vector in 1D hehe). So the solution is to handle those cases manually, or generalize it, or something.
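
As a rough sketch of the "generalize it" route (plain numpy with the shapes from this thread, not actual Tinytorch code), promoting the 1-D arrays to row matrices before the backward matmul makes the shapes line up:

import numpy as np

grad = np.ones((1,))   # incoming grad, shape (1,)
x = np.ones((2,))      # forward input, shape (2,)

x2d = x.reshape(1, -1) if x.ndim == 1 else x            # treat the 1-D input as a row matrix (1, 2)
grad2d = grad.reshape(1, -1) if grad.ndim == 1 else grad  # and the grad as (1, 1)

grad_y = x2d.T @ grad2d   # (2, 1) @ (1, 1) -> (2, 1), the shape of y
print(grad_y.shape)       # (2, 1)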

davidgonmar avatar Apr 11 '24 22:04 davidgonmar

If you don't want to be expanding arrays and stuff, grad_y = x.outer_product(y) in the vector @ matrix case.
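
In plain numpy terms (reading the suggestion as the outer product of the input with the incoming grad), that would be something like:

import numpy as np

grad = np.ones((1,))        # incoming grad, shape (1,)
x = np.ones((2,))           # forward input, shape (2,)

grad_y = np.outer(x, grad)  # (2,) outer (1,) -> (2, 1), matches y.data's shape
print(grad_y.shape)         # (2, 1)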

davidgonmar avatar Apr 11 '24 22:04 davidgonmar

Actually @davidgonmar, this makes total sense. Just tried outer_product and that works.

RS2007 avatar Apr 12 '24 03:04 RS2007

Hey @RS2007, if the change is minimal can you raise a PR?

Thanks @davidgonmar @RS2007 (sorry, I'm kind of busy these few days)

joey00072 avatar Apr 12 '24 06:04 joey00072

Yup sure

RS2007 avatar Apr 12 '24 06:04 RS2007