
Both optimized and portable arithmetic operators have broadcasting errors

Open larryliu0820 opened this issue 7 months ago • 14 comments

🐛 Describe the bug

Subtraction produces a wrongly shaped result when both x and y have to be broadcast.

Repro code snippet:

import torch
from torch.export import Dim
from executorch.exir import to_edge
from executorch.runtime import Runtime

class Model(torch.nn.Module):

    def forward(self, x, y):
        return x - y

dim = Dim("dim", min=1, max=1024)
ep = torch.export.export(
    Model(), (torch.ones(1, 4), torch.ones(4, 1)), dynamic_shapes=({1: dim}, {0: dim})
)
pte = to_edge(ep).to_executorch().buffer

runtime = Runtime.get()
program = runtime.load_program(pte)
method = program.load_method("forward")

# Run sub with (1, 2) and (2, 1) first; the result should be zeros(2, 2).
print(method.execute((torch.ones(1, 2), torch.ones(2, 1))))

# Run sub with (1, 4) and (4, 1); the result should be zeros(4, 4),
# but we actually get zeros(1, 4).
print(method.execute((torch.ones(1, 4), torch.ones(4, 1))))
"""
Actual
[tensor([[0., 0., 0., 0.]])]

Expected
[tensor([[0., 0., 0., 0.],
         [0., 0., 0., 0.],
         [0., 0., 0., 0.],
         [0., 0., 0., 0.]])]
"""

Versions

Latest

cc @manuelcandales

larryliu0820 avatar Apr 23 '25 23:04 larryliu0820

cc @manuelcandales

kimishpatel avatar Apr 23 '25 23:04 kimishpatel

cc @swolchok

iseeyuan avatar Apr 23 '25 23:04 iseeyuan

Hm, portable sub should be working fine:

https://github.com/pytorch/executorch/blame/main/kernels/portable/cpu/op_sub.cpp#L47

(@larryliu0820 are you sure you're running portable?)

optimized just plain doesn't seem to have a resize except in one specific case, which seems obviously broken:

https://github.com/pytorch/executorch/blame/main/kernels/optimized/cpu/op_sub.cpp#L108

I think I need to learn why https://github.com/pytorch/executorch/blob/main/kernels/test/op_sub_test.cpp#L266 doesn't cover this case.

swolchok avatar Apr 24 '25 03:04 swolchok

(@larryliu0820 are you sure you're running portable?)

optimized just plain doesn't seem to have a resize except in one specific case, which seems obviously broken:

https://github.com/pytorch/executorch/blame/main/kernels/optimized/cpu/op_sub.cpp#L108

Are you sure? I do see this https://github.com/pytorch/executorch/blob/3072890824adfb95b02159746a130751d184c950/kernels/optimized/cpu/op_add_sub_impl.h#L90

kimishpatel avatar Apr 24 '25 14:04 kimishpatel

Are you sure? I do see this

OK, two specific cases. That resize is also gated; there are clearly uncovered cases.

swolchok avatar Apr 24 '25 16:04 swolchok

Findings:

  • the specific regression test I attempted to add for op_sub passes for both portable and optimized.
  • I was wrong when I said "there are clearly uncovered cases" above. I wasn't reading closely; the uncovered cases funnel into utility functions that do the necessary resize.

My best guess is that something is wrong in the Python or lowering layer used in this report; the kernels themselves seem to be correct.
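One way to poke at that hypothesis (a sketch reusing the names from the repro script; I'm assuming the interesting signal is whether the output shape stays symbolic through lowering):

# Inspect the exported program before and after edge lowering. If the output
# node's shape shows up as symbolic (e.g. s0), the graph is fine and the bug
# is further down; if it is already a concrete (1, 4), lowering baked it in.
ep = torch.export.export(
    Model(), (torch.ones(1, 4), torch.ones(4, 1)), dynamic_shapes=({1: dim}, {0: dim})
)
print(ep)

edge = to_edge(ep)
print(edge.exported_program())  # same check after to_edge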

swolchok avatar Apr 24 '25 17:04 swolchok

@larryliu0820 your repro script doesn't run for me. Perhaps you are using a different torch version than the pinned one?

(.venv) swolchok@swolchok-mac ~/src/executorch> python repro.py
Traceback (most recent call last):
  File "/Users/swolchok/src/executorch/repro.py", line 11, in <module>
    dim = Dim.AUTO(min=1, max=1024)
TypeError: '_DimHint' object is not callable

Over to you to fix the repro.

swolchok avatar Apr 24 '25 17:04 swolchok

@swolchok what is your full error? Is it complaining that Dim.AUTO doesn't exist?

larryliu0820 avatar Apr 24 '25 17:04 larryliu0820

@larryliu0820 sorry, I missed a line while pasting; edited.

swolchok avatar Apr 24 '25 17:04 swolchok

@swolchok try

dim = Dim("dim", min=1, max=1024)

instead

larryliu0820 avatar Apr 24 '25 17:04 larryliu0820

Repros. Interestingly, if I comment out the first method.execute call, I get the correct result from the second call.
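A quick way to isolate where that state lives (a sketch on top of the repro's Python API; whether a second load_method from the same Program fully resets state is itself an assumption worth checking):

# If a fresh Method per call gives the correct (4, 4) result, the stale state
# lives in the Method; if the wrong shape persists, it lives somewhere else.
program = runtime.load_program(pte)

m1 = program.load_method("forward")
print(m1.execute((torch.ones(1, 2), torch.ones(2, 1))))  # expect zeros(2, 2)

m2 = program.load_method("forward")
print(m2.execute((torch.ones(1, 4), torch.ones(4, 1))))  # expect zeros(4, 4)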

swolchok avatar Apr 24 '25 17:04 swolchok

This is not an operator-specific bug. It also repros with mul/add/div, but not with torch.max. I'm probably not the best person to look into this.
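For the record, a sketch of that sweep (run_repro is a hypothetical helper I'm introducing here, reusing dim, to_edge, and Runtime from the script above):

import operator

def run_repro(op):
    class M(torch.nn.Module):
        def forward(self, x, y):
            return op(x, y)

    ep = torch.export.export(
        M(), (torch.ones(1, 4), torch.ones(4, 1)), dynamic_shapes=({1: dim}, {0: dim})
    )
    pte = to_edge(ep).to_executorch().buffer
    method = Runtime.get().load_program(pte).load_method("forward")
    method.execute((torch.ones(1, 2), torch.ones(2, 1)))
    out = method.execute((torch.ones(1, 4), torch.ones(4, 1)))
    # Broken ops print torch.Size([1, 4]) instead of torch.Size([4, 4]).
    print(op, out[0].shape)

for op in (operator.sub, operator.add, operator.mul, operator.truediv, torch.max):
    run_repro(op)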

swolchok avatar Apr 24 '25 17:04 swolchok

Repros. Interestingly, if I comment out the first method.execute call, I get the correct result from the second call.

Yeah, I'm guessing something in Program or Method is stateful. It's unlikely to be anything inside a kernel, since kernels can't keep state.

larryliu0820 avatar Apr 24 '25 17:04 larryliu0820

@JacobSzwejbka do you know anything in Program/Method that could cause this?

metascroy avatar Apr 28 '25 17:04 metascroy