
Both optimized and portable arithmetic operators have broadcasting errors

Open larryliu0820 opened this issue 7 months ago • 14 comments

🐛 Describe the bug

Subtraction produces a wrongly shaped result when both x and y have to be broadcast.

Repro code snippet:

import torch
from torch.export import Dim
from executorch.exir import to_edge
from executorch.runtime import Runtime

class Model(torch.nn.Module):

    def forward(self, x, y):
        return x - y

dim = Dim("dim", min=1, max=1024)
ep = torch.export.export(
    Model(), (torch.ones(1, 4), torch.ones(4, 1)), dynamic_shapes=({1: dim}, {0: dim})
)
pte = to_edge(ep).to_executorch().buffer

runtime = Runtime.get()
program = runtime.load_program(pte)
method = program.load_method("forward")

# Run sub with (1, 2) and (2, 1) first; the result should be zeros(2, 2).
print(method.execute((torch.ones(1, 2), torch.ones(2, 1))))

# Run sub with (1, 4) and (4, 1); the result should be zeros(4, 4),
# but we actually get zeros(1, 4).
print(method.execute((torch.ones(1, 4), torch.ones(4, 1))))
"""
Actual
[tensor([[0., 0., 0., 0.]])]

Expected
[tensor([[0., 0., 0., 0.],
         [0., 0., 0., 0.],
         [0., 0., 0., 0.],
         [0., 0., 0., 0.]])]
"""

Versions

Latest

cc @manuelcandales

larryliu0820 avatar Apr 23 '25 23:04 larryliu0820

cc @manuelcandales

kimishpatel avatar Apr 23 '25 23:04 kimishpatel

cc @swolchok

iseeyuan avatar Apr 23 '25 23:04 iseeyuan

Hm, portable sub should be working fine:

https://github.com/pytorch/executorch/blame/main/kernels/portable/cpu/op_sub.cpp#L47

(@larryliu0820 are you sure you're running portable?)

optimized just plain doesn't seem to have a resize except in one specific case, which seems obviously broken:

https://github.com/pytorch/executorch/blame/main/kernels/optimized/cpu/op_sub.cpp#L108

I think I need to learn why https://github.com/pytorch/executorch/blob/main/kernels/test/op_sub_test.cpp#L266 doesn't cover this case.

swolchok avatar Apr 24 '25 03:04 swolchok

(@larryliu0820 are you sure you're running portable?)

optimized just plain doesn't seem to have a resize except in one specific case, which seems obviously broken:

https://github.com/pytorch/executorch/blame/main/kernels/optimized/cpu/op_sub.cpp#L108

Are you sure? I do see this https://github.com/pytorch/executorch/blob/3072890824adfb95b02159746a130751d184c950/kernels/optimized/cpu/op_add_sub_impl.h#L90

kimishpatel avatar Apr 24 '25 14:04 kimishpatel

Are you sure? I do see this

OK, two specific cases. That resize is also gated; there are clearly uncovered cases.

swolchok avatar Apr 24 '25 16:04 swolchok

Findings:

  • the specific regression test I attempted to add for op_sub passes for both portable and optimized.
  • I was wrong when I said "there are clearly uncovered cases" above. I wasn't reading closely; the uncovered cases funnel into utility functions that do the necessary resize.

My best guess is that something is wrong in the Python or lowering layer used in this report; the kernels themselves seem to be correct.
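One way to poke at that hypothesis (a sketch reusing the names from the repro script; I'm assuming the interesting signal is whether the output shape stays symbolic through lowering):

# Inspect the exported program before and after edge lowering. If the output
# node's shape shows up as symbolic (e.g. s0), the graph is fine and the bug
# is further down; if it is already a concrete (1, 4), lowering baked it in.
ep = torch.export.export(
    Model(), (torch.ones(1, 4), torch.ones(4, 1)), dynamic_shapes=({1: dim}, {0: dim})
)
print(ep)

edge = to_edge(ep)
print(edge.exported_program())  # same check after to_edge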

swolchok avatar Apr 24 '25 17:04 swolchok

@larryliu0820 your repro script doesn't run for me. Perhaps you are using a different torch version than the pinned one?

(.venv) swolchok@swolchok-mac ~/src/executorch> python repro.py
Traceback (most recent call last):
  File "/Users/swolchok/src/executorch/repro.py", line 11, in <module>
    dim = Dim.AUTO(min=1, max=1024)
TypeError: '_DimHint' object is not callable

Over to you to fix the repro.

swolchok avatar Apr 24 '25 17:04 swolchok

@swolchok what is your full error? Is it complaining that Dim.AUTO doesn't exist?

larryliu0820 avatar Apr 24 '25 17:04 larryliu0820

@larryliu0820 sorry, I missed a line while pasting; edited.

swolchok avatar Apr 24 '25 17:04 swolchok

@swolchok try

dim = Dim("dim", min=1, max=1024)

instead

larryliu0820 avatar Apr 24 '25 17:04 larryliu0820

Repros. Interestingly, if I comment out the first method.execute call, I get the correct result from the second call.
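A quick way to isolate where that state lives (a sketch on top of the repro's Python API; whether a second load_method from the same Program fully resets state is itself an assumption worth checking):

# If a fresh Method per call gives the correct (4, 4) result, the stale state
# lives in the Method; if the wrong shape persists, it lives somewhere else.
program = runtime.load_program(pte)

m1 = program.load_method("forward")
print(m1.execute((torch.ones(1, 2), torch.ones(2, 1))))  # expect zeros(2, 2)

m2 = program.load_method("forward")
print(m2.execute((torch.ones(1, 4), torch.ones(4, 1))))  # expect zeros(4, 4)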

swolchok avatar Apr 24 '25 17:04 swolchok

This is not an operator-specific bug. It also repros with mul/add/div, but not with torch.max. I'm probably not the best person to look into this.
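For the record, a sketch of that sweep (run_repro is a hypothetical helper I'm introducing here, reusing dim, to_edge, and Runtime from the script above):

import operator

def run_repro(op):
    class M(torch.nn.Module):
        def forward(self, x, y):
            return op(x, y)

    ep = torch.export.export(
        M(), (torch.ones(1, 4), torch.ones(4, 1)), dynamic_shapes=({1: dim}, {0: dim})
    )
    pte = to_edge(ep).to_executorch().buffer
    method = Runtime.get().load_program(pte).load_method("forward")
    method.execute((torch.ones(1, 2), torch.ones(2, 1)))
    out = method.execute((torch.ones(1, 4), torch.ones(4, 1)))
    # Broken ops print torch.Size([1, 4]) instead of torch.Size([4, 4]).
    print(op, out[0].shape)

for op in (operator.sub, operator.add, operator.mul, operator.truediv, torch.max):
    run_repro(op)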

swolchok avatar Apr 24 '25 17:04 swolchok

Repros. Interestingly, if I comment out the first method.execute call, I get the correct result from the second call.

Yeah, I'm guessing something in Program or Method is stateful. It's unlikely to be anything inside a kernel, since kernels can't keep state.

larryliu0820 avatar Apr 24 '25 17:04 larryliu0820

@JacobSzwejbka do you know anything in Program/Method that could cause this?

metascroy avatar Apr 28 '25 17:04 metascroy