
`F.embedding_bag(..., mode='max')` yields different results than PyTorch eager.


🐛 Bug

Running the script below results in an AssertionError. The issue occurs only when `requires_grad=False`, which triggers the execution of `_embedding_bag_forward_only`.

import torch
import torch_xla.core.xla_model as xm

EMB = 10
DIM = 5
N = 5

def fn(x, w, o):
    return torch.nn.functional.embedding_bag(x, w, o, mode="max")

x = torch.randint(0, EMB, (N,), dtype=torch.long)
w = torch.randn((EMB, DIM), requires_grad=False)
o = torch.tensor([0, 3], dtype=torch.long)

out = fn(x, w, o)
Xout = fn(x.to(xm.xla_device()), w.to(xm.xla_device()), o.to(xm.xla_device()))

assert torch.allclose(out, Xout.cpu()), f"{out=} not close to {Xout=}"

Traceback (most recent call last):
  File "examples/scratch.py", line 179, in <module>
    assert torch.allclose(out, Xout.cpu()), f"{out=} not close to {Xout=}"
AssertionError: out=tensor([[ 0.6277,  1.6069,  0.1294,  0.0666,  1.4192],
        [ 0.6289,  0.0599,  0.4328,  0.9031, -0.6462]]) not close to Xout=tensor([[0.6277, 1.6069, 0.1294, 0.0666, 1.4192],
        [0.6289, 0.0599, 0.4328, 0.9031, 0.0000]], device='xla:0')
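
For contrast, a hedged variant (continuing from the script above; `w_grad` is a name I'm introducing here): since the mismatch only occurs with requires_grad=False, the same check with requires_grad=True, which dispatches to the regular _embedding_bag path instead of _embedding_bag_forward_only, would be expected to pass.

# Continuation of the repro above (reuses fn, x, o, EMB, DIM, xm).
# Per the observation above, requires_grad=True avoids
# _embedding_bag_forward_only, so this assertion should pass.
w_grad = torch.randn((EMB, DIM), requires_grad=True)
out = fn(x, w_grad, o)
Xout = fn(x.to(xm.xla_device()), w_grad.to(xm.xla_device()), o.to(xm.xla_device()))
assert torch.allclose(out, Xout.cpu())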

Expected behavior

Results should be close.

Environment

  • Reproducible on XLA backend: CUDA
  • torch_xla version: 7d41035b89fe6b6f6cfb13679ffc256429efa7b2

Additional context

Upon further inspection, the issue appears to involve negative numbers: wherever the output should contain a negative value, it gets clamped to 0 instead.
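
A minimal sketch to probe that hypothesis, assuming the XLA lowering seeds the max reduction with 0 instead of -inf (just a guess from the symptom, not confirmed): with strictly negative weights, every element of the XLA output should then come back as 0.

import torch
import torch.nn.functional as F
import torch_xla.core.xla_model as xm

# Strictly negative weights: if the max reduction is initialized with 0
# rather than -inf, the XLA output should be clamped to all zeros.
w = -(torch.rand((10, 5)) + 1.0)  # values in (-2, -1], requires_grad=False
x = torch.randint(0, 10, (5,), dtype=torch.long)
o = torch.tensor([0, 3], dtype=torch.long)

cpu_out = F.embedding_bag(x, w, o, mode="max")
xla_out = F.embedding_bag(x.to(xm.xla_device()), w.to(xm.xla_device()),
                          o.to(xm.xla_device()), mode="max")

print(cpu_out)        # all negative
print(xla_out.cpu())  # hypothesis: all zeros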

cc @miladm @JackCaoG @bhavya01

ysiraichi · Jun 28 '24 19:06