CeilFloatModule_basic CI failure after latest nightly pytorch updates

Open cathyzhyi opened this issue 3 years ago • 9 comments

The test `CeilFloatModule_basic` failed with the error:

ERROR: expected a value of type `torch.Tensor` but got `int`. 

The baseline expects the result to be a Tensor, but the last two operations are:

    %4 = torch.aten.ceil.float %3 : !torch.float -> !torch.int loc(#loc3)
    return %4 : !torch.int loc(#loc0)

Didn't get to the bottom of the problem. MLIR dump and torchscript graph dump

cathyzhyi avatar May 07 '22 23:05 cathyzhyi

Hi Yi, can we pin the CI to a known-good version of PyTorch while investigating this if there isn't a quick fix?

silvasean avatar May 09 '22 12:05 silvasean

@silvasean I submitted a workaround in https://github.com/llvm/torch-mlir/pull/843

cathyzhyi avatar May 09 '22 14:05 cathyzhyi

It's probably the same scalar promotion bug that was just patched in upstream PyTorch: https://github.com/pytorch/pytorch/issues/74400#event-6574874903

makslevental avatar May 09 '22 18:05 makslevental

@makslevental -- is the new behavior the "expected" behavior then?

silvasean avatar May 10 '22 08:05 silvasean

@silvasean I would assume so - that bug was about python scalars getting promoted to tensors (and then having any record of that promotion erased at the python level during dispatch). I didn't closely inspect the fix but I imagine that yes this is the expected behavior (python scalars go in, python scalars come out).

makslevental avatar May 10 '22 15:05 makslevental

> python scalars go in, python scalars come out

So torch.ops.aten.ceil(1.5) should expect scalar output rather than Tensor right?

cathyzhyi avatar May 10 '22 21:05 cathyzhyi

> > python scalars go in, python scalars come out
>
> So torch.ops.aten.ceil(1.5) should expect scalar output rather than Tensor right?

@cathyzhyi this is the PR that fixed the bug I linked to earlier. Like I said, I haven't dug deep on it, but at a high level that's the behaviour I would expect.

makslevental avatar May 10 '22 21:05 makslevental

I see. Thanks for confirming. It seems `torch.ops.aten.ceil(1.5)` still returns a Tensor even with that fix. Let me open another issue upstream.
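For anyone who wants to check this on their own PyTorch build, here is a rough sketch of a probe. The helper names `classify` and `probe_ceil` are made up for illustration and are not part of torch-mlir or PyTorch; the only real API call is `torch.ops.aten.ceil`, and the result will depend on which PyTorch version is installed.

```python
import importlib.util


def classify(value):
    """Label a value as 'scalar', 'tensor', or 'other' (hypothetical helper)."""
    if isinstance(value, (bool, int, float)):
        return "scalar"
    # Duck-typed check so this helper also works when torch is not installed.
    if hasattr(value, "dtype") and hasattr(value, "shape"):
        return "tensor"
    return "other"


def probe_ceil(x=1.5):
    """Call torch.ops.aten.ceil on a Python float and classify the result.

    Returns None if torch is not installed.
    """
    if importlib.util.find_spec("torch") is None:
        return None
    import torch

    return classify(torch.ops.aten.ceil(x))
```

Calling `probe_ceil()` should report `"scalar"` if the "scalars go in, scalars come out" behaviour holds for this op, and `"tensor"` if the float is still being promoted.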

cathyzhyi avatar May 10 '22 22:05 cathyzhyi

I opened an issue https://github.com/pytorch/pytorch/issues/77223 upstream.

cathyzhyi avatar May 11 '22 02:05 cathyzhyi

It is working again.

silvasean avatar Oct 07 '22 14:10 silvasean