tvm [Bug][Frontend][PyTorch] Bug in the aten::fill_() PyTorch operator implementation?

There may be an issue with the implementation of the aten::fill_() operator in the PyTorch frontend.

I found this bug while trying to run the gallery/how_to/deploy_models/deploy_object_detection_pytorch.py example with Torch 1.12.0 (see my post in the Discuss forum: [frontend][pytorch] TVM compatibility with Torch 1.12.0). This example uses a Mask R-CNN model. After some debugging, I found that execution was failing when parsing these two lines:

 /opt/homebrew/Caskroom/miniforge/base/envs/tvm_test/lib/python3.8/site-packages/torchvision/models/detection/anchor_utils.py:121:0 
   %2004 : Long(requires_grad=0, device=cpu) = aten::div(%1999, %other.1, %43), scope:  
 __module.model/__module.model.rpn/__module.model.rpn.anchor_generator #  
 /opt/homebrew/Caskroom/miniforge/base/envs/tvm_test/lib/python3.8/site-packages/torch/_tensor.py:674:0 
   %stride_height.1 : Long(requires_grad=0, device=cpu) = aten::fill_(%2003, %2004), scope:  
 __module.model/__module.model.rpn/__module.model.rpn.anchor_generator #

To isolate it, I have prepared a simple unit test example that highlights this problem. The problem is also present with Torch 1.11.0.

Expected behavior

The unit tests should run successfully. In all three tests, the PyTorch aten::fill_() operator consumes a zero-dimensional (0-dim) value tensor, and the PyTorch and TVM outputs of this operator must be identical.

Actual behavior

The first two tests (listed in "Steps to reproduce") work fine. However, the third test fails with an error:

=================================== FAILURES ===================================
______________________________ test_fill_with_div ______________________________
tests/python/frontend/pytorch/test_forward.py:276: in test_fill_with_div
    verify_model_with_input(test_func, [torch.rand([1, 3, 10, 10]).float()])
tests/python/frontend/pytorch/test_forward.py:242: in verify_model_with_input
    mod, params = relay.frontend.from_pytorch(trace, input_shapes, custom_convert_map)
python/tvm/relay/frontend/pytorch.py:4564: in from_pytorch
    outputs = converter.convert_operators(_get_operator_nodes(graph.nodes()), outputs, ret_name)
python/tvm/relay/frontend/pytorch.py:3938: in convert_operators
    relay_out = relay_op(
python/tvm/relay/frontend/pytorch.py:812: in fill_
    return self.full_impl(self.infer_shape(data), fill_value, input_types[0])
python/tvm/relay/frontend/pytorch.py:679: in full_impl
    out = _op.full(_expr.const(fill_value, dtype=dtype), size, dtype=dtype)
python/tvm/relay/expr.py:517: in const
    raise ValueError("value has to be scalar or NDArray")
E   ValueError: value has to be scalar or NDArray
----------------------------- Captured stderr call -----------------------------

Environment

TVM version: v0.10.dev (c83ee08c102f3)
Architecture: arm64
System Version: macOS 12.5.1 (21G83)
Kernel Version: Darwin 21.6.0

Steps to reproduce

Add the tests listed below to the tests/python/frontend/pytorch/test_forward.py script and run it.

# works fine
def test_fill():
    def test_func(x):
        return x.fill_(3)

    verify_model_with_input(test_func, [torch.rand([1, 3, 10, 10]).float()])

# works fine
def test_fill_zero_dim_value_tensor():
    def test_func(x):
        return x.fill_(torch.tensor(3))

    verify_model_with_input(test_func, [torch.rand([1, 3, 10, 10]).float()])

# FAILURE
def test_fill_with_div():
    def test_func(x):
        y = torch.div(torch.tensor(6.), torch.tensor(2.))
        return x.fill_(y)

    verify_model_with_input(test_func, [torch.rand([1, 3, 10, 10]).float()])

Sep 20 '22 09:09 oviazlo

We just need to add constant folding on the fill value. See this commit https://github.com/masahi/tvm/commit/3e88280ec3a0b943d8aac76a7a99f75ffd0ac863.
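To illustrate the idea, here is a minimal sketch in plain Python. It is a toy model, not TVM's code and not the linked commit: the names `Const`, `Div`, `make_const`, `fold_constant`, and `fill` are all hypothetical. It assumes the failure occurs because the fill value arrives as an unevaluated expression (the result of the division) rather than a plain scalar, and shows how folding that expression first lets the scalar-only check pass.

```python
from dataclasses import dataclass
from typing import Union


# Toy stand-ins for frontend expressions; these are NOT TVM classes.
@dataclass
class Const:
    value: float


@dataclass
class Div:
    lhs: "Expr"
    rhs: "Expr"


Expr = Union[Const, Div]


def make_const(value):
    """Mimic the check seen in the traceback: only plain scalars pass."""
    if not isinstance(value, (int, float)):
        raise ValueError("value has to be scalar or NDArray")
    return Const(float(value))


def fold_constant(expr):
    """Evaluate an expression built only from constants down to a scalar."""
    if isinstance(expr, Const):
        return expr.value
    if isinstance(expr, Div):
        return fold_constant(expr.lhs) / fold_constant(expr.rhs)
    raise TypeError(f"cannot fold {expr!r}")


def fill(shape, fill_value, fold=True):
    """Return a rows x cols nested list filled with fill_value."""
    # With fold=True, a non-scalar fill value is reduced to a scalar first.
    if fold and not isinstance(fill_value, (int, float)):
        fill_value = fold_constant(fill_value)
    scalar = make_const(fill_value).value
    rows, cols = shape
    return [[scalar] * cols for _ in range(rows)]


# Mirrors test_fill_with_div: the fill value is computed as 6.0 / 2.0.
fill_value = Div(Const(6.0), Const(2.0))
result = fill((2, 3), fill_value)
```

Calling `fill((2, 3), fill_value, fold=False)` in this toy model raises the same `ValueError` message as in the traceback, because the unevaluated `Div` node reaches the scalar check directly.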

Can you make a PR with your test case?

Sep 21 '22 08:09 masahi

Thanks a lot, @masahi. I have created a PR with my test case: https://github.com/apache/tvm/pull/12857

Sep 21 '22 10:09 oviazlo
