🐛 [Bug] torch.ops.aten.remainder.Scalar seems not working with big int
Bug Description
torch.ops.aten.remainder.Scalar seems to return an fmod-like (incorrect) result when the input number is large.
To Reproduce
Save the script below and run it:
```python
import torch
import torch.nn as nn

a = torch.tensor([[5950571286963681280]]).cuda()
example_args = (a,)


class ToyModel(nn.Module):
    def __init__(self):
        super(ToyModel, self).__init__()

    def forward(self, x):
        return torch.remainder(x, 196613)


model = ToyModel().eval().cuda()

with torch.no_grad():
    ep = torch.export.export(model, args=example_args)

from torch_tensorrt.dynamo._compiler import compile as dynamo_compile
from torch_tensorrt import logging as ts_logging

with ts_logging.debug():
    compiled = dynamo_compile(
        exported_program=ep,
        disable_tf32=True,
        inputs=example_args,
        min_block_size=1,
        debug=True,
    )

with torch.no_grad():
    print(compiled(*example_args))
```
Expected behavior
Expected to return a result like
tensor([[75722]], device='cuda:0')
However, the printed result is
tensor([[-120891]], device='cuda:0')
My full execution log is attached as remainder_error.log
Environment
Build information about Torch-TensorRT can be found by turning on debug messages
- Torch-TensorRT Version (e.g. 1.0.0): 10.1.0
- PyTorch Version (e.g. 1.0): 2.4.1+cu124
- CPU Architecture: x86_64
- OS (e.g., Linux): linux
- How you installed PyTorch (conda, pip, libtorch, source): pip
- Build command you used (if compiling from source):
- Are you using local sources or building from archives:
- Python version: 3.11.9
- CUDA version: 12.6
- GPU models and configuration: nvidia L4
- Any other relevant information:
Additional context
BTW,
- the converted version of torch.ops.aten.remainder.Scalar seems not even as fast as the original op.
- it seems torch.ops.aten.remainder.Scalar works with ints that are not that big. Not sure if this is caused by int64.
Thanks for pointing this out.
I looked into this a bit.
TRT does not support the fmod operation directly, so in Torch-TensorRT we implement remainder as
fmod(fmod(dividend, divisor) + divisor, divisor)
and fmod in turn is sub(dividend, prod(trunc_div(dividend, divisor), divisor)).
Generally dividend > prod(trunc_div(dividend, divisor), divisor).
But with large integers, trunc_div(dividend, divisor) in this case evaluates to 30265401409536 (it should be 30265401000766), which makes prod(trunc_div(dividend, divisor), divisor) > dividend and yields the negative number.
As you said, 5950571286963681280 falls in the signed int64 range, so I am not sure why TRT is returning reduced precision. I can get it clarified further with the TRT team. It must be a loss of accuracy in the computation. Please note that float32 would also lead to accuracy loss.
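For reference, a minimal sketch of the precision issue (this only illustrates why float32 intermediates cannot represent this value; it does not claim to reproduce TRT's exact kernel behavior):

```python
import torch

dividend = torch.tensor([[5950571286963681280]], dtype=torch.int64)
divisor = 196613

# Exact int64 arithmetic in eager PyTorch gives the expected remainder.
print(torch.remainder(dividend, divisor))    # tensor([[75722]])

# float32 has a 24-bit mantissa, so merely casting the dividend changes it
# by an error on the order of 1e11 at this magnitude.
as_f32 = dividend.to(torch.float32)
print(as_f32.to(torch.int64) - dividend)     # nonzero rounding error

# Decomposing remainder via trunc_div/prod/sub on float32 intermediates
# (as in the fmod-based lowering described above) therefore cannot
# recover the exact int64 result.
q = torch.trunc(as_f32 / divisor)            # precision-limited quotient
fmod = as_f32 - q * divisor                  # sub(dividend, prod(trunc_div, divisor))
rem = (fmod + divisor) - torch.trunc((fmod + divisor) / divisor) * divisor
print(rem)                                    # far from the expected 75722
```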
Thanks @apbose for your help. I tried exporting this graph to ONNX and compiling it with trtexec, and it shows the same issue. The result I get this way is -80369420288.
I have attached my exported ONNX model in scalar.zip
What is the suggested way to deal with these big numbers? Do you have any suggestions?
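One possible interim workaround, just a sketch and only if falling back to eager for this op is acceptable (it assumes the torch_executed_ops setting is available in your torch_tensorrt version), is to exclude the remainder op from TRT conversion so it runs in PyTorch with exact int64 arithmetic:

```python
import torch
from torch_tensorrt.dynamo._compiler import compile as dynamo_compile

# Reusing `ep` and `example_args` from the repro script above.
compiled = dynamo_compile(
    exported_program=ep,
    inputs=example_args,
    min_block_size=1,
    # Assumption: keeping aten.remainder.Scalar out of the TRT engine
    # makes it execute in eager PyTorch with exact int64 semantics.
    torch_executed_ops={torch.ops.aten.remainder.Scalar},
)

with torch.no_grad():
    print(compiled(*example_args))  # expected to fall back to eager for the remainder op
```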