🐛 [Bug] Torch-TensorRT doesn't support timm nfnet
Bug Description
Program crashes when running the following benchmarking script:
import torch
import torch_tensorrt
import timm
import time
import numpy as np
import torch.backends.cudnn as cudnn
torch.hub._validate_not_a_forked_repo=lambda a,b,c: True
nfnet = timm.create_model('dm_nfnet_f0',pretrained=True)
model = nfnet.eval().to("cuda")
detections_batch = model(torch.randn(128, 3, 224, 224).to("cuda"))
detections_batch.shape
cudnn.benchmark = True
def benchmark(model, input_shape=(1024, 3, 512, 512), dtype='fp32', nwarmup=50, nruns=1000):
    input_data = torch.randn(input_shape)
    input_data = input_data.to("cuda")
    if dtype == 'fp16':
        input_data = input_data.half()
    print("Warm up ...")
    with torch.no_grad():
        for _ in range(nwarmup):
            features = model(input_data)
    torch.cuda.synchronize()
    print("Start timing ...")
    timings = []
    with torch.no_grad():
        for i in range(1, nruns + 1):
            start_time = time.time()
            pred_loc = model(input_data)
            torch.cuda.synchronize()
            end_time = time.time()
            timings.append(end_time - start_time)
            if i % 10 == 0:
                print('Iteration %d/%d, avg batch time %.2f ms' % (i, nruns, np.mean(timings) * 1000))
    print("Input shape:", input_data.size())
    print('Average throughput: %.2f images/second' % (input_shape[0] / np.mean(timings)))

trt_model = torch_tensorrt.compile(model,
    inputs=[torch_tensorrt.Input((1, 3, 224, 224))],
    enabled_precisions={torch_tensorrt.dtype.half}  # Run with FP16
)
benchmark(trt_model, input_shape=(1, 3, 224, 224), nruns=100, dtype="fp16")
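As an aside: the first warning in the log below asks for the input type to be specified explicitly. Presumably (untested here) the dtype field of the Input spec would supply it, e.g.:

trt_model = torch_tensorrt.compile(model,
    # dtype marks the input as FP16 so Torch-TensorRT does not default the
    # shape-analysis input to Float32
    inputs=[torch_tensorrt.Input((1, 3, 224, 224), dtype=torch.half)],
    enabled_precisions={torch_tensorrt.dtype.half}
)

This only addresses the type warning, not the aten::ceil.float error itself.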
To Reproduce
Steps to reproduce the behavior:
- Run the script
- Error message:
WARNING: [Torch-TensorRT] - Cannot infer input type from calcuations in graph for input x.1. Assuming it is Float32. If not, specify input type explicity
ERROR: [Torch-TensorRT] - Unsupported operator: aten::ceil.float(float a) -> (int)
File "/data/home/xzhao9/cluster/miniconda3/envs/py38/lib/python3.8/site-packages/timm/models/layers/padding.py", line 19
def get_same_padding(x: int, k: int, s: int, d: int):
return max((math.ceil(x / s) - 1) * s + (k - 1) * d + 1 - x, 0)
~~~~~~~~~ <--- HERE
ERROR: [Torch-TensorRT] - Unsupported operator: aten::ceil.float(float a) -> (int)
File "/data/home/xzhao9/cluster/miniconda3/envs/py38/lib/python3.8/site-packages/timm/models/layers/padding.py", line 19
def get_same_padding(x: int, k: int, s: int, d: int):
return max((math.ceil(x / s) - 1) * s + (k - 1) * d + 1 - x, 0)
~~~~~~~~~ <--- HERE
WARNING: [Torch-TensorRT] - Input type for doing shape analysis could not be determined, defaulting to F32
Segmentation fault (core dumped)
Expected behavior
The script shouldn't crash; it should print the performance results.
Environment
Build information about Torch-TensorRT can be found by turning on debug messages
- Torch-TensorRT Version (e.g. 1.0.0): master (4fd886d08ce77323995b5bf6a21a0d0e8dde8d42)
- PyTorch Version (e.g. 1.0): 1.10.0+cu113
- CPU Architecture: AWS p3d.24xlarge instance
- OS (e.g., Linux): Linux
- How you installed PyTorch (conda, pip, libtorch, source): pip
- Build command you used (if compiling from source): python setup.py bdist_wheel
- Are you using local sources or building from archives: local sources from github
- Python version: 3.8
- CUDA version: 11.3
- GPU models and configuration: Nvidia V100
- Any other relevant information:
Hey, we ran into the same issue a while back; timm is not implemented with torch_tensorrt inference in mind. For this particular issue, though, you can easily make it work by recording the padding values for your images and setting them manually in a monkey patch.
Sorry, I am new to torch_tensorrt. Can you give an example of how to patch the script in the issue body?
Probably a related issue: I encountered another error when trying to run Torch-TensorRT with torchvision models: https://github.com/pytorch/vision/issues/5378. Since Torch-TensorRT only builds against the latest stable PyTorch release, I don't test it on the nightly version.
So, here the problem seems to be that the function aten::ceil.float is not supported by Torch-TensorRT, and you want to find a way to work around that.
An easy solution is to install timm in an NGC container; using pip, it will be installed in /opt/lib/python3.8/site-packages/timm.
You want to modify the function get_same_padding from /opt/lib/python3.8/site-packages/timm/models/layers/padding.py so that it does not use aten::ceil.float.
The quickest way to do that is to replace it with a dict, and then modify the functions that call get_same_padding to use the dict you have just created.
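For anyone who wants a concrete example, here is a minimal sketch of such a monkey patch, untested against this exact setup. The module path timm.models.layers.padding comes from the traceback above; the integer-only rewrite is a variant of the suggestion (instead of a recorded dict, it computes the same "same" padding with integer ceiling division, which should keep aten::ceil.float out of the graph):

import timm.models.layers.padding as timm_padding

def get_same_padding_int(x: int, k: int, s: int, d: int) -> int:
    # (x + s - 1) // s is ceiling division on positive ints, equivalent to
    # math.ceil(x / s) but lowered to integer ops instead of aten::ceil.float.
    return max(((x + s - 1) // s - 1) * s + (k - 1) * d + 1 - x, 0)

# Patch before creating/compiling the model. Caveat: any module that did
# `from .padding import get_same_padding` keeps its own reference and would
# need the same patch applied to its namespace.
timm_padding.get_same_padding = get_same_padding_int

The recorded-dict approach described above is the same idea: wrap the original function, run one eager forward pass at your fixed input size to log each (x, k, s, d) -> padding value, then patch in a lookup. The arithmetic rewrite just avoids pinning the input size.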
Thanks @MatthieuTPHR! We are developing a benchmark suite that compares different PyTorch TensorRT libraries (such as onnx2trt, torch_tensorrt, torch2trt, etc.), and timm is one of our upstream model repositories. We would prefer not to change the model code unless the patch is accepted by the upstream repo (in this case, timm).
Is there a plan for when Torch-TensorRT will support aten::ceil.float?
A related issue is https://github.com/NVIDIA/Torch-TensorRT/issues/890, where we also find correctness issues with timm and Torch-TensorRT.
This issue has not seen activity for 90 days, Remove stale label or comment or this will be closed in 10 days