
🐛 [Bug] Torch-TensorRT doesn't support timm nfnet

Open xuzhao9 opened this issue 3 years ago • 6 comments

Bug Description

Program crashes when running the following benchmarking script:

import torch
import torch_tensorrt
import timm
import time
import numpy as np
import torch.backends.cudnn as cudnn

torch.hub._validate_not_a_forked_repo=lambda a,b,c: True

nfnet = timm.create_model('dm_nfnet_f0',pretrained=True)

model = nfnet.eval().to("cuda")
detections_batch = model(torch.randn(128, 3, 224, 224).to("cuda"))
detections_batch.shape
cudnn.benchmark = True

def benchmark(model, input_shape=(1024, 3, 512, 512), dtype='fp32', nwarmup=50, nruns=1000):
    input_data = torch.randn(input_shape)
    input_data = input_data.to("cuda")
    if dtype=='fp16':
        input_data = input_data.half()

    print("Warm up ...")
    with torch.no_grad():
        for _ in range(nwarmup):
            features = model(input_data)
    torch.cuda.synchronize()
    print("Start timing ...")
    timings = []
    with torch.no_grad():
        for i in range(1, nruns+1):
            start_time = time.time()
            pred_loc  = model(input_data)
            torch.cuda.synchronize()
            end_time = time.time()
            timings.append(end_time - start_time)
            if i%10==0:
                print('Iteration %d/%d, avg batch time %.2f ms'%(i, nruns, np.mean(timings)*1000))

    print("Input shape:", input_data.size())
    print('Average throughput: %.2f images/second'%(input_shape[0]/np.mean(timings)))

trt_model = torch_tensorrt.compile(model,
    inputs= [torch_tensorrt.Input((1, 3, 224, 224))],
    enabled_precisions= { torch_tensorrt.dtype.half} # Run with FP16
)
benchmark(trt_model, input_shape=(1, 3, 224, 224), nruns=100, dtype="fp16")
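
As an aside, the "Cannot infer input type" warning in the log below can likely be silenced by declaring the input dtype explicitly; a minimal sketch, assuming the Input constructor in this Torch-TensorRT build accepts a dtype argument:

# Sketch only: declare the input dtype so Torch-TensorRT does not have to infer it
# from the graph (dtype argument assumed available in this build).
trt_model = torch_tensorrt.compile(model,
    inputs=[torch_tensorrt.Input((1, 3, 224, 224), dtype=torch.half)],
    enabled_precisions={torch_tensorrt.dtype.half}  # Run with FP16
)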

To Reproduce

Steps to reproduce the behavior:

  1. Run the script
  2. Error message:
WARNING: [Torch-TensorRT] - Cannot infer input type from calcuations in graph for input x.1. Assuming it is Float32. If not, specify input type explicity
ERROR: [Torch-TensorRT] - Unsupported operator: aten::ceil.float(float a) -> (int)
  File "/data/home/xzhao9/cluster/miniconda3/envs/py38/lib/python3.8/site-packages/timm/models/layers/padding.py", line 19
def get_same_padding(x: int, k: int, s: int, d: int):
    return max((math.ceil(x / s) - 1) * s + (k - 1) * d + 1 - x, 0)
                ~~~~~~~~~ <--- HERE

ERROR: [Torch-TensorRT] - Unsupported operator: aten::ceil.float(float a) -> (int)
  File "/data/home/xzhao9/cluster/miniconda3/envs/py38/lib/python3.8/site-packages/timm/models/layers/padding.py", line 19
def get_same_padding(x: int, k: int, s: int, d: int):
    return max((math.ceil(x / s) - 1) * s + (k - 1) * d + 1 - x, 0)
                ~~~~~~~~~ <--- HERE

WARNING: [Torch-TensorRT] - Input type for doing shape analysis could not be determined, defaulting to F32
Segmentation fault (core dumped)

Expected behavior

The script should not crash and should print the benchmark results.

Environment

Build information about Torch-TensorRT can be found by turning on debug messages

  • Torch-TensorRT Version (e.g. 1.0.0): master (4fd886d08ce77323995b5bf6a21a0d0e8dde8d42)
  • PyTorch Version (e.g. 1.0): 1.10.0+cu113
  • CPU Architecture: AWS p3d.24xlarge instance
  • OS (e.g., Linux): Linux
  • How you installed PyTorch (conda, pip, libtorch, source): pip
  • Build command you used (if compiling from source): python setup.py bdist_wheel
  • Are you using local sources or building from archives: local sources from github
  • Python version: 3.8
  • CUDA version: 11.3
  • GPU models and configuration: Nvidia V100
  • Any other relevant information:

xuzhao9 avatar Feb 03 '22 00:02 xuzhao9

Hey, we ran into the same issue a while back; timm is not written with torch_tensorrt inference in mind. For this particular issue, though, you can work around it by recording the padding values for your images and setting them manually via a monkey patch.

MatthieuToulemont avatar Feb 04 '22 10:02 MatthieuToulemont

> Hey, we ran into the same issue a while back; timm is not written with torch_tensorrt inference in mind. For this particular issue, though, you can work around it by recording the padding values for your images and setting them manually via a monkey patch.

Sorry, I am new to torch_tensorrt. Could you give an example of how to patch the script in the issue body?

xuzhao9 avatar Feb 04 '22 14:02 xuzhao9

Probably a related issue: I hit another error when running Torch-TensorRT with torchvision models: https://github.com/pytorch/vision/issues/5378. Since Torch-TensorRT only builds against the latest stable PyTorch release, I haven't tested it against the nightly version.

xuzhao9 avatar Feb 04 '22 17:02 xuzhao9

So the problem here is that the operator aten::ceil.float is not supported by Torch-TensorRT, and you need to find a way to work around that.

An easy option is to install timm inside an NGC container; with pip it ends up in /opt/lib/python3.8/site-packages/timm.

You then want to modify get_same_padding in /opt/lib/python3.8/site-packages/timm/models/layers/padding.py so that it no longer relies on aten::ceil.float.

The quickest way to do that is to replace the function with a dict of precomputed padding values and then update the callers of get_same_padding to use that dict.
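
For reference, here is a minimal, untested sketch of an alternative to the dict lookup: it keeps timm's formula but swaps the float ceil for integer arithmetic, which avoids emitting aten::ceil.float when the model is scripted. The module path and the patching strategy are assumptions, not a confirmed fix.

# Untested sketch: monkey-patch timm's get_same_padding with an integer-only
# version before calling torch_tensorrt.compile. For positive x and s,
# (x + s - 1) // s == math.ceil(x / s), so the padding values are unchanged.
import timm.models.layers.padding as timm_padding

def get_same_padding_int(x: int, k: int, s: int, d: int) -> int:
    return max(((x + s - 1) // s - 1) * s + (k - 1) * d + 1 - x, 0)

# Patch the module-level name so pad_same (defined in the same file) picks it up.
# Caveat: any module that did `from ...padding import get_same_padding` keeps its
# own reference and may need the same patch applied.
timm_padding.get_same_padding = get_same_padding_int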

MatthieuToulemont avatar Feb 04 '22 17:02 MatthieuToulemont

Thanks @MatthieuTPHR! We are developing a benchmark suite that compares different PyTorch TensorRT libraries (such as onnx2trt, torch_tensorrt, torch2trt, etc.), and timm is one of our upstream model repositories. We would prefer not to change the model code unless the patch is accepted by the upstream repo (in this case, timm).

Is there a plan for when Torch-TensorRT will support aten::ceil.float?

A related issue is https://github.com/NVIDIA/Torch-TensorRT/issues/890, where we also found correctness issues with timm and Torch-TensorRT.

xuzhao9 avatar Feb 23 '22 21:02 xuzhao9

This issue has not seen activity for 90 days, Remove stale label or comment or this will be closed in 10 days

github-actions[bot] avatar Aug 17 '22 00:08 github-actions[bot]

This issue has not seen activity for 90 days, Remove stale label or comment or this will be closed in 10 days

github-actions[bot] avatar Nov 21 '22 00:11 github-actions[bot]

[Removed]

Christina-Young-NVIDIA avatar Dec 20 '22 02:12 Christina-Young-NVIDIA

This issue has not seen activity for 90 days, Remove stale label or comment or this will be closed in 10 days

github-actions[bot] avatar Apr 04 '23 00:04 github-actions[bot]