
TensorRT not supporting ViT?

billysx opened this issue 3 years ago • 4 comments

Using the tutorial Jupyter notebook https://github.com/NVIDIA/TensorRT/blob/main/quickstart/IntroNotebooks/4.%20Using%20PyTorch%20through%20ONNX.ipynb with only two modifications:

  1. In block 1: resnet50 = models.resnet50(pretrained=True).eval() -> resnet50 = vit_b_16 = timm.create_model('vit_base_patch16_224', pretrained=True).eval()

  2. In block 6: resnet50_gpu = models.resnet50(pretrained=True).to("cuda").eval() -> resnet50_gpu = timm.create_model('vit_base_patch16_224', pretrained=True).to("cuda").eval()

Only these two model definitions are changed from ResNet50 to ViT-Base/16 (a sketch of the changed cells is below). The PyTorch->ONNX conversion succeeds, and the ONNX->TensorRT conversion succeeds as well; to keep everything else the same, I kept the file name resnet50_pytorch.
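For reference, a minimal sketch of the two changed cells plus the ONNX export step, assuming timm is installed; the exact export arguments in the notebook may differ from this sketch:

import timm
import torch

# block 1: swap the ResNet50 definition for ViT-Base/16 (variable name kept from the notebook)
resnet50 = timm.create_model('vit_base_patch16_224', pretrained=True).eval()

# block 6: the same swap for the GPU copy of the model (requires a CUDA device)
resnet50_gpu = timm.create_model('vit_base_patch16_224', pretrained=True).to("cuda").eval()

# ViT-Base/16 also takes 3x224x224 inputs, so the notebook's dummy input still works
dummy_input = torch.randn(32, 3, 224, 224)
torch.onnx.export(resnet50, dummy_input, "resnet50_pytorch.onnx")

The trtexec build then finishes with: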

[07/15/2022-16:33:39] [W] * GPU compute time is unstable, with coefficient of variance = 1.52451%.
[07/15/2022-16:33:39] [W]   If not already in use, locking GPU clock frequency or adding --useSpinWait may improve the stability.
[07/15/2022-16:33:39] [I] Explanations of the performance metrics are printed in the verbose logs.
[07/15/2022-16:33:39] [I] 
&&&& PASSED TensorRT.trtexec [TensorRT v8401] # trtexec --onnx=resnet50_pytorch.onnx --saveEngine=resnet_engine_pytorch.trt --explicitBatch --inputIOFormats=fp16:chw --outputIOFormats=fp16:chw --fp16

However, when it comes to inference, the following error is reported:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
/tmp/ipykernel_4124637/528596916.py in <module>
      1 print("Warming up...")
      2 
----> 3 pred = predict(preprocessed_images)
      4 
      5 print("Done warming up!")

/tmp/ipykernel_4124637/129159949.py in predict(batch)
      3     cuda.memcpy_htod_async(d_input, batch, stream)
      4     # execute model
----> 5     context.execute_async_v2(bindings, stream.handle, None)
      6     # transfer predictions back
      7     cuda.memcpy_dtoh_async(output, d_output, stream)

AttributeError: 'NoneType' object has no attribute 'execute_async_v2'

The traceback indicates that the context built by

runtime = trt.Runtime(trt.Logger(trt.Logger.WARNING)) 
engine = runtime.deserialize_cuda_engine(f.read())
context = engine.create_execution_context()

is None, which doesn't happen in the ResNet50 case. Also, the saved TRT engine is not empty; it has a file size of 175MB. Does TensorRT still not support ViT?

billysx · Jul 15 '22 23:07

TensorRT does support ViT. As you can see, you successfully converted the ONNX model to a TRT engine, and the trtexec log shows the inference latency and throughput.

zerollzeng · Jul 16 '22 02:07

So why is ViT not supported by the trt runtime and engine? Can you help me solve this bug?

billysx · Jul 16 '22 17:07

I don't have the env for the notebook, but I tried to reproduce it like this:

#trt.py
import tensorrt as trt
import pycuda.driver as cuda
import pycuda.autoinit
import numpy as np

f = open("resnet_engine_pytorch.trt", "rb")
runtime = trt.Runtime(trt.Logger(trt.Logger.WARNING))

engine = runtime.deserialize_cuda_engine(f.read())
context = engine.create_execution_context()

BATCH_SIZE = 1000

USE_FP16 = True
target_dtype = np.float16 if USE_FP16 else np.float32

input_batch = np.empty([32, 224, 224, 3], dtype = target_dtype)
output = np.empty([BATCH_SIZE, 1000], dtype = target_dtype)

# allocate device memory
d_input = cuda.mem_alloc(1 * input_batch.nbytes)
d_output = cuda.mem_alloc(1 * output.nbytes)

bindings = [int(d_input), int(d_output)]

stream = cuda.Stream()

# run one async inference to verify the engine and context work
context.execute_async_v2(bindings, stream.handle, None)
stream.synchronize()
print('PASSED')

The above script passes on my env, so I would suggest checking why the context is None in your case.
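For example, a small check like this (just a sketch built on the notebook's own deserialization code, with a more verbose logger) would surface the actual failure instead of the later AttributeError:

import tensorrt as trt

# use a verbose logger so TensorRT prints why deserialization fails, if it does
logger = trt.Logger(trt.Logger.VERBOSE)
runtime = trt.Runtime(logger)

with open("resnet_engine_pytorch.trt", "rb") as f:
    engine = runtime.deserialize_cuda_engine(f.read())

if engine is None:
    raise RuntimeError("engine deserialization failed, see the logger output above")

context = engine.create_execution_context()
if context is None:
    raise RuntimeError("create_execution_context() returned None, often a device memory issue")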

zerollzeng · Jul 17 '22 12:07

Hi @kevinch-nv, I just found a problem with the sample.

In https://github.com/NVIDIA/TensorRT/blob/main/quickstart/IntroNotebooks/4.%20Using%20PyTorch%20through%20ONNX.ipynb

# step out of Python for a moment to convert the ONNX model to a TRT engine using trtexec
if USE_FP16:
    !trtexec --onnx=resnet50_pytorch.onnx --saveEngine=resnet_engine_pytorch.trt  --explicitBatch --inputIOFormats=fp16:chw --outputIOFormats=fp16:chw
else:
    !trtexec --onnx=resnet50_pytorch.onnx --saveEngine=resnet_engine_pytorch.trt  --explicitBatch

The above FP16 command builds an FP32 engine with FP16 I/O. I think that's not our purpose here; should we add --fp16 as well?
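For instance, the FP16 branch of that cell would then look like this (the same command with --fp16 appended):

if USE_FP16:
    !trtexec --onnx=resnet50_pytorch.onnx --saveEngine=resnet_engine_pytorch.trt --explicitBatch --inputIOFormats=fp16:chw --outputIOFormats=fp16:chw --fp16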

zerollzeng · Jul 17 '22 12:07

Closing since there has been no activity for more than 3 weeks. Please reopen if you still have questions, thanks!

ttyio · Dec 06 '22 01:12