TensorRT
TensorRT not supporting ViT?
Using the tutorial Jupyter notebook https://github.com/NVIDIA/TensorRT/blob/main/quickstart/IntroNotebooks/4.%20Using%20PyTorch%20through%20ONNX.ipynb with only two modifications:
- In block 1: resnet50 = models.resnet50(pretrained=True).eval() -> resnet50 = timm.create_model('vit_base_patch16_224', pretrained=True).eval()
- In block 6: resnet50_gpu = models.resnet50(pretrained=True).to("cuda").eval() -> resnet50_gpu = timm.create_model('vit_base_patch16_224', pretrained=True).to("cuda").eval()
Only these two model definitions changed, from ResNet50 to ViT-Base-16. The PyTorch->ONNX conversion succeeds, and so does ONNX->TensorRT (to make sure everything else stays the same, I kept the file name resnet50_pytorch):
[07/15/2022-16:33:39] [W] * GPU compute time is unstable, with coefficient of variance = 1.52451%.
[07/15/2022-16:33:39] [W] If not already in use, locking GPU clock frequency or adding --useSpinWait may improve the stability.
[07/15/2022-16:33:39] [I] Explanations of the performance metrics are printed in the verbose logs.
[07/15/2022-16:33:39] [I]
&&&& PASSED TensorRT.trtexec [TensorRT v8401] # trtexec --onnx=resnet50_pytorch.onnx --saveEngine=resnet_engine_pytorch.trt --explicitBatch --inputIOFormats=fp16:chw --outputIOFormats=fp16:chw --fp16
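For reference, here is a minimal sketch of the swapped export step (an illustration, not the notebook verbatim; it assumes timm is installed and follows the notebook's batch size of 32, keeping the original file name so the rest of the notebook runs unchanged):
# export_vit.py -- hypothetical sketch of the modified export
import timm
import torch
BATCH_SIZE = 32
vit = timm.create_model('vit_base_patch16_224', pretrained=True).eval()
dummy_input = torch.randn(BATCH_SIZE, 3, 224, 224)  # ViT-B/16 expects 3x224x224 inputs
# keep the ResNet50 file name so the trtexec cell below works as-is
torch.onnx.export(vit, dummy_input, "resnet50_pytorch.onnx", verbose=False)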
However, when it comes to inference, the error is:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
/tmp/ipykernel_4124637/528596916.py in <module>
1 print("Warming up...")
2
----> 3 pred = predict(preprocessed_images)
4
5 print("Done warming up!")
/tmp/ipykernel_4124637/129159949.py in predict(batch)
3 cuda.memcpy_htod_async(d_input, batch, stream)
4 # execute model
----> 5 context.execute_async_v2(bindings, stream.handle, None)
6 # transfer predictions back
7 cuda.memcpy_dtoh_async(output, d_output, stream)
AttributeError: 'NoneType' object has no attribute 'execute_async_v2'
This says that the context built by
runtime = trt.Runtime(trt.Logger(trt.Logger.WARNING))
engine = runtime.deserialize_cuda_engine(f.read())
context = engine.create_execution_context()
is None, which doesn't happen in the ResNet50 case. Also, the saved TRT engine is not empty, with a file size of 175 MB. Is it that TensorRT just doesn't support ViT yet?
TensorRT does support ViT. As you can see, you successfully converted the ONNX model to a TRT engine, and the trtexec log shows the inference latency and throughput.
So why does ViT still fail in the TRT runtime, then? Can you help me solve this bug?
I don't have the env for the notebook, but I tried to reproduce it like this:
# trt.py
import tensorrt as trt
import pycuda.driver as cuda
import pycuda.autoinit  # initializes a CUDA context
import numpy as np

# deserialize the engine built by trtexec
runtime = trt.Runtime(trt.Logger(trt.Logger.WARNING))
with open("resnet_engine_pytorch.trt", "rb") as f:
    engine = runtime.deserialize_cuda_engine(f.read())
context = engine.create_execution_context()

BATCH_SIZE = 32  # must match the batch size the engine was built with
USE_FP16 = True
target_dtype = np.float16 if USE_FP16 else np.float32
input_batch = np.empty([BATCH_SIZE, 224, 224, 3], dtype=target_dtype)
output = np.empty([BATCH_SIZE, 1000], dtype=target_dtype)

# allocate device memory and bind input/output buffers
d_input = cuda.mem_alloc(input_batch.nbytes)
d_output = cuda.mem_alloc(output.nbytes)
bindings = [int(d_input), int(d_output)]

# run one inference on a dedicated stream
stream = cuda.Stream()
context.execute_async_v2(bindings, stream.handle, None)
stream.synchronize()
print('PASSED')
The above script passes on my env; I would suggest checking why the context is None in your case.
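If it helps, here is a minimal diagnostic sketch (assuming the same engine file name as above): it swaps the WARNING logger for a VERBOSE one so TensorRT prints why a step fails, and checks each step for None explicitly:
# debug_context.py -- diagnostic sketch, not part of the notebook
import tensorrt as trt

logger = trt.Logger(trt.Logger.VERBOSE)  # VERBOSE surfaces deserialization errors
runtime = trt.Runtime(logger)
with open("resnet_engine_pytorch.trt", "rb") as f:
    engine = runtime.deserialize_cuda_engine(f.read())
if engine is None:
    raise RuntimeError("engine failed to deserialize -- see the verbose log above")
context = engine.create_execution_context()
if context is None:
    raise RuntimeError("context creation failed -- often an out-of-memory condition")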
Hi @kevinch-nv, I just found a problem with the sample.
In https://github.com/NVIDIA/TensorRT/blob/main/quickstart/IntroNotebooks/4.%20Using%20PyTorch%20through%20ONNX.ipynb
# step out of Python for a moment to convert the ONNX model to a TRT engine using trtexec
if USE_FP16:
    !trtexec --onnx=resnet50_pytorch.onnx --saveEngine=resnet_engine_pytorch.trt --explicitBatch --inputIOFormats=fp16:chw --outputIOFormats=fp16:chw
else:
    !trtexec --onnx=resnet50_pytorch.onnx --saveEngine=resnet_engine_pytorch.trt --explicitBatch
The above FP16 command builds an FP32 engine with FP16 I/O. I think that's not our purpose here; should we add a --fp16 flag too?
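With the flag added, the FP16 branch of the cell would read:
if USE_FP16:
    !trtexec --onnx=resnet50_pytorch.onnx --saveEngine=resnet_engine_pytorch.trt --explicitBatch --inputIOFormats=fp16:chw --outputIOFormats=fp16:chw --fp16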
Closing since there has been no activity for more than 3 weeks; please reopen if you still have questions, thanks!