
QAT ONNX model fails to convert to a TRT engine

Open JHshen0124 opened this issue 3 years ago • 6 comments

Converting a ResNet50 QAT ONNX model to a TRT engine with trtexec fails.

Log:

[08/12/2022-03:53:50] [V] [TRT] =============== Computing costs for
[08/12/2022-03:53:50] [V] [TRT] *************** Autotuning format combination: Int8(57600,900,30,1) -> Int8(57600,900,30,1) ***************
[08/12/2022-03:53:50] [V] [TRT] --------------- Timing Runner: layer1.0.conv1.weight + QuantizeLinear_20 + Conv_22 (CaskConvolution)
[08/12/2022-03:53:50] [V] [TRT] CaskConvolution has no valid tactics for this config, skipping
[08/12/2022-03:53:50] [V] [TRT] *************** Autotuning format combination: Int8(14400,900:4,30,1) -> Int8(14400,900:4,30,1) ***************
[08/12/2022-03:53:50] [V] [TRT] --------------- Timing Runner: layer1.0.conv1.weight + QuantizeLinear_20 + Conv_22 (CudaDepthwiseConvolution)
[08/12/2022-03:53:50] [V] [TRT] CudaDepthwiseConvolution has no valid tactics for this config, skipping
[08/12/2022-03:53:50] [V] [TRT] --------------- Timing Runner: layer1.0.conv1.weight + QuantizeLinear_20 + Conv_22 (FusedConvActConvolution)
[08/12/2022-03:53:50] [V] [TRT] FusedConvActConvolution has no valid tactics for this config, skipping
[08/12/2022-03:53:50] [V] [TRT] --------------- Timing Runner: layer1.0.conv1.weight + QuantizeLinear_20 + Conv_22 (CaskConvolution)
[08/12/2022-03:53:50] [V] [TRT] CaskConvolution has no valid tactics for this config, skipping
[08/12/2022-03:53:50] [V] [TRT] *************** Autotuning format combination: Int8(14400,900:4,30,1) -> Int8(1800,900:32,30,1) ***************
[08/12/2022-03:53:50] [V] [TRT] --------------- Timing Runner: layer1.0.conv1.weight + QuantizeLinear_20 + Conv_22 (CaskConvolution)
[08/12/2022-03:53:50] [V] [TRT] CaskConvolution has no valid tactics for this config, skipping
[08/12/2022-03:53:50] [V] [TRT] *************** Autotuning format combination: Int8(1800,900:32,30,1) -> Int8(1800,900:32,30,1) ***************
[08/12/2022-03:53:50] [V] [TRT] --------------- Timing Runner: layer1.0.conv1.weight + QuantizeLinear_20 + Conv_22 (CudaGroupConvolution)
[08/12/2022-03:53:50] [V] [TRT] CudaGroupConvolution has no valid tactics for this config, skipping
[08/12/2022-03:53:50] [V] [TRT] --------------- Timing Runner: layer1.0.conv1.weight + QuantizeLinear_20 + Conv_22 (CudaDepthwiseConvolution)
[08/12/2022-03:53:50] [V] [TRT] CudaDepthwiseConvolution has no valid tactics for this config, skipping
[08/12/2022-03:53:50] [V] [TRT] --------------- Timing Runner: layer1.0.conv1.weight + QuantizeLinear_20 + Conv_22 (FusedConvActConvolution)
[08/12/2022-03:53:50] [V] [TRT] FusedConvActConvolution has no valid tactics for this config, skipping
[08/12/2022-03:53:50] [V] [TRT] --------------- Timing Runner: layer1.0.conv1.weight + QuantizeLinear_20 + Conv_22 (CaskConvolution)
[08/12/2022-03:53:50] [V] [TRT] CaskConvolution has no valid tactics for this config, skipping
[08/12/2022-03:53:50] [V] [TRT] Deleting timing cache: 81 entries, served 345 hits since creation.
[08/12/2022-03:53:50] [E] Error[10]: [optimizer.cpp::computeCosts::3628] Error Code 10: Internal Error (Could not find any implementation for node layer1.0.conv1.weight + QuantizeLinear_20 + Conv_22.)
[08/12/2022-03:53:50] [E] Error[2]: [builder.cpp::buildSerializedNetwork::636] Error Code 2: Internal Error (Assertion engine != nullptr failed.)
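When triaging a long verbose log like this, it can help to pull out just the error lines programmatically. A minimal stdlib sketch (the sample line is copied from the log above; the regex is an assumption about the trtexec log format, not something defined by TensorRT):

```python
import re

# Matches trtexec error lines of the form:
# [E] Error[10]: [optimizer.cpp::computeCosts::3628] Error Code 10: Internal Error (...)
ERROR_RE = re.compile(r"\[E\] Error\[(\d+)\]: .*Error Code \d+: (.+?) \((.+)\)")

def extract_errors(log_text):
    """Return (code, category, message) tuples for each error line in a trtexec log."""
    errors = []
    for line in log_text.splitlines():
        m = ERROR_RE.search(line)
        if m:
            errors.append((int(m.group(1)), m.group(2), m.group(3)))
    return errors

sample = ("[08/12/2022-03:53:50] [E] Error[10]: [optimizer.cpp::computeCosts::3628] "
          "Error Code 10: Internal Error (Could not find any implementation for node "
          "layer1.0.conv1.weight + QuantizeLinear_20 + Conv_22.)")
print(extract_errors(sample))
```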

Command: ./trtexec --onnx=/xhzy/int8_test/best_coral_age.onnx --verbose --noDataTransfers --separateProfileRun --dumpProfile --useCudaGraph --int8

Environment

TensorRT Version: 8.4.1.5
NVIDIA GPU: GeForce RTX 3090
NVIDIA Driver Version: 460.91.03
CUDA Version: 10.2
CUDNN Version: 8.4.0
Operating System: Ubuntu 18.04
Python Version (if applicable): 3.6.9
Tensorflow Version (if applicable):
PyTorch Version (if applicable): 1.9.0
Baremetal or Container (if so, version):

JHshen0124 avatar Aug 12 '22 05:08 JHshen0124

Can you share the onnx model here?

zerollzeng avatar Aug 12 '22 08:08 zerollzeng

> Can you share the onnx model here?

https://drive.google.com/file/d/1MkE-0DFlcsajfJVY8iXKq-O_qGn3m9ah/view?usp=sharing

JHshen0124 avatar Aug 15 '22 02:08 JHshen0124

I cannot reproduce this on my RTX 8000 with TRT 8.4.1.5 and the official TRT Docker image nvcr.io/nvidia/tensorrt:22.07-py3. Your driver looks quite old; can you try upgrading it first? Or try Docker to check whether the issue reproduces there.

[08/15/2022-12:13:10] [I] === Performance summary ===
[08/15/2022-12:13:10] [I] Throughput: 1761.79 qps
[08/15/2022-12:13:10] [I] Latency: min = 0.584717 ms, max = 0.734772 ms, mean = 0.589466 ms, median = 0.588562 ms, percentile(99%) = 0.598145 ms
[08/15/2022-12:13:10] [I] Enqueue Time: min = 0.166748 ms, max = 0.460693 ms, mean = 0.205419 ms, median = 0.200317 ms, percentile(99%) = 0.326721 ms
[08/15/2022-12:13:10] [I] H2D Latency: min = 0.0185547 ms, max = 0.0383301 ms, mean = 0.0200844 ms, median = 0.0198975 ms, percentile(99%) = 0.0230713 ms
[08/15/2022-12:13:10] [I] GPU Compute Time: min = 0.561035 ms, max = 0.70813 ms, mean = 0.564955 ms, median = 0.564209 ms, percentile(99%) = 0.572083 ms
[08/15/2022-12:13:10] [I] D2H Latency: min = 0.00366211 ms, max = 0.0151367 ms, mean = 0.00442618 ms, median = 0.00415039 ms, percentile(99%) = 0.00952148 ms
[08/15/2022-12:13:10] [I] Total Host Walltime: 3.00206 s
[08/15/2022-12:13:10] [I] Total GPU Compute Time: 2.98805 s
[08/15/2022-12:13:10] [W] * GPU compute time is unstable, with coefficient of variance = 1.46271%.
[08/15/2022-12:13:10] [W]   If not already in use, locking GPU clock frequency or adding --useSpinWait may improve the stability.
[08/15/2022-12:13:10] [I] Explanations of the performance metrics are printed in the verbose logs.
[08/15/2022-12:13:10] [I]
&&&& PASSED TensorRT.trtexec [TensorRT v8401] # trtexec --onnx=my_model.onnx --int8 --fp16
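For scripted benchmarking, numbers like the throughput and mean latency in the summary above can be extracted from the log with a small helper. A stdlib sketch (the regexes assume the trtexec summary format shown in this thread; the sample lines are copied from it):

```python
import re

SUMMARY = """\
[08/15/2022-12:13:10] [I] Throughput: 1761.79 qps
[08/15/2022-12:13:10] [I] Latency: min = 0.584717 ms, max = 0.734772 ms, mean = 0.589466 ms, median = 0.588562 ms, percentile(99%) = 0.598145 ms
"""

def parse_summary(log_text):
    """Pull throughput (qps) and mean latency (ms) out of a trtexec performance summary."""
    stats = {}
    m = re.search(r"Throughput: ([\d.]+) qps", log_text)
    if m:
        stats["throughput_qps"] = float(m.group(1))
    m = re.search(r"Latency: .*mean = ([\d.]+) ms", log_text)
    if m:
        stats["mean_latency_ms"] = float(m.group(1))
    return stats

print(parse_summary(SUMMARY))  # → {'throughput_qps': 1761.79, 'mean_latency_ms': 0.589466}
```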

zerollzeng avatar Aug 15 '22 12:08 zerollzeng

> I can not reproduce this on my RTX 8000, TRT 8.4.1.5 with the official TRT docker image: nvcr.io/nvidia/tensorrt:22.07-py3, looks like your driver is pretty old, can you try upgrading your driver first? Or try using docker to check whether this can be reproduced.

I tried your image and it works, thanks. My image is based on nvidia/cuda:10.2-cudnn8-devel-ubuntu18.04; maybe the problem is the CUDA version? Must it be 11 or above?

JHshen0124 avatar Aug 16 '22 02:08 JHshen0124

TRT 8.4 still supports CUDA 10.2. Did you download the CUDA 11 packages and use them in a CUDA 10 environment?

zerollzeng avatar Aug 16 '22 03:08 zerollzeng

> TRT8.4 still supports CUDA 10.2, did you download the CUDA 11 packages and use it in the CUDA 10 environment?

I do use the CUDA 10.2 build of TensorRT; my package is TensorRT-8.4.1.5.Linux.x86_64-gnu.cuda-10.2.cudnn8.4.tar.gz. I will try a new CUDA 11 image to see if it works and report back here later.
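The CUDA and cuDNN versions a TensorRT tarball was built against can be read straight from its filename, which is a quick sanity check against the local environment. A stdlib sketch using the package name from this thread (the filename pattern is an assumption generalized from this one example):

```python
import re

def parse_trt_package(filename):
    """Extract TensorRT, CUDA, and cuDNN versions from a TensorRT tarball name."""
    m = re.match(r"TensorRT-([\d.]+)\..*\.cuda-([\d.]+)\.cudnn([\d.]+)\.tar\.gz", filename)
    if not m:
        return None
    return {"tensorrt": m.group(1), "cuda": m.group(2), "cudnn": m.group(3)}

pkg = parse_trt_package("TensorRT-8.4.1.5.Linux.x86_64-gnu.cuda-10.2.cudnn8.4.tar.gz")
print(pkg)  # → {'tensorrt': '8.4.1.5', 'cuda': '10.2', 'cudnn': '8.4'}
```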

JHshen0124 avatar Aug 16 '22 05:08 JHshen0124

Closing since there has been no activity for more than 3 weeks, thanks!

ttyio avatar May 02 '23 01:05 ttyio