TensorRT
QAT ONNX model conversion to TRT engine fails
Converting a ResNet50 QAT ONNX model to a TRT engine with trtexec fails.
Log:
[08/12/2022-03:53:50] [V] [TRT] =============== Computing costs for
[08/12/2022-03:53:50] [V] [TRT] *************** Autotuning format combination: Int8(57600,900,30,1) -> Int8(57600,900,30,1) ***************
[08/12/2022-03:53:50] [V] [TRT] --------------- Timing Runner: layer1.0.conv1.weight + QuantizeLinear_20 + Conv_22 (CaskConvolution)
[08/12/2022-03:53:50] [V] [TRT] CaskConvolution has no valid tactics for this config, skipping
[08/12/2022-03:53:50] [V] [TRT] *************** Autotuning format combination: Int8(14400,900:4,30,1) -> Int8(14400,900:4,30,1) ***************
[08/12/2022-03:53:50] [V] [TRT] --------------- Timing Runner: layer1.0.conv1.weight + QuantizeLinear_20 + Conv_22 (CudaDepthwiseConvolution)
[08/12/2022-03:53:50] [V] [TRT] CudaDepthwiseConvolution has no valid tactics for this config, skipping
[08/12/2022-03:53:50] [V] [TRT] --------------- Timing Runner: layer1.0.conv1.weight + QuantizeLinear_20 + Conv_22 (FusedConvActConvolution)
[08/12/2022-03:53:50] [V] [TRT] FusedConvActConvolution has no valid tactics for this config, skipping
[08/12/2022-03:53:50] [V] [TRT] --------------- Timing Runner: layer1.0.conv1.weight + QuantizeLinear_20 + Conv_22 (CaskConvolution)
[08/12/2022-03:53:50] [V] [TRT] CaskConvolution has no valid tactics for this config, skipping
[08/12/2022-03:53:50] [V] [TRT] *************** Autotuning format combination: Int8(14400,900:4,30,1) -> Int8(1800,900:32,30,1) ***************
[08/12/2022-03:53:50] [V] [TRT] --------------- Timing Runner: layer1.0.conv1.weight + QuantizeLinear_20 + Conv_22 (CaskConvolution)
[08/12/2022-03:53:50] [V] [TRT] CaskConvolution has no valid tactics for this config, skipping
[08/12/2022-03:53:50] [V] [TRT] *************** Autotuning format combination: Int8(1800,900:32,30,1) -> Int8(1800,900:32,30,1) ***************
[08/12/2022-03:53:50] [V] [TRT] --------------- Timing Runner: layer1.0.conv1.weight + QuantizeLinear_20 + Conv_22 (CudaGroupConvolution)
[08/12/2022-03:53:50] [V] [TRT] CudaGroupConvolution has no valid tactics for this config, skipping
[08/12/2022-03:53:50] [V] [TRT] --------------- Timing Runner: layer1.0.conv1.weight + QuantizeLinear_20 + Conv_22 (CudaDepthwiseConvolution)
[08/12/2022-03:53:50] [V] [TRT] CudaDepthwiseConvolution has no valid tactics for this config, skipping
[08/12/2022-03:53:50] [V] [TRT] --------------- Timing Runner: layer1.0.conv1.weight + QuantizeLinear_20 + Conv_22 (FusedConvActConvolution)
[08/12/2022-03:53:50] [V] [TRT] FusedConvActConvolution has no valid tactics for this config, skipping
[08/12/2022-03:53:50] [V] [TRT] --------------- Timing Runner: layer1.0.conv1.weight + QuantizeLinear_20 + Conv_22 (CaskConvolution)
[08/12/2022-03:53:50] [V] [TRT] CaskConvolution has no valid tactics for this config, skipping
[08/12/2022-03:53:50] [V] [TRT] Deleting timing cache: 81 entries, served 345 hits since creation.
[08/12/2022-03:53:50] [E] Error[10]: [optimizer.cpp::computeCosts::3628] Error Code 10: Internal Error (Could not find any implementation for node layer1.0.conv1.weight + QuantizeLinear_20 + Conv_22.)
[08/12/2022-03:53:50] [E] Error[2]: [builder.cpp::buildSerializedNetwork::636] Error Code 2: Internal Error (Assertion engine != nullptr failed. )
Command:
./trtexec --onnx=/xhzy/int8_test/best_coral_age.onnx --verbose --noDataTransfers --separateProfileRun --dumpProfile --useCudaGraph --int8
Environment
TensorRT Version: 8.4.1.5
NVIDIA GPU: GeForce RTX 3090
NVIDIA Driver Version: 460.91.03
CUDA Version: 10.2
CUDNN Version: 8.4.0
Operating System: Ubuntu 18.04
Python Version (if applicable): 3.6.9
Tensorflow Version (if applicable):
PyTorch Version (if applicable): 1.9.0
Baremetal or Container (if so, version):
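As a triage aid, the verbose log above has a recognizable shape: Error Code 10 ("Could not find any implementation") is preceded by every tactic source reporting "no valid tactics" for the same node. A minimal sketch of scanning such a log for the failing node (the inline sample below is a stand-in for a real log file, not part of the original report):

```python
import re

# Sample lines in trtexec --verbose format (a stand-in for reading the real log file).
log = """\
[08/12/2022-03:53:50] [V] [TRT] CaskConvolution has no valid tactics for this config, skipping
[08/12/2022-03:53:50] [V] [TRT] CudaDepthwiseConvolution has no valid tactics for this config, skipping
[08/12/2022-03:53:50] [E] Error[10]: [optimizer.cpp::computeCosts::3628] Error Code 10: Internal Error (Could not find any implementation for node layer1.0.conv1.weight + QuantizeLinear_20 + Conv_22.)
"""

# Tactic sources that found no implementation for some layer config.
skipped = re.findall(r"\[V\] \[TRT\] (\w+) has no valid tactics", log)

# The node that ultimately failed to build.
failed = re.findall(r"Could not find any implementation for node (.+?)\.\)", log)

print(skipped)  # tactic sources with no valid tactics
print(failed)   # name of the failing fused node
```

Grouping the "no valid tactics" lines by the node named in the Timing Runner lines quickly shows whether one fused Q/DQ+Conv node is the sole culprit, as it is here.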
Can you share the onnx model here?
> Can you share the onnx model here?

https://drive.google.com/file/d/1MkE-0DFlcsajfJVY8iXKq-O_qGn3m9ah/view?usp=sharing
I cannot reproduce this on my RTX 8000 with TRT 8.4.1.5 using the official TRT docker image nvcr.io/nvidia/tensorrt:22.07-py3. Your driver looks pretty old; can you try upgrading it first? Alternatively, try Docker to check whether the failure still reproduces.
[08/15/2022-12:13:10] [I] === Performance summary ===
[08/15/2022-12:13:10] [I] Throughput: 1761.79 qps
[08/15/2022-12:13:10] [I] Latency: min = 0.584717 ms, max = 0.734772 ms, mean = 0.589466 ms, median = 0.588562 ms, percentile(99%) = 0.598145 ms
[08/15/2022-12:13:10] [I] Enqueue Time: min = 0.166748 ms, max = 0.460693 ms, mean = 0.205419 ms, median = 0.200317 ms, percentile(99%) = 0.326721 ms
[08/15/2022-12:13:10] [I] H2D Latency: min = 0.0185547 ms, max = 0.0383301 ms, mean = 0.0200844 ms, median = 0.0198975 ms, percentile(99%) = 0.0230713 ms
[08/15/2022-12:13:10] [I] GPU Compute Time: min = 0.561035 ms, max = 0.70813 ms, mean = 0.564955 ms, median = 0.564209 ms, percentile(99%) = 0.572083 ms
[08/15/2022-12:13:10] [I] D2H Latency: min = 0.00366211 ms, max = 0.0151367 ms, mean = 0.00442618 ms, median = 0.00415039 ms, percentile(99%) = 0.00952148 ms
[08/15/2022-12:13:10] [I] Total Host Walltime: 3.00206 s
[08/15/2022-12:13:10] [I] Total GPU Compute Time: 2.98805 s
[08/15/2022-12:13:10] [W] * GPU compute time is unstable, with coefficient of variance = 1.46271%.
[08/15/2022-12:13:10] [W] If not already in use, locking GPU clock frequency or adding --useSpinWait may improve the stability.
[08/15/2022-12:13:10] [I] Explanations of the performance metrics are printed in the verbose logs.
[08/15/2022-12:13:10] [I]
&&&& PASSED TensorRT.trtexec [TensorRT v8401] # trtexec --onnx=my_model.onnx --int8 --fp16
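As an aside, the summary block above is easy to post-process when comparing runs. A sketch that pulls throughput and mean latency out of a trtexec performance summary (field names as printed by TRT 8.4; the inline string is a sample, not the full log):

```python
import re

# Two representative lines from a trtexec "=== Performance summary ===" block.
summary = """\
[08/15/2022-12:13:10] [I] Throughput: 1761.79 qps
[08/15/2022-12:13:10] [I] Latency: min = 0.584717 ms, max = 0.734772 ms, mean = 0.589466 ms, median = 0.588562 ms, percentile(99%) = 0.598145 ms
"""

# Queries per second reported by trtexec.
throughput = float(re.search(r"Throughput: ([\d.]+) qps", summary).group(1))

# Mean end-to-end latency in milliseconds.
mean_latency = float(re.search(r"Latency: .*mean = ([\d.]+) ms", summary).group(1))

print(throughput, mean_latency)
```

The same pattern extends to the H2D/D2H and GPU Compute Time lines if you want a full per-run record.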
I tried your image and it works, thanks. My image is based on nvidia/cuda:10.2-cudnn8-devel-ubuntu18.04, so maybe the problem is the CUDA version? Does it have to be 11 or above?
TRT 8.4 still supports CUDA 10.2. Did you download the CUDA 11 package and use it in a CUDA 10 environment?

> TRT8.4 still supports CUDA 10.2, did you download the CUDA 11 packages and use it in the CUDA 10 environment?
I do use the TensorRT build for CUDA 10.2; the package name is TensorRT-8.4.1.5.Linux.x86_64-gnu.cuda-10.2.cudnn8.4.tar.gz. I will try a new CUDA 11 image to see if it works and will leave a message here later.
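For anyone double-checking this kind of mismatch: the TRT Linux tarball name encodes the CUDA and cuDNN versions it was built against, so a quick sanity check is to compare them with the versions on the machine. A small sketch using the package name from this thread:

```python
import re

# TRT Linux tarballs encode the target CUDA/cuDNN versions in the file name.
pkg = "TensorRT-8.4.1.5.Linux.x86_64-gnu.cuda-10.2.cudnn8.4.tar.gz"

m = re.search(r"cuda-(\d+\.\d+)\.cudnn(\d+\.\d+)", pkg)
cuda, cudnn = m.group(1), m.group(2)

# These should match the toolkit/cuDNN actually installed (here: CUDA 10.2, cuDNN 8.4).
print(cuda, cudnn)
```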
Closing since there has been no activity for more than 3 weeks, thanks!