
Converting ONNX to TensorRT: fp32 & fp16 modes succeed but int8 mode fails

zml24 opened this issue 3 years ago • 7 comments

Checklist

  • [X] I have searched related issues but cannot get the expected help.
  • [X] I have read the FAQ documentation but cannot get the expected help.
  • [X] The bug has not been fixed in the latest version.

Describe the bug

Converting ONNX to TensorRT succeeds in fp32 and fp16 modes but fails in int8 mode.

Reproduction

Changed Faster R-CNN to Fast R-CNN (with pre-computed proposals). The model runs correctly with mmdetection, mmdeploy (TensorRT fp32 & fp16), and mmdeploy (ONNX Runtime).

Environment

TensorRT Version: 8.2.3.0
NVIDIA GPU: T4
NVIDIA Driver Version: 450.102.04
CUDA Version: 11.0
CUDNN Version: 8302 (torch.backends.cudnn.version())
Operating System: CentOS
Python Version (if applicable): 3.8.13
Tensorflow Version (if applicable): -
PyTorch Version (if applicable): 1.12.1
Baremetal or Container (if so, version): -

Error traceback

For fp16 (trt_log_level: trt.Logger.INFO)
2022-09-08 17:36:02,762 - mmdeploy - INFO - Start pipeline mmdeploy.backend.tensorrt.onnx2tensorrt.onnx2tensorrt in subprocess
2022-09-08 17:36:03,426 - mmdeploy - INFO - Successfully loaded tensorrt plugins from xxx/mmdeploy/mmdeploy/lib/libmmdeploy_tensorrt_ops.so
[09/08/2022-17:36:08] [TRT] [I] [MemUsageChange] Init CUDA: CPU +319, GPU +0, now: CPU 400, GPU 1354 (MiB)
[09/08/2022-17:36:09] [TRT] [I] [MemUsageSnapshot] Begin constructing builder kernel library: CPU 399 MiB, GPU 1354 MiB
[09/08/2022-17:36:09] [TRT] [I] [MemUsageSnapshot] End constructing builder kernel library: CPU 534 MiB, GPU 1388 MiB
[09/08/2022-17:36:10] [TRT] [W] onnx2trt_utils.cpp:366: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[09/08/2022-17:36:10] [TRT] [W] onnx2trt_utils.cpp:392: One or more weights outside the range of INT32 was clamped
[09/08/2022-17:36:10] [TRT] [I] No importer registered for op: MMCVMultiLevelRoiAlign. Attempting to import as plugin.
[09/08/2022-17:36:10] [TRT] [I] Searching for plugin: MMCVMultiLevelRoiAlign, plugin_version: 1, plugin_namespace:
[09/08/2022-17:36:10] [TRT] [I] Successfully created plugin: MMCVMultiLevelRoiAlign
[09/08/2022-17:36:10] [TRT] [W] onnx2trt_utils.cpp:392: One or more weights outside the range of INT32 was clamped
[09/08/2022-17:36:10] [TRT] [W] Tensor DataType is determined at build time for tensors not marked as input or output.
[09/08/2022-17:36:10] [TRT] [W] Output type must be INT32 for shape outputs
[09/08/2022-17:36:10] [TRT] [W] Output type must be INT32 for shape outputs
[09/08/2022-17:36:12] [TRT] [W] TensorRT was linked against cuBLAS/cuBLAS LT 11.6.5 but loaded cuBLAS/cuBLAS LT 11.5.1
[09/08/2022-17:36:12] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +489, GPU +206, now: CPU 1335, GPU 1594 (MiB)
[09/08/2022-17:36:13] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +117, GPU +52, now: CPU 1452, GPU 1646 (MiB)
[09/08/2022-17:36:13] [TRT] [I] Local timing cache in use. Profiling results in this builder pass will not be stored.
[09/08/2022-17:37:22] [TRT] [I] Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
[09/08/2022-17:38:22] [TRT] [I] Detected 3 inputs and 2 output network tensors.
[09/08/2022-17:38:22] [TRT] [I] Total Host Persistent Memory: 148672
[09/08/2022-17:38:22] [TRT] [I] Total Device Persistent Memory: 61259264
[09/08/2022-17:38:22] [TRT] [I] Total Scratch Memory: 1024000
[09/08/2022-17:38:22] [TRT] [I] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 100 MiB, GPU 1237 MiB
[09/08/2022-17:38:22] [TRT] [I] [BlockAssignment] Algorithm ShiftNTopDown took 14.9649ms to assign 10 blocks to 97 nodes requiring 264632321 bytes.
[09/08/2022-17:38:22] [TRT] [I] Total Activation Memory: 264632321
[09/08/2022-17:38:22] [TRT] [W] TensorRT was linked against cuBLAS/cuBLAS LT 11.6.5 but loaded cuBLAS/cuBLAS LT 11.5.1
[09/08/2022-17:38:22] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +1, GPU +8, now: CPU 2342, GPU 2108 (MiB)
[09/08/2022-17:38:22] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 2342, GPU 2116 (MiB)
[09/08/2022-17:38:23] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in building engine: CPU +78, GPU +128, now: CPU 78, GPU 128 (MiB)
2022-09-08 17:38:23,647 - mmdeploy - INFO - Finish pipeline mmdeploy.backend.tensorrt.onnx2tensorrt.onnx2tensorrt
2022-09-08 17:38:25,262 - mmdeploy - WARNING - "visualize_model" has been skipped may be because it's running on a headless device.
2022-09-08 17:38:25,262 - mmdeploy - INFO - All process success.

For int8 (trt_log_level: trt.Logger.INFO)
2022-09-08 17:41:42,477 - mmdeploy - INFO - Start pipeline mmdeploy.apis.calibration.create_calib_input_data in subprocess
load checkpoint from local path: work_dirs/fast_rcnn_r50_fpn_fp16_1x_align/v1.3.pth
loading annotations into memory...
Done (t=0.12s)
creating index...
index created!
100%|█████████████████████████████████████████| 248/248 [01:15<00:00, 3.27it/s]
2022-09-08 17:43:23,743 - mmdeploy - INFO - Finish pipeline mmdeploy.apis.calibration.create_calib_input_data
2022-09-08 17:43:44,359 - mmdeploy - INFO - Start pipeline mmdeploy.backend.tensorrt.onnx2tensorrt.onnx2tensorrt in subprocess
2022-09-08 17:43:45,070 - mmdeploy - INFO - Successfully loaded tensorrt plugins from xxx/mmdeploy/mmdeploy/lib/libmmdeploy_tensorrt_ops.so
[09/08/2022-17:43:49] [TRT] [I] [MemUsageChange] Init CUDA: CPU +319, GPU +0, now: CPU 400, GPU 1354 (MiB)
[09/08/2022-17:43:50] [TRT] [I] [MemUsageSnapshot] Begin constructing builder kernel library: CPU 400 MiB, GPU 1354 MiB
[09/08/2022-17:43:51] [TRT] [I] [MemUsageSnapshot] End constructing builder kernel library: CPU 534 MiB, GPU 1388 MiB
[09/08/2022-17:43:52] [TRT] [W] onnx2trt_utils.cpp:366: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[09/08/2022-17:43:52] [TRT] [W] onnx2trt_utils.cpp:392: One or more weights outside the range of INT32 was clamped
[09/08/2022-17:43:52] [TRT] [I] No importer registered for op: MMCVMultiLevelRoiAlign. Attempting to import as plugin.
[09/08/2022-17:43:52] [TRT] [I] Searching for plugin: MMCVMultiLevelRoiAlign, plugin_version: 1, plugin_namespace:
[09/08/2022-17:43:52] [TRT] [I] Successfully created plugin: MMCVMultiLevelRoiAlign
[09/08/2022-17:43:52] [TRT] [W] onnx2trt_utils.cpp:392: One or more weights outside the range of INT32 was clamped
[09/08/2022-17:43:52] [TRT] [W] Tensor DataType is determined at build time for tensors not marked as input or output.
[09/08/2022-17:43:52] [TRT] [W] Output type must be INT32 for shape outputs
[09/08/2022-17:43:52] [TRT] [W] Output type must be INT32 for shape outputs
[09/08/2022-17:43:54] [TRT] [W] TensorRT was linked against cuBLAS/cuBLAS LT 11.6.5 but loaded cuBLAS/cuBLAS LT 11.5.1
[09/08/2022-17:43:54] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +486, GPU +206, now: CPU 1336, GPU 1594 (MiB)
[09/08/2022-17:43:55] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +117, GPU +52, now: CPU 1453, GPU 1646 (MiB)
[09/08/2022-17:43:55] [TRT] [I] Timing cache disabled. Turning it on will improve builder speed.
[09/08/2022-17:43:55] [TRT] [W] Calibration Profile is not defined. Running calibration with Profile 0
[09/08/2022-17:43:55] [TRT] [W] Calibration Profile is not defined. Running calibration with Profile 0
[09/08/2022-17:44:06] [TRT] [I] Detected 2 inputs and 2 output network tensors.
[09/08/2022-17:44:06] [TRT] [I] Total Host Persistent Memory: 114848
[09/08/2022-17:44:06] [TRT] [I] Total Device Persistent Memory: 0
[09/08/2022-17:44:06] [TRT] [I] Total Scratch Memory: 1600
[09/08/2022-17:44:06] [TRT] [I] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 9 MiB, GPU 384 MiB
[09/08/2022-17:44:06] [TRT] [I] [BlockAssignment] Algorithm ShiftNTopDown took 38.0418ms to assign 8 blocks to 172 nodes requiring 263425024 bytes.
[09/08/2022-17:44:06] [TRT] [I] Total Activation Memory: 263425024
[09/08/2022-17:44:06] [TRT] [W] TensorRT was linked against cuBLAS/cuBLAS LT 11.6.5 but loaded cuBLAS/cuBLAS LT 11.5.1
[09/08/2022-17:44:06] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 2254, GPU 2234 (MiB)
[09/08/2022-17:44:06] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 2254, GPU 2242 (MiB)
[09/08/2022-17:44:06] [TRT] [W] TensorRT was linked against cuBLAS/cuBLAS LT 11.6.5 but loaded cuBLAS/cuBLAS LT 11.5.1
[09/08/2022-17:44:06] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +10, now: CPU 2254, GPU 2218 (MiB)
[09/08/2022-17:44:06] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 2254, GPU 2226 (MiB)
[09/08/2022-17:44:06] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +251, now: CPU 0, GPU 507 (MiB)
[09/08/2022-17:44:06] [TRT] [I] Starting Calibration.
2022-09-08 17:44:07,139 - mmdeploy - ERROR - `mmdeploy.backend.tensorrt.onnx2tensorrt.onnx2tensorrt` with Call id: 2 failed. exit.

zml24 avatar Sep 09 '22 09:09 zml24

Hi, how do I set int8 for TensorRT? I can't find the parameter.

JinqingZhengTju avatar Sep 13 '22 13:09 JinqingZhengTju

> Hi, how do I set int8 for TensorRT? I can't find the parameter.

fp32: https://github.com/open-mmlab/mmdeploy/blob/master/configs/mmdet/instance-seg/instance-seg_tensorrt_dynamic-320x320-1344x1344.py
fp16: https://github.com/open-mmlab/mmdeploy/blob/master/configs/mmdet/instance-seg/instance-seg_tensorrt-fp16_dynamic-320x320-1344x1344.py
int8: https://github.com/open-mmlab/mmdeploy/blob/master/configs/mmdet/instance-seg/instance-seg_tensorrt-int8_dynamic-320x320-1344x1344.py

You can adjust the final dtype by choosing the corresponding backend config.
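For reference, the int8 config mainly flips two flags on top of the fp32 setup. A minimal sketch of what the linked int8 config inherits (paraphrased from the tensorrt-int8 backend base in the repo; check the actual file for exact values):

# Core of the TensorRT int8 backend config in mmdeploy
# (paraphrased from configs/_base_/backends/tensorrt-int8.py).
backend_config = dict(
    type='tensorrt',
    common_config=dict(
        fp16_mode=True,   # int8 engines keep fp16 enabled as a fallback precision
        int8_mode=True))  # enables int8 calibration during engine build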

zml24 avatar Sep 13 '22 13:09 zml24

> You can adjust the final dtype by choosing the corresponding backend config.

For the mmrotate task there is no int8 configuration file. Can I write an int8 configuration file for mmrotate myself? I don't know whether this toolbox supports that kind of configuration for mmrotate.

JinqingZhengTju avatar Sep 13 '22 13:09 JinqingZhengTju

> For the mmrotate task there is no int8 configuration file. Can I write an int8 configuration file for mmrotate myself?

I think you can directly apply the int8 changes to the mmrotate fp16 config. Good luck.
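Concretely, you can mirror how the mmdet int8 configs extend their fp32/fp16 counterparts. A hypothetical sketch (the file names and shape ranges below are made up for illustration; use the actual mmrotate fp16 config you deploy with as the base):

# Hypothetical rotated-detection_tensorrt-int8_*.py for mmrotate;
# the base file name here is an assumption, modeled on the mmdet int8 configs.
_base_ = ['./rotated-detection_tensorrt-fp16_dynamic-320x320-1024x1024.py']

backend_config = dict(
    common_config=dict(int8_mode=True))  # fp16_mode stays True from the base config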

zml24 avatar Sep 14 '22 03:09 zml24

> I think you can directly apply the int8 changes to the mmrotate fp16 config. Good luck.

Thanks. I will try it.

JinqingZhengTju avatar Sep 14 '22 03:09 JinqingZhengTju

> I think you can directly apply the int8 changes to the mmrotate fp16 config. Good luck.

I have tried that modification, but I ran into a problem: when the function 'create_calib_input_data' is called, this error occurs: AttributeError: 'DataContainer' object has no attribute 'detach'. The log message is as follows:

0%| | 0/40 [00:00<?, ?it/s][2022-09-14 22:52:07.137] [mmdeploy] [info] [model.cpp:95] Register 'DirectoryModel'
0%| | 0/40 [00:01<?, ?it/s]
Process Process-3:
Traceback (most recent call last):
  File "/home/zjq/anaconda3/envs/open-mmlab/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/home/zjq/anaconda3/envs/open-mmlab/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/home/zjq/open-mmlab/MMDeploy/mmdeploy/apis/core/pipeline_manager.py", line 107, in __call__
    ret = func(*args, **kwargs)
  File "/home/zjq/open-mmlab/MMDeploy/mmdeploy/apis/calibration.py", line 82, in create_calib_input_data
    device=device)
  File "/home/zjq/open-mmlab/MMDeploy/mmdeploy/apis/core/pipeline_manager.py", line 356, in wrap
    return self.call_function(func_name, *args, **kwargs)
  File "/home/zjq/open-mmlab/MMDeploy/mmdeploy/apis/core/pipeline_manager.py", line 326, in call_function
    return self.call_function_local(func_name, *args, **kwargs)
  File "/home/zjq/open-mmlab/MMDeploy/mmdeploy/apis/core/pipeline_manager.py", line 275, in call_function_local
    return pipe_caller(*args, **kwargs)
  File "/home/zjq/open-mmlab/MMDeploy/mmdeploy/apis/core/pipeline_manager.py", line 107, in __call__
    ret = func(*args, **kwargs)
  File "/home/zjq/open-mmlab/MMDeploy/mmdeploy/apis/utils/calibration.py", line 67, in create_calib_input_data
    input_ndarray = input_tensor.detach().cpu().numpy()
AttributeError: 'DataContainer' object has no attribute 'detach'
2022-09-14 22:52:08,470 - mmdeploy - ERROR - `mmdeploy.apis.calibration.create_calib_input_data` with Call id: 1 failed. exit.

JinqingZhengTju avatar Sep 14 '22 15:09 JinqingZhengTju

This looks similar to #1048. You may try PR #1050.
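For context, the calibration step is receiving an mmcv DataContainer where it expects a bare tensor. A minimal sketch of the kind of unwrapping that resolves this (an illustration only, not the actual diff in PR #1050):

from mmcv.parallel import DataContainer

def to_ndarray(input_tensor):
    # mmcv wraps batched inputs in a DataContainer; the tensor lives in .data
    if isinstance(input_tensor, DataContainer):
        input_tensor = input_tensor.data
    return input_tensor.detach().cpu().numpy()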

lvhan028 avatar Sep 15 '22 14:09 lvhan028