mmdeploy: deploy instance-seg_dynamic with export_postprocess_mask=True error

Describe the bug

2022-08-30 06:01:25,337 - mmdeploy - INFO - Execute onnx optimize passes.
2022-08-30 06:01:26,701 - mmdeploy - INFO - Finish pipeline mmdeploy.apis.pytorch2onnx.torch2onnx
[2022-08-30 06:01:29.267] [mmdeploy] [info] [model.cpp:95] Register 'DirectoryModel'
2022-08-30 06:01:29,272 - mmdeploy - INFO - Start pipeline mmdeploy.backend.tensorrt.onnx2tensorrt.onnx2tensorrt in subprocess
2022-08-30 06:01:29,437 - mmdeploy - INFO - Successfully loaded tensorrt plugins from /root/workspace/mmdeploy/mmdeploy/lib/libmmdeploy_tensorrt_ops.so
[08/30/2022-06:01:29] [TRT] [I] [MemUsageChange] Init CUDA: CPU +313, GPU +0, now: CPU 400, GPU 6306 (MiB)
[08/30/2022-06:01:30] [TRT] [I] [MemUsageSnapshot] Begin constructing builder kernel library: CPU 400 MiB, GPU 6306 MiB
[08/30/2022-06:01:30] [TRT] [I] [MemUsageSnapshot] End constructing builder kernel library: CPU 535 MiB, GPU 6340 MiB
[08/30/2022-06:01:31] [TRT] [W] parsers/onnx/onnx2trt_utils.cpp:364: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[08/30/2022-06:01:31] [TRT] [W] parsers/onnx/onnx2trt_utils.cpp:392: One or more weights outside the range of INT32 was clamped
[08/30/2022-06:15:49] [TRT] [I] No importer registered for op: TRTBatchedNMS. Attempting to import as plugin.
[08/30/2022-06:15:49] [TRT] [I] Searching for plugin: TRTBatchedNMS, plugin_version: 1, plugin_namespace:
[08/30/2022-06:15:49] [TRT] [I] Successfully created plugin: TRTBatchedNMS
[08/30/2022-06:16:09] [TRT] [I] No importer registered for op: MMCVMultiLevelRoiAlign. Attempting to import as plugin.
[08/30/2022-06:16:09] [TRT] [I] Searching for plugin: MMCVMultiLevelRoiAlign, plugin_version: 1, plugin_namespace:
[08/30/2022-06:16:09] [TRT] [I] Successfully created plugin: MMCVMultiLevelRoiAlign
[08/30/2022-06:16:58] [TRT] [W] Tensor DataType is determined at build time for tensors not marked as input or output.
[08/30/2022-06:17:08] [TRT] [I] No importer registered for op: TRTBatchedNMS. Attempting to import as plugin.
[08/30/2022-06:17:08] [TRT] [I] Searching for plugin: TRTBatchedNMS, plugin_version: 1, plugin_namespace:
[08/30/2022-06:17:08] [TRT] [I] Successfully created plugin: TRTBatchedNMS
[08/30/2022-06:17:32] [TRT] [I] No importer registered for op: MMCVMultiLevelRoiAlign. Attempting to import as plugin.
[08/30/2022-06:17:32] [TRT] [I] Searching for plugin: MMCVMultiLevelRoiAlign, plugin_version: 1, plugin_namespace:
[08/30/2022-06:17:32] [TRT] [I] Successfully created plugin: MMCVMultiLevelRoiAlign
[08/30/2022-06:18:27] [TRT] [I] No importer registered for op: grid_sampler. Attempting to import as plugin.
[08/30/2022-06:18:27] [TRT] [I] Searching for plugin: grid_sampler, plugin_version: 1, plugin_namespace:
[08/30/2022-06:18:27] [TRT] [I] Successfully created plugin: grid_sampler
[08/30/2022-06:18:30] [TRT] [W] Output type must be INT32 for shape outputs
[08/30/2022-06:18:30] [TRT] [W] Output type must be INT32 for shape outputs
[08/30/2022-06:18:30] [TRT] [W] Output type must be INT32 for shape outputs
[08/30/2022-06:18:39] [TRT] [W] TensorRT was linked against cuBLAS/cuBLASLt 11.6.5 but loaded cuBLAS/cuBLASLt 11.5.1
[08/30/2022-06:18:39] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +483, GPU +206, now: CPU 1411, GPU 6546 (MiB)
[08/30/2022-06:18:39] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +115, GPU +54, now: CPU 1526, GPU 6600 (MiB)
[08/30/2022-06:18:39] [TRT] [I] Local timing cache in use. Profiling results in this builder pass will not be stored.
[08/30/2022-06:18:39] [TRT] [E] 4: [shapeCompiler.cpp::evaluateShapeChecks::832] Error Code 4: Internal Error (kOPT values for profile 0 violate shape constraints: reshape would change volume. IShuffleLayer Reshape_3019: reshaping failed for tensor: 6596)
Process Process-3:
Traceback (most recent call last):
  File "/opt/conda/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/opt/conda/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/root/workspace/mmdeploy/mmdeploy/apis/core/pipeline_manager.py", line 107, in call
    ret = func(*args, **kwargs)
  File "/root/workspace/mmdeploy/mmdeploy/backend/tensorrt/onnx2tensorrt.py", line 79, in onnx2tensorrt
    from_onnx(
  File "/root/workspace/mmdeploy/mmdeploy/backend/tensorrt/utils.py", line 153, in from_onnx
    assert engine is not None, 'Failed to create TensorRT engine'
AssertionError: Failed to create TensorRT engine
2022-08-30 06:18:39,890 - mmdeploy - ERROR - mmdeploy.backend.tensorrt.onnx2tensorrt.onnx2tensorrt with Call id: 1 failed. exit.

Reproduction

python tools/deploy.py \
    configs/mmdet/instance-seg/instance-seg_tensorrt-fp16_dynamic-320x320-1344x1344.py \
    /mydata/mmdetection/configs/swin/mask_rcnn_swin-t-p4-w7_fpn_fp16_ms-crop-3x_coco.py \
    /triton_data/test_model/mmdeploy/workspaces/mask_swin/mask_rcnn_swin-t-p4-w7_fpn_fp16_ms-crop-3x_coco_20210908_165006-90a4008c.pth \
    /mydata/mmdetection/demo/demo.jpg \
    --work-dir /triton_data/test_model/mmdeploy/workspaces/mask_swin/swin_postmask/ \
    --device cuda \
    --dump-info

instance-seg_tensorrt-fp16_dynamic-320x320-1344x1344.py config:

_base_ = [
    '../_base_/base_instance-seg_dynamic.py',
    '../../_base_/backends/tensorrt-fp16.py'
]

backend_config = dict(
    common_config=dict(max_workspace_size=1 << 30),
    model_inputs=[
        dict(
            input_shapes=dict(
                input=dict(
                    min_shape=[1, 3, 320, 320],
                    opt_shape=[1, 3, 800, 1344],
                    max_shape=[1, 3, 1344, 1344])))
    ])
codebase_config = dict(post_processing=dict(export_postprocess_mask=True))
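As a quick sanity check on the shape profile in this config, the sketch below verifies that every dimension satisfies min <= opt <= max. The helper name `check_profile` is hypothetical and not part of mmdeploy or TensorRT. The profile in the report passes this check, which suggests (but does not prove) that the "kOPT values ... violate shape constraints" message comes from a reshape inside the exported graph rather than from an inconsistent profile.

```python
def check_profile(min_shape, opt_shape, max_shape):
    """Return True if every dimension satisfies min <= opt <= max.

    Hypothetical helper for illustration; not an mmdeploy or TensorRT API.
    """
    if not (len(min_shape) == len(opt_shape) == len(max_shape)):
        return False
    return all(lo <= opt <= hi
               for lo, opt, hi in zip(min_shape, opt_shape, max_shape))


# Shapes copied from the config in the report.
profile = dict(
    min_shape=[1, 3, 320, 320],
    opt_shape=[1, 3, 800, 1344],
    max_shape=[1, 3, 1344, 1344],
)
profile_ok = check_profile(**profile)
```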

Environment

2022-08-30 06:37:28,908 - mmdeploy - INFO - Environmental information
2022-08-30 06:37:29,095 - mmdeploy - INFO - sys.platform: linux
2022-08-30 06:37:29,095 - mmdeploy - INFO - Python: 3.8.13 (default, Mar 28 2022, 11:38:47) [GCC 7.5.0]
2022-08-30 06:37:29,095 - mmdeploy - INFO - CUDA available: True
2022-08-30 06:37:29,095 - mmdeploy - INFO - GPU 0: Tesla T4
2022-08-30 06:37:29,095 - mmdeploy - INFO - CUDA_HOME: /usr/local/cuda
2022-08-30 06:37:29,095 - mmdeploy - INFO - NVCC: Cuda compilation tools, release 11.6, V11.6.124
2022-08-30 06:37:29,095 - mmdeploy - INFO - GCC: gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
2022-08-30 06:37:29,095 - mmdeploy - INFO - PyTorch: 1.10.0
2022-08-30 06:37:29,095 - mmdeploy - INFO - PyTorch compiling details: PyTorch built with:

GCC 7.3
C++ Version: 201402
Intel(R) Math Kernel Library Version 2020.0.2 Product Build 20200624 for Intel(R) 64 architecture applications
Intel(R) MKL-DNN v2.2.3 (Git Hash 7336ca9f055cf1bfa13efb658fe15dc9b41f0740)
OpenMP 201511 (a.k.a. OpenMP 4.5)
LAPACK is enabled (usually provided by MKL)
NNPACK is enabled
CPU capability usage: AVX512
CUDA Runtime 11.3
NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=compute_37
CuDNN 8.2
Magma 2.5.2
Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.3, CUDNN_VERSION=8.2.0, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.10.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON,

2022-08-30 06:37:29,095 - mmdeploy - INFO - TorchVision: 0.11.0
2022-08-30 06:37:29,095 - mmdeploy - INFO - OpenCV: 4.6.0
2022-08-30 06:37:29,095 - mmdeploy - INFO - MMCV: 1.5.3
2022-08-30 06:37:29,095 - mmdeploy - INFO - MMCV Compiler: GCC 7.3
2022-08-30 06:37:29,095 - mmdeploy - INFO - MMCV CUDA Compiler: 11.3
2022-08-30 06:37:29,095 - mmdeploy - INFO - MMDeploy: 0.7.0+21775ce
2022-08-30 06:37:29,095 - mmdeploy - INFO -

2022-08-30 06:37:29,095 - mmdeploy - INFO - Backend information
2022-08-30 06:37:29,638 - mmdeploy - INFO - onnxruntime: 1.8.1 ops_is_avaliable : True
2022-08-30 06:37:29,672 - mmdeploy - INFO - tensorrt: 8.2.4.2 ops_is_avaliable : True
2022-08-30 06:37:29,692 - mmdeploy - INFO - ncnn: None ops_is_avaliable : False
2022-08-30 06:37:29,700 - mmdeploy - INFO - pplnn_is_avaliable: False
2022-08-30 06:37:29,702 - mmdeploy - INFO - openvino_is_avaliable: False
2022-08-30 06:37:29,724 - mmdeploy - INFO - snpe_is_available: False
2022-08-30 06:37:29,724 - mmdeploy - INFO -

2022-08-30 06:37:29,724 - mmdeploy - INFO - Codebase information
2022-08-30 06:37:29,726 - mmdeploy - INFO - mmdet: 2.25.1
2022-08-30 06:37:29,726 - mmdeploy - INFO - mmseg: None
2022-08-30 06:37:29,726 - mmdeploy - INFO - mmcls: 0.23.2
2022-08-30 06:37:29,727 - mmdeploy - INFO - mmocr: None
2022-08-30 06:37:29,727 - mmdeploy - INFO - mmedit: None
2022-08-30 06:37:29,727 - mmdeploy - INFO - mmdet3d: None
2022-08-30 06:37:29,727 - mmdeploy - INFO - mmpose: None
2022-08-30 06:37:29,727 - mmdeploy - INFO - mmrotate: None

Aug 30 '22 06:08 Chen-cyw

Thanks for reporting the bug. It occurred in all the instance-seg models of MMDeploy. Will fix it ASAP.

Aug 30 '22 08:08 AllentDan

After checking on an NVIDIA card with more memory, it looks like the error may be caused by insufficient GPU memory.

Aug 30 '22 10:08 AllentDan

After checking on an NVIDIA card with more memory, it looks like the error may be caused by insufficient GPU memory.

How much memory is enough?

Aug 30 '22 11:08 Chen-cyw

I hit the same error as you with Mask R-CNN and export_postprocess_mask=True on a 1060 card with 6 GB of memory, but the conversion succeeded on a 2080 Ti card with 8 GB of memory. So I suspected that the error was caused by insufficient memory.

Aug 31 '22 01:08 AllentDan

I am using a T4 with 13 GB of memory and get the same error.

Aug 31 '22 06:08 Chen-cyw

I am using a T4 with 13 GB of memory and get the same error.

Got you. Static configs are recommended if you want to use export_postprocess_mask=True.

Aug 31 '22 06:08 AllentDan
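For readers who want to try the static-config workaround suggested above, here is a minimal sketch of a backend config in which min, opt, and max shapes are identical. The helper `make_static_input_shapes` and the 800x1344 resolution are illustrative assumptions, not taken from this thread. A real static config would presumably also inherit from a static base config and fix the ONNX input shape, so check the static instance-seg configs shipped with mmdeploy before relying on this.

```python
def make_static_input_shapes(height, width, batch=1, channels=3):
    """Build an input_shapes dict whose min/opt/max shapes are identical.

    Hypothetical helper for illustration; not part of mmdeploy.
    """
    shape = [batch, channels, height, width]
    return dict(
        input=dict(
            min_shape=list(shape),
            opt_shape=list(shape),
            max_shape=list(shape)))


# Assumed resolution (800x1344); pick the one your model is deployed at.
backend_config = dict(
    common_config=dict(max_workspace_size=1 << 30),
    model_inputs=[dict(input_shapes=make_static_input_shapes(800, 1344))])
codebase_config = dict(post_processing=dict(export_postprocess_mask=True))
```

With a single fixed shape, TensorRT has only one set of dimensions to validate, which is consistent with the reports in this thread that static configs convert successfully.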

Static configs convert successfully. Maybe note this problem in the user guide.

Aug 31 '22 06:08 Chen-cyw

Thanks for the suggestion. We will try it.

Aug 31 '22 06:08 AllentDan

I am using a T4 with 13 GB of memory and get the same error.

Did you solve this problem? We are getting the same error.

Sep 03 '22 04:09 machine52vision

I am using a T4 with 13 GB of memory and get the same error.

Did you solve this problem? We are getting the same error.

No, I am using the static config until it is solved.

Sep 05 '22 01:09 Chen-cyw
