Could not find any implementation for node MaxPool on Jetson NX

Open tongda opened this issue 1 year ago • 15 comments

Checklist

  • [X] 1. I have searched related issues but cannot get the expected help.
  • [X] 2. I have read the FAQ documentation but cannot get the expected help.
  • [X] 3. The bug has not been fixed in the latest version.

Describe the bug

Jetson Xavier NX
JetPack: 4.6.1
CUDA: 10.2
TensorRT: 8.2.1.8

Converting the mmdet YOLOX model raises an exception: "Could not find any implementation for node MaxPool_102."

Reproduction

python ./tools/deploy.py configs/mmdet/detection/base_tensorrt_static-640x640.py yolox_s_8x8_300e_coco.py yolox_s_8x8_300e_coco_20211121_095711-4592a793.pth test.jpg --work-dir ./work-dir/ --device cuda:0 --dump-info

The mmdeploy config is as follows:

_base_ = ['../_base_/base_static.py', '../../_base_/backends/tensorrt.py']

onnx_config = dict(input_shape=(640, 640))

backend_config = dict(
    common_config=dict(max_workspace_size=1 << 30),
    model_inputs=[
        dict(
            input_shapes=dict(
                input=dict(
                    min_shape=[1, 3, 640, 640],
                    opt_shape=[1, 3, 640, 640],
                    max_shape=[1, 3, 640, 640])))
    ])

The mmdet model is the official YOLOX model.
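
For reference, `max_workspace_size=1 << 30` in the config above is a byte count equal to 1 GiB. A minimal sketch of the arithmetic follows; the helper name `gib_to_bytes` is hypothetical and not part of mmdeploy.

```python
def gib_to_bytes(gib: int) -> int:
    """Convert a size in GiB to bytes with a left shift (1 GiB = 2**30 bytes)."""
    return gib << 30


# The value used in backend_config above.
workspace_bytes = 1 << 30
# The same limit expressed in MiB (1 MiB = 2**20 bytes).
workspace_mib = workspace_bytes // (1 << 20)
print(workspace_bytes, workspace_mib)
```

The trtexec run later in this thread passes `--workspace=6000`, which trtexec reports as 6000 MiB, so the two builds did not use the same workspace limit.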

Environment

2022-09-16 07:00:29,768 - mmdeploy - INFO -

2022-09-16 07:00:29,768 - mmdeploy - INFO - **********Environmental information**********
2022-09-16 07:00:30,823 - mmdeploy - INFO - sys.platform: linux
2022-09-16 07:00:30,824 - mmdeploy - INFO - Python: 3.6.15 | packaged by conda-forge | (default, Dec  3 2021, 19:12:04) [GCC 9.4.0]
2022-09-16 07:00:30,825 - mmdeploy - INFO - CUDA available: True
2022-09-16 07:00:30,825 - mmdeploy - INFO - GPU 0: Xavier
2022-09-16 07:00:30,825 - mmdeploy - INFO - CUDA_HOME: /usr/local/cuda-10.2
2022-09-16 07:00:30,826 - mmdeploy - INFO - NVCC: Cuda compilation tools, release 10.2, V10.2.300
2022-09-16 07:00:30,826 - mmdeploy - INFO - GCC: gcc (Ubuntu/Linaro 7.5.0-3ubuntu1~18.04) 7.5.0
2022-09-16 07:00:30,826 - mmdeploy - INFO - PyTorch: 1.10.0
2022-09-16 07:00:30,827 - mmdeploy - INFO - PyTorch compiling details: PyTorch built with:
  - GCC 7.5
  - C++ Version: 201402
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: NO AVX
  - CUDA Runtime 10.2
  - NVCC architecture flags: -gencode;arch=compute_53,code=sm_53;-gencode;arch=compute_62,code=sm_62;-gencode;arch=compute_72,code=sm_72
  - CuDNN 8.2.1
    - Built with CuDNN 8.0
  - Build settings: BLAS_INFO=open, BUILD_TYPE=Release, CUDA_VERSION=10.2, CUDNN_VERSION=8.0.0, CXX_COMPILER=/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOCUPTI -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -DMISSING_ARM_VST1 -DMISSING_ARM_VLD1 -Wno-stringop-overflow, FORCE_FALLBACK_CUDA_MPI=1, LAPACK_INFO=open, TORCH_VERSION=1.10.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EIGEN_FOR_BLAS=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=OFF, USE_MKLDNN=OFF, USE_MPI=ON, USE_NCCL=0, USE_NNPACK=ON, USE_OPENMP=ON,

2022-09-16 07:00:30,827 - mmdeploy - INFO - TorchVision: 0.11.1
2022-09-16 07:00:30,828 - mmdeploy - INFO - OpenCV: 4.6.0
2022-09-16 07:00:30,828 - mmdeploy - INFO - MMCV: 1.6.1
2022-09-16 07:00:30,828 - mmdeploy - INFO - MMCV Compiler: GCC 7.5
2022-09-16 07:00:30,829 - mmdeploy - INFO - MMCV CUDA Compiler: 10.2
2022-09-16 07:00:30,829 - mmdeploy - INFO - MMDeploy: 0.8.0+a1a19f0
2022-09-16 07:00:30,829 - mmdeploy - INFO -

2022-09-16 07:00:30,829 - mmdeploy - INFO - **********Backend information**********
2022-09-16 07:00:33,716 - mmdeploy - INFO - onnxruntime: 1.10.0 ops_is_avaliable : False
2022-09-16 07:00:33,883 - mmdeploy - INFO - tensorrt: 8.2.1.8   ops_is_avaliable : True
2022-09-16 07:00:33,986 - mmdeploy - INFO - ncnn: None  ops_is_avaliable : False
2022-09-16 07:00:33,994 - mmdeploy - INFO - pplnn_is_avaliable: False
2022-09-16 07:00:34,002 - mmdeploy - INFO - openvino_is_avaliable: False
2022-09-16 07:00:34,119 - mmdeploy - INFO - snpe_is_available: False
2022-09-16 07:00:34,131 - mmdeploy - INFO - ascend_is_available: False
2022-09-16 07:00:34,138 - mmdeploy - INFO - coreml_is_available: False
2022-09-16 07:00:34,139 - mmdeploy - INFO -

2022-09-16 07:00:34,139 - mmdeploy - INFO - **********Codebase information**********
2022-09-16 07:00:34,149 - mmdeploy - INFO - mmdet:      2.25.1
2022-09-16 07:00:34,149 - mmdeploy - INFO - mmseg:      None
2022-09-16 07:00:34,150 - mmdeploy - INFO - mmcls:      None
2022-09-16 07:00:34,150 - mmdeploy - INFO - mmocr:      None
2022-09-16 07:00:34,150 - mmdeploy - INFO - mmedit:     None
2022-09-16 07:00:34,151 - mmdeploy - INFO - mmdet3d:    None
2022-09-16 07:00:34,151 - mmdeploy - INFO - mmpose:     0.28.1
2022-09-16 07:00:34,151 - mmdeploy - INFO - mmrotate:   None


Error traceback

2022-09-16 05:40:46,447 - mmdeploy - INFO - Start pipeline mmdeploy.apis.pytorch2onnx.torch2onnx in subprocess
load checkpoint from local path: ../action-api/actionloop/engines/mmcfgs/yolox_s_8x8_300e_coco_20211121_095711-4592a793.pth
2022-09-16 05:40:58,194 - mmdeploy - WARNING - DeprecationWarning: get_onnx_config will be deprecated in the future.
2022-09-16 05:40:58,195 - mmdeploy - INFO - Export PyTorch model to ONNX: ./work-dir/obj-dynamic4/end2end.onnx.
2022-09-16 05:40:58,393 - mmdeploy - WARNING - Can not find torch._C._jit_pass_onnx_deduplicate_initializers, function rewrite will not be applied
/home/nvidia/mmdeploy/mmdeploy/core/optimizers/function_marker.py:158: TracerWarning: Converting a tensor to a Python integer might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  ys_shape = tuple(int(s) for s in ys.shape)
/home/nvidia/mmdeploy/mmdeploy/codebase/mmdet/models/detectors/base.py:24: TracerWarning: Iterating over a tensor might cause the trace to be incorrect. Passing a tensor of different shape won't change the number of iterations executed (and might lead to errors or silently give incorrect results).
  img_shape = [int(val) for val in img_shape]
/home/nvidia/mmdeploy/mmdeploy/codebase/mmdet/models/detectors/base.py:24: TracerWarning: Converting a tensor to a Python integer might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  img_shape = [int(val) for val in img_shape]
/home/nvidia/archiconda3/envs/mmdeploy/lib/python3.6/site-packages/torch/functional.py:445: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at  /media/nvidia/NVME/pytorch/pytorch-v1.10.0/aten/src/ATen/native/TensorShape.cpp:2157.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
/home/nvidia/mmdeploy/mmdeploy/codebase/mmdet/core/post_processing/bbox_nms.py:260: TracerWarning: Converting a tensor to a Python integer might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  dets, labels = TRTBatchedNMSop.apply(boxes, scores, int(scores.shape[-1]),
/home/nvidia/mmdeploy/mmdeploy/mmcv/ops/nms.py:178: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  out_boxes = min(num_boxes, after_topk)
/home/nvidia/mmdeploy/mmdeploy/mmcv/ops/nms.py:181: TracerWarning: Converting a tensor to a Python integer might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  (batch_size, out_boxes)).to(scores.device))
WARNING: The shape inference of mmdeploy::TRTBatchedNMS type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
WARNING: The shape inference of mmdeploy::TRTBatchedNMS type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
WARNING: The shape inference of mmdeploy::TRTBatchedNMS type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
WARNING: The shape inference of mmdeploy::TRTBatchedNMS type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
WARNING: The shape inference of mmdeploy::TRTBatchedNMS type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
WARNING: The shape inference of mmdeploy::TRTBatchedNMS type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
2022-09-16 05:41:30,497 - mmdeploy - INFO - Execute onnx optimize passes.
2022-09-16 05:41:32,095 - mmdeploy - INFO - Finish pipeline mmdeploy.apis.pytorch2onnx.torch2onnx
2022-09-16 05:41:42,141 - mmdeploy - INFO - Start pipeline mmdeploy.backend.tensorrt.onnx2tensorrt.onnx2tensorrt in subprocess
2022-09-16 05:41:42,652 - mmdeploy - INFO - Successfully loaded tensorrt plugins from /home/nvidia/mmdeploy/mmdeploy/lib/libmmdeploy_tensorrt_ops.so
[09/16/2022-05:41:44] [TRT] [I] [MemUsageChange] Init CUDA: CPU +355, GPU +0, now: CPU 441, GPU 5334 (MiB)
[09/16/2022-05:41:45] [TRT] [I] [MemUsageSnapshot] Begin constructing builder kernel library: CPU 441 MiB, GPU 5364 MiB
[09/16/2022-05:41:45] [TRT] [I] [MemUsageSnapshot] End constructing builder kernel library: CPU 546 MiB, GPU 5471 MiB
[09/16/2022-05:41:46] [TRT] [W] onnx2trt_utils.cpp:366: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[09/16/2022-05:41:46] [TRT] [W] onnx2trt_utils.cpp:392: One or more weights outside the range of INT32 was clamped
[09/16/2022-05:41:46] [TRT] [I] No importer registered for op: TRTBatchedNMS. Attempting to import as plugin.
[09/16/2022-05:41:46] [TRT] [I] Searching for plugin: TRTBatchedNMS, plugin_version: 1, plugin_namespace:
[09/16/2022-05:41:46] [TRT] [I] Successfully created plugin: TRTBatchedNMS
[09/16/2022-05:41:46] [TRT] [I] ---------- Layers Running on DLA ----------
[09/16/2022-05:41:46] [TRT] [I] ---------- Layers Running on GPU ----------
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] Reshape_0 + Transpose_1
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] Reshape_2
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] Conv_3
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] PWN(PWN(Sigmoid_4), Mul_5)
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] Conv_6
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] PWN(PWN(Sigmoid_7), Mul_8)
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] Conv_12 || Conv_9
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] PWN(PWN(Sigmoid_13), Mul_14)
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] Conv_15
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] PWN(PWN(Sigmoid_16), Mul_17)
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] Conv_18
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] PWN(PWN(Sigmoid_10), Mul_11)
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] PWN(PWN(PWN(Sigmoid_19), Mul_20), Add_21)
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] Conv_23
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] PWN(PWN(Sigmoid_24), Mul_25)
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] Conv_26
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] PWN(PWN(Sigmoid_27), Mul_28)
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] Conv_32 || Conv_29
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] PWN(PWN(Sigmoid_33), Mul_34)
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] Conv_35
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] PWN(PWN(Sigmoid_36), Mul_37)
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] Conv_38
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] PWN(PWN(PWN(Sigmoid_39), Mul_40), Add_41)
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] Conv_42
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] PWN(PWN(Sigmoid_43), Mul_44)
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] Conv_45
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] PWN(PWN(PWN(Sigmoid_46), Mul_47), Add_48)
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] Conv_49
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] PWN(PWN(Sigmoid_50), Mul_51)
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] Conv_52
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] PWN(PWN(Sigmoid_30), Mul_31)
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] PWN(PWN(PWN(Sigmoid_53), Mul_54), Add_55)
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] Conv_57
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] PWN(PWN(Sigmoid_58), Mul_59)
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] Conv_60
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] PWN(PWN(Sigmoid_61), Mul_62)
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] Conv_66 || Conv_63
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] PWN(PWN(Sigmoid_67), Mul_68)
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] Conv_69
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] PWN(PWN(Sigmoid_70), Mul_71)
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] Conv_72
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] PWN(PWN(PWN(Sigmoid_73), Mul_74), Add_75)
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] Conv_76
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] PWN(PWN(Sigmoid_77), Mul_78)
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] Conv_79
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] PWN(PWN(PWN(Sigmoid_80), Mul_81), Add_82)
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] Conv_83
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] PWN(PWN(Sigmoid_84), Mul_85)
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] Conv_86
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] PWN(PWN(Sigmoid_64), Mul_65)
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] PWN(PWN(PWN(Sigmoid_87), Mul_88), Add_89)
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] Conv_91
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] PWN(PWN(Sigmoid_92), Mul_93)
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] Conv_94
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] PWN(PWN(Sigmoid_95), Mul_96)
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] Conv_97
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] PWN(PWN(Sigmoid_98), Mul_99)
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] MaxPool_102
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] MaxPool_101
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] MaxPool_100
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] 622 copy
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] 623 copy
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] 624 copy
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] 625 copy
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] Conv_104
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] PWN(PWN(Sigmoid_105), Mul_106)
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] Conv_110 || Conv_107
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] PWN(PWN(Sigmoid_111), Mul_112)
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] Conv_113
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] PWN(PWN(Sigmoid_114), Mul_115)
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] Conv_116
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] PWN(PWN(Sigmoid_108), Mul_109)
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] PWN(PWN(Sigmoid_117), Mul_118)
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] Conv_120
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] PWN(PWN(Sigmoid_121), Mul_122)
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] Conv_123
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] PWN(PWN(Sigmoid_124), Mul_125)
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] Resize_127
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] 660 copy
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] Conv_132 || Conv_129
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] PWN(PWN(Sigmoid_133), Mul_134)
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] Conv_135
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] PWN(PWN(Sigmoid_136), Mul_137)
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] Conv_138
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] PWN(PWN(Sigmoid_130), Mul_131)
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] PWN(PWN(Sigmoid_139), Mul_140)
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] Conv_142
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] PWN(PWN(Sigmoid_143), Mul_144)
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] Conv_145
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] PWN(PWN(Sigmoid_146), Mul_147)
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] Resize_148
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] 691 copy
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] Conv_153 || Conv_150
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] PWN(PWN(Sigmoid_154), Mul_155)
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] Conv_156
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] PWN(PWN(Sigmoid_157), Mul_158)
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] Conv_159
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] PWN(PWN(Sigmoid_151), Mul_152)
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] PWN(PWN(Sigmoid_160), Mul_161)
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] Conv_163
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] PWN(PWN(Sigmoid_164), Mul_165)
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] Conv_206
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] Conv_166
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] PWN(PWN(Sigmoid_207), Mul_208)
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] PWN(PWN(Sigmoid_167), Mul_168)
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] 686 copy
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] Conv_173 || Conv_170
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] Conv_221 || Conv_215
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] PWN(PWN(Sigmoid_174), Mul_175)
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] PWN(PWN(Sigmoid_222), Mul_223)
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] Conv_224
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] Conv_176
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] PWN(PWN(Sigmoid_225), Mul_226)
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] PWN(PWN(Sigmoid_177), Mul_178)
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] Conv_179
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] Conv_228
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] PWN(PWN(Sigmoid_171), Mul_172)
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] PWN(PWN(Sigmoid_180), Mul_181)
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] Conv_183
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] PWN(PWN(Sigmoid_184), Mul_185)
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] Conv_209
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] Conv_186
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] PWN(PWN(Sigmoid_210), Mul_211)
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] PWN(PWN(Sigmoid_187), Mul_188)
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] 655 copy
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] Conv_193 || Conv_190
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] Conv_236 || Conv_230
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] PWN(PWN(Sigmoid_194), Mul_195)
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] PWN(PWN(Sigmoid_237), Mul_238)
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] Conv_239
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] Conv_196
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] PWN(PWN(Sigmoid_240), Mul_241)
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] PWN(PWN(Sigmoid_197), Mul_198)
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] Conv_199
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] Conv_243
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] PWN(PWN(Sigmoid_191), Mul_192)
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] PWN(PWN(Sigmoid_200), Mul_201)
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] Conv_203
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] PWN(PWN(Sigmoid_204), Mul_205)
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] Conv_212
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] PWN(PWN(Sigmoid_213), Mul_214)
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] Conv_251 || Conv_245
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] PWN(PWN(Sigmoid_252), Mul_253)
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] Conv_254
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] PWN(PWN(Sigmoid_255), Mul_256)
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] Conv_258
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] {ForeignNode[Transpose_291 + Reshape_292...Unsqueeze_340]}
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] PWN(PWN(Sigmoid_216), Mul_217)
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] Conv_218
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] PWN(PWN(Sigmoid_219), Mul_220)
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] Conv_229
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] Conv_227
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] PWN(PWN(Sigmoid_231), Mul_232)
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] Conv_233
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] PWN(PWN(Sigmoid_234), Mul_235)
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] Conv_244
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] Conv_242
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] PWN(PWN(Sigmoid_246), Mul_247)
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] Conv_248
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] PWN(PWN(Sigmoid_249), Mul_250)
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] Conv_259
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] Conv_257
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] Transpose_285 + Reshape_286
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] Transpose_287 + Reshape_288
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] Transpose_289 + Reshape_290
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] 930 copy
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] 938 copy
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] 946 copy
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] Transpose_297 + Reshape_298
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] Transpose_299 + Reshape_300
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] Transpose_301 + Reshape_302
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] PWN(Sigmoid_306)
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] Unsqueeze_338
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] PWN(PWN(Sigmoid_304), Mul_339)
[09/16/2022-05:41:46] [TRT] [I] [GpuLayer] TRTBatchedNMS_341
[09/16/2022-05:41:48] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +226, GPU +226, now: CPU 849, GPU 5776 (MiB)
[09/16/2022-05:41:48] [TRT] [I] Local timing cache in use. Profiling results in this builder pass will not be stored.
[09/16/2022-05:44:00] [TRT] [E] 10: [optimizer.cpp::computeCosts::2011] Error Code 10: Internal Error (Could not find any implementation for node MaxPool_102.)
Process Process-3:
Traceback (most recent call last):
  File "/home/nvidia/archiconda3/envs/mmdeploy/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/home/nvidia/archiconda3/envs/mmdeploy/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/home/nvidia/mmdeploy/mmdeploy/apis/core/pipeline_manager.py", line 107, in __call__
    ret = func(*args, **kwargs)
  File "/home/nvidia/mmdeploy/mmdeploy/backend/tensorrt/onnx2tensorrt.py", line 88, in onnx2tensorrt
    device_id=device_id)
  File "/home/nvidia/mmdeploy/mmdeploy/backend/tensorrt/utils.py", line 215, in from_onnx
    assert engine is not None, 'Failed to create TensorRT engine'
AssertionError: Failed to create TensorRT engine
2022-09-16 05:44:01,456 - mmdeploy - ERROR - `mmdeploy.backend.tensorrt.onnx2tensorrt.onnx2tensorrt` with Call id: 1 failed. exit.
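
The failing node is easy to miss in a log this long. As a rough sketch, assuming the error line always has the shape shown above, a regular expression can pull the node name out of a saved log. The names `NODE_ERROR` and `failing_nodes` are hypothetical helpers, not part of mmdeploy or TensorRT.

```python
import re

# Matches TensorRT builder errors of the form seen in the log above and
# captures the node name that precedes the closing ".)".
NODE_ERROR = re.compile(
    r"Could not find any implementation for node (\S+?)\.\)"
)


def failing_nodes(log_text):
    """Return the node names from matching error lines, in log order."""
    return NODE_ERROR.findall(log_text)


sample = (
    "[09/16/2022-05:44:00] [TRT] [E] 10: "
    "[optimizer.cpp::computeCosts::2011] Error Code 10: Internal Error "
    "(Could not find any implementation for node MaxPool_102.)"
)
print(failing_nodes(sample))
```

Running it on the full build log should make it clear whether only `MaxPool_102` fails or whether other nodes are reported as well.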

tongda, Sep 16 '22 11:09

@lvhan028

tpoisonooo, Sep 16 '22 11:09

Running trtexec directly as follows:

/usr/src/tensorrt/bin/trtexec --onnx=./end2end.onnx --plugins=../../mmdeploy/lib/libmmdeploy_tensorrt_ops.so --workspace=6000 --fp16 --saveEngine=end2end.engine

gives this output:

&&&& RUNNING TensorRT.trtexec [TensorRT v8201] # /usr/src/tensorrt/bin/trtexec --onnx=./end2end.onnx --plugins=../../mmdeploy/lib/libmmdeploy_tensorrt_ops.so --workspace=6000 --fp16 --saveEngine=end2end.engine
[09/16/2022-10:31:55] [I] === Model Options ===
[09/16/2022-10:31:55] [I] Format: ONNX
[09/16/2022-10:31:55] [I] Model: ./end2end.onnx
[09/16/2022-10:31:55] [I] Output:
[09/16/2022-10:31:55] [I] === Build Options ===
[09/16/2022-10:31:55] [I] Max batch: explicit batch
[09/16/2022-10:31:55] [I] Workspace: 6000 MiB
[09/16/2022-10:31:55] [I] minTiming: 1
[09/16/2022-10:31:55] [I] avgTiming: 8
[09/16/2022-10:31:55] [I] Precision: FP32+FP16
[09/16/2022-10:31:55] [I] Calibration:
[09/16/2022-10:31:55] [I] Refit: Disabled
[09/16/2022-10:31:55] [I] Sparsity: Disabled
[09/16/2022-10:31:55] [I] Safe mode: Disabled
[09/16/2022-10:31:55] [I] DirectIO mode: Disabled
[09/16/2022-10:31:55] [I] Restricted mode: Disabled
[09/16/2022-10:31:55] [I] Save engine: end2end.engine
[09/16/2022-10:31:55] [I] Load engine:
[09/16/2022-10:31:55] [I] Profiling verbosity: 0
[09/16/2022-10:31:55] [I] Tactic sources: Using default tactic sources
[09/16/2022-10:31:55] [I] timingCacheMode: local
[09/16/2022-10:31:55] [I] timingCacheFile:
[09/16/2022-10:31:55] [I] Input(s)s format: fp32:CHW
[09/16/2022-10:31:55] [I] Output(s)s format: fp32:CHW
[09/16/2022-10:31:55] [I] Input build shapes: model
[09/16/2022-10:31:55] [I] Input calibration shapes: model
[09/16/2022-10:31:55] [I] === System Options ===
[09/16/2022-10:31:55] [I] Device: 0
[09/16/2022-10:31:55] [I] DLACore:
[09/16/2022-10:31:55] [I] Plugins: ../../mmdeploy/lib/libmmdeploy_tensorrt_ops.so
[09/16/2022-10:31:55] [I] === Inference Options ===
[09/16/2022-10:31:55] [I] Batch: Explicit
[09/16/2022-10:31:55] [I] Input inference shapes: model
[09/16/2022-10:31:55] [I] Iterations: 10
[09/16/2022-10:31:55] [I] Duration: 3s (+ 200ms warm up)
[09/16/2022-10:31:55] [I] Sleep time: 0ms
[09/16/2022-10:31:55] [I] Idle time: 0ms
[09/16/2022-10:31:55] [I] Streams: 1
[09/16/2022-10:31:55] [I] ExposeDMA: Disabled
[09/16/2022-10:31:55] [I] Data transfers: Enabled
[09/16/2022-10:31:55] [I] Spin-wait: Disabled
[09/16/2022-10:31:55] [I] Multithreading: Disabled
[09/16/2022-10:31:55] [I] CUDA Graph: Disabled
[09/16/2022-10:31:55] [I] Separate profiling: Disabled
[09/16/2022-10:31:55] [I] Time Deserialize: Disabled
[09/16/2022-10:31:55] [I] Time Refit: Disabled
[09/16/2022-10:31:55] [I] Skip inference: Disabled
[09/16/2022-10:31:55] [I] Inputs:
[09/16/2022-10:31:55] [I] === Reporting Options ===
[09/16/2022-10:31:55] [I] Verbose: Disabled
[09/16/2022-10:31:55] [I] Averages: 10 inferences
[09/16/2022-10:31:55] [I] Percentile: 99
[09/16/2022-10:31:55] [I] Dump refittable layers:Disabled
[09/16/2022-10:31:55] [I] Dump output: Disabled
[09/16/2022-10:31:55] [I] Profile: Disabled
[09/16/2022-10:31:55] [I] Export timing to JSON file:
[09/16/2022-10:31:55] [I] Export output to JSON file:
[09/16/2022-10:31:55] [I] Export profile to JSON file:
[09/16/2022-10:31:55] [I]
[09/16/2022-10:31:55] [I] === Device Information ===
[09/16/2022-10:31:55] [I] Selected Device: Xavier
[09/16/2022-10:31:55] [I] Compute Capability: 7.2
[09/16/2022-10:31:55] [I] SMs: 6
[09/16/2022-10:31:55] [I] Compute Clock Rate: 1.109 GHz
[09/16/2022-10:31:55] [I] Device Global Memory: 7772 MiB
[09/16/2022-10:31:55] [I] Shared Memory per SM: 96 KiB
[09/16/2022-10:31:55] [I] Memory Bus Width: 256 bits (ECC disabled)
[09/16/2022-10:31:55] [I] Memory Clock Rate: 1.109 GHz
[09/16/2022-10:31:55] [I]
[09/16/2022-10:31:55] [I] TensorRT version: 8.2.1
[09/16/2022-10:31:55] [I] Loading supplied plugin library: ../../mmdeploy/lib/libmmdeploy_tensorrt_ops.so
[09/16/2022-10:31:58] [I] [TRT] [MemUsageChange] Init CUDA: CPU +362, GPU +0, now: CPU 381, GPU 3771 (MiB)
[09/16/2022-10:31:58] [I] [TRT] [MemUsageSnapshot] Begin constructing builder kernel library: CPU 381 MiB, GPU 3771 MiB
[09/16/2022-10:31:58] [I] [TRT] [MemUsageSnapshot] End constructing builder kernel library: CPU 486 MiB, GPU 3877 MiB
[09/16/2022-10:31:58] [I] Start parsing network model
[09/16/2022-10:31:59] [I] [TRT] ----------------------------------------------------------------
[09/16/2022-10:31:59] [I] [TRT] Input filename:   ./end2end.onnx
[09/16/2022-10:31:59] [I] [TRT] ONNX IR version:  0.0.7
[09/16/2022-10:31:59] [I] [TRT] Opset version:    11
[09/16/2022-10:31:59] [I] [TRT] Producer name:    pytorch
[09/16/2022-10:31:59] [I] [TRT] Producer version: 1.10
[09/16/2022-10:31:59] [I] [TRT] Domain:
[09/16/2022-10:31:59] [I] [TRT] Model version:    0
[09/16/2022-10:31:59] [I] [TRT] Doc string:
[09/16/2022-10:31:59] [I] [TRT] ----------------------------------------------------------------
[09/16/2022-10:31:59] [W] [TRT] onnx2trt_utils.cpp:366: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[09/16/2022-10:31:59] [W] [TRT] onnx2trt_utils.cpp:392: One or more weights outside the range of INT32 was clamped
[09/16/2022-10:31:59] [I] [TRT] No importer registered for op: TRTBatchedNMS. Attempting to import as plugin.
[09/16/2022-10:31:59] [I] [TRT] Searching for plugin: TRTBatchedNMS, plugin_version: 1, plugin_namespace:
[09/16/2022-10:31:59] [I] [TRT] Successfully created plugin: TRTBatchedNMS
[09/16/2022-10:31:59] [I] Finish parsing network model
[09/16/2022-10:31:59] [I] [TRT] ---------- Layers Running on DLA ----------
[09/16/2022-10:31:59] [I] [TRT] ---------- Layers Running on GPU ----------
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] Reshape_0 + Transpose_1
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] Reshape_2
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] Conv_3
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] PWN(PWN(Sigmoid_4), Mul_5)
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] Conv_6
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] PWN(PWN(Sigmoid_7), Mul_8)
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] Conv_12 || Conv_9
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] PWN(PWN(Sigmoid_13), Mul_14)
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] Conv_15
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] PWN(PWN(Sigmoid_16), Mul_17)
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] Conv_18
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] PWN(PWN(Sigmoid_10), Mul_11)
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] PWN(PWN(PWN(Sigmoid_19), Mul_20), Add_21)
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] Conv_23
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] PWN(PWN(Sigmoid_24), Mul_25)
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] Conv_26
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] PWN(PWN(Sigmoid_27), Mul_28)
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] Conv_32 || Conv_29
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] PWN(PWN(Sigmoid_33), Mul_34)
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] Conv_35
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] PWN(PWN(Sigmoid_36), Mul_37)
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] Conv_38
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] PWN(PWN(PWN(Sigmoid_39), Mul_40), Add_41)
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] Conv_42
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] PWN(PWN(Sigmoid_43), Mul_44)
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] Conv_45
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] PWN(PWN(PWN(Sigmoid_46), Mul_47), Add_48)
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] Conv_49
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] PWN(PWN(Sigmoid_50), Mul_51)
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] Conv_52
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] PWN(PWN(Sigmoid_30), Mul_31)
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] PWN(PWN(PWN(Sigmoid_53), Mul_54), Add_55)
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] Conv_57
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] PWN(PWN(Sigmoid_58), Mul_59)
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] Conv_60
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] PWN(PWN(Sigmoid_61), Mul_62)
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] Conv_66 || Conv_63
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] PWN(PWN(Sigmoid_67), Mul_68)
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] Conv_69
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] PWN(PWN(Sigmoid_70), Mul_71)
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] Conv_72
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] PWN(PWN(PWN(Sigmoid_73), Mul_74), Add_75)
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] Conv_76
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] PWN(PWN(Sigmoid_77), Mul_78)
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] Conv_79
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] PWN(PWN(PWN(Sigmoid_80), Mul_81), Add_82)
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] Conv_83
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] PWN(PWN(Sigmoid_84), Mul_85)
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] Conv_86
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] PWN(PWN(Sigmoid_64), Mul_65)
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] PWN(PWN(PWN(Sigmoid_87), Mul_88), Add_89)
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] Conv_91
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] PWN(PWN(Sigmoid_92), Mul_93)
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] Conv_94
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] PWN(PWN(Sigmoid_95), Mul_96)
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] Conv_97
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] PWN(PWN(Sigmoid_98), Mul_99)
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] MaxPool_102
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] MaxPool_101
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] MaxPool_100
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] 622 copy
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] 623 copy
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] 624 copy
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] 625 copy
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] Conv_104
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] PWN(PWN(Sigmoid_105), Mul_106)
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] Conv_110 || Conv_107
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] PWN(PWN(Sigmoid_111), Mul_112)
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] Conv_113
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] PWN(PWN(Sigmoid_114), Mul_115)
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] Conv_116
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] PWN(PWN(Sigmoid_108), Mul_109)
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] PWN(PWN(Sigmoid_117), Mul_118)
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] Conv_120
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] PWN(PWN(Sigmoid_121), Mul_122)
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] Conv_123
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] PWN(PWN(Sigmoid_124), Mul_125)
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] Resize_127
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] 660 copy
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] Conv_132 || Conv_129
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] PWN(PWN(Sigmoid_133), Mul_134)
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] Conv_135
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] PWN(PWN(Sigmoid_136), Mul_137)
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] Conv_138
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] PWN(PWN(Sigmoid_130), Mul_131)
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] PWN(PWN(Sigmoid_139), Mul_140)
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] Conv_142
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] PWN(PWN(Sigmoid_143), Mul_144)
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] Conv_145
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] PWN(PWN(Sigmoid_146), Mul_147)
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] Resize_148
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] 691 copy
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] Conv_153 || Conv_150
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] PWN(PWN(Sigmoid_154), Mul_155)
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] Conv_156
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] PWN(PWN(Sigmoid_157), Mul_158)
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] Conv_159
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] PWN(PWN(Sigmoid_151), Mul_152)
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] PWN(PWN(Sigmoid_160), Mul_161)
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] Conv_163
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] PWN(PWN(Sigmoid_164), Mul_165)
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] Conv_206
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] Conv_166
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] PWN(PWN(Sigmoid_207), Mul_208)
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] PWN(PWN(Sigmoid_167), Mul_168)
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] 686 copy
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] Conv_173 || Conv_170
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] Conv_221 || Conv_215
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] PWN(PWN(Sigmoid_174), Mul_175)
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] PWN(PWN(Sigmoid_222), Mul_223)
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] Conv_224
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] Conv_176
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] PWN(PWN(Sigmoid_225), Mul_226)
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] PWN(PWN(Sigmoid_177), Mul_178)
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] Conv_179
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] Conv_228
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] PWN(PWN(Sigmoid_171), Mul_172)
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] PWN(PWN(Sigmoid_180), Mul_181)
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] Conv_183
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] PWN(PWN(Sigmoid_184), Mul_185)
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] Conv_209
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] Conv_186
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] PWN(PWN(Sigmoid_210), Mul_211)
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] PWN(PWN(Sigmoid_187), Mul_188)
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] 655 copy
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] Conv_193 || Conv_190
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] Conv_236 || Conv_230
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] PWN(PWN(Sigmoid_194), Mul_195)
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] PWN(PWN(Sigmoid_237), Mul_238)
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] Conv_239
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] Conv_196
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] PWN(PWN(Sigmoid_240), Mul_241)
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] PWN(PWN(Sigmoid_197), Mul_198)
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] Conv_199
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] Conv_243
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] PWN(PWN(Sigmoid_191), Mul_192)
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] PWN(PWN(Sigmoid_200), Mul_201)
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] Conv_203
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] PWN(PWN(Sigmoid_204), Mul_205)
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] Conv_212
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] PWN(PWN(Sigmoid_213), Mul_214)
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] Conv_251 || Conv_245
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] PWN(PWN(Sigmoid_252), Mul_253)
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] Conv_254
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] PWN(PWN(Sigmoid_255), Mul_256)
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] Conv_258
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] {ForeignNode[Transpose_291 + Reshape_292...Unsqueeze_340]}
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] PWN(PWN(Sigmoid_216), Mul_217)
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] Conv_218
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] PWN(PWN(Sigmoid_219), Mul_220)
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] Conv_229
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] Conv_227
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] PWN(PWN(Sigmoid_231), Mul_232)
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] Conv_233
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] PWN(PWN(Sigmoid_234), Mul_235)
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] Conv_244
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] Conv_242
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] PWN(PWN(Sigmoid_246), Mul_247)
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] Conv_248
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] PWN(PWN(Sigmoid_249), Mul_250)
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] Conv_259
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] Conv_257
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] Transpose_285 + Reshape_286
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] Transpose_287 + Reshape_288
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] Transpose_289 + Reshape_290
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] 930 copy
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] 938 copy
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] 946 copy
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] Transpose_297 + Reshape_298
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] Transpose_299 + Reshape_300
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] Transpose_301 + Reshape_302
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] PWN(Sigmoid_306)
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] Unsqueeze_338
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] PWN(PWN(Sigmoid_304), Mul_339)
[09/16/2022-10:31:59] [I] [TRT] [GpuLayer] TRTBatchedNMS_341
[09/16/2022-10:32:00] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +226, GPU +208, now: CPU 759, GPU 4157 (MiB)
[09/16/2022-10:32:02] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +308, GPU +304, now: CPU 1067, GPU 4461 (MiB)
[09/16/2022-10:32:02] [I] [TRT] Local timing cache in use. Profiling results in this builder pass will not be stored.
[09/16/2022-10:45:54] [W] [TRT] Tactic Device request: 4364MB Available: 3789MB. Device memory is insufficient to use tactic.
[09/16/2022-10:45:54] [W] [TRT] Skipping tactic 3 due to insuficient memory on requested size of 4364 detected for tactic 4.
[09/16/2022-10:45:54] [W] [TRT] Tactic Device request: 4364MB Available: 3788MB. Device memory is insufficient to use tactic.
[09/16/2022-10:45:54] [W] [TRT] Skipping tactic 8 due to insuficient memory on requested size of 4364 detected for tactic 60.
[09/16/2022-10:45:55] [W] [TRT] Tactic Device request: 4363MB Available: 3790MB. Device memory is insufficient to use tactic.
[09/16/2022-10:45:55] [W] [TRT] Skipping tactic 3 due to insuficient memory on requested size of 4363 detected for tactic 4.
[09/16/2022-10:45:55] [W] [TRT] Tactic Device request: 4363MB Available: 3789MB. Device memory is insufficient to use tactic.
[09/16/2022-10:45:55] [W] [TRT] Skipping tactic 7 due to insuficient memory on requested size of 4363 detected for tactic 60.
[09/16/2022-10:46:38] [W] [TRT] Tactic Device request: 4245MB Available: 3804MB. Device memory is insufficient to use tactic.
[09/16/2022-10:46:38] [W] [TRT] Skipping tactic 3 due to insuficient memory on requested size of 4245 detected for tactic 4.
[09/16/2022-10:46:38] [W] [TRT] Tactic Device request: 4245MB Available: 3804MB. Device memory is insufficient to use tactic.
[09/16/2022-10:46:38] [W] [TRT] Skipping tactic 8 due to insuficient memory on requested size of 4245 detected for tactic 60.
[09/16/2022-10:46:39] [W] [TRT] Tactic Device request: 4243MB Available: 3804MB. Device memory is insufficient to use tactic.
[09/16/2022-10:46:39] [W] [TRT] Skipping tactic 3 due to insuficient memory on requested size of 4243 detected for tactic 4.
[09/16/2022-10:46:39] [W] [TRT] Tactic Device request: 4243MB Available: 3803MB. Device memory is insufficient to use tactic.
[09/16/2022-10:46:39] [W] [TRT] Skipping tactic 7 due to insuficient memory on requested size of 4243 detected for tactic 60.
[09/16/2022-10:46:55] [W] [TRT] Tactic Device request: 4202MB Available: 3802MB. Device memory is insufficient to use tactic.
[09/16/2022-10:46:55] [W] [TRT] Skipping tactic 3 due to insuficient memory on requested size of 4202 detected for tactic 4.
[09/16/2022-10:46:55] [W] [TRT] Tactic Device request: 4202MB Available: 3802MB. Device memory is insufficient to use tactic.
[09/16/2022-10:46:55] [W] [TRT] Skipping tactic 8 due to insuficient memory on requested size of 4202 detected for tactic 60.
[09/16/2022-10:46:56] [W] [TRT] Tactic Device request: 4197MB Available: 3801MB. Device memory is insufficient to use tactic.
[09/16/2022-10:46:56] [W] [TRT] Skipping tactic 3 due to insuficient memory on requested size of 4197 detected for tactic 4.
[09/16/2022-10:46:57] [W] [TRT] Tactic Device request: 4197MB Available: 3801MB. Device memory is insufficient to use tactic.
[09/16/2022-10:46:57] [W] [TRT] Skipping tactic 7 due to insuficient memory on requested size of 4197 detected for tactic 60.
[09/16/2022-10:47:03] [W] [TRT] Tactic Device request: 4186MB Available: 3803MB. Device memory is insufficient to use tactic.
[09/16/2022-10:47:03] [W] [TRT] Skipping tactic 3 due to insuficient memory on requested size of 4186 detected for tactic 4.
[09/16/2022-10:47:04] [W] [TRT] Tactic Device request: 4186MB Available: 3802MB. Device memory is insufficient to use tactic.
[09/16/2022-10:47:04] [W] [TRT] Skipping tactic 9 due to insuficient memory on requested size of 4186 detected for tactic 60.
[09/16/2022-10:47:09] [W] [TRT] Tactic Device request: 4182MB Available: 3802MB. Device memory is insufficient to use tactic.
[09/16/2022-10:47:09] [W] [TRT] Skipping tactic 3 due to insuficient memory on requested size of 4182 detected for tactic 4.
[09/16/2022-10:47:10] [W] [TRT] Tactic Device request: 4182MB Available: 3802MB. Device memory is insufficient to use tactic.
[09/16/2022-10:47:10] [W] [TRT] Skipping tactic 8 due to insuficient memory on requested size of 4182 detected for tactic 60.
[09/16/2022-10:51:09] [I] [TRT] Detected 1 inputs and 2 output network tensors.
[09/16/2022-10:51:10] [I] [TRT] Total Host Persistent Memory: 193280
[09/16/2022-10:51:10] [I] [TRT] Total Device Persistent Memory: 17983488
[09/16/2022-10:51:10] [I] [TRT] Total Scratch Memory: 19463168
[09/16/2022-10:51:10] [I] [TRT] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 24 MiB, GPU 2191 MiB
[09/16/2022-10:51:10] [I] [TRT] [BlockAssignment] Algorithm ShiftNTopDown took 144.804ms to assign 11 blocks to 177 nodes requiring 32186368 bytes.
[09/16/2022-10:51:10] [I] [TRT] Total Activation Memory: 32186368
[09/16/2022-10:51:10] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 1559, GPU 5381 (MiB)
[09/16/2022-10:51:10] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +0, now: CPU 1559, GPU 5381 (MiB)
[09/16/2022-10:51:10] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in building engine: CPU +17, GPU +32, now: CPU 17, GPU 32 (MiB)
[09/16/2022-10:51:10] [I] [TRT] [MemUsageChange] Init CUDA: CPU +0, GPU +0, now: CPU 1541, GPU 5383 (MiB)
[09/16/2022-10:51:10] [I] [TRT] Loaded engine size: 21 MiB
[09/16/2022-10:51:10] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +1, GPU +0, now: CPU 1555, GPU 5382 (MiB)
[09/16/2022-10:51:10] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +0, now: CPU 1555, GPU 5382 (MiB)
[09/16/2022-10:51:10] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +17, now: CPU 0, GPU 17 (MiB)
[09/16/2022-10:51:10] [I] Engine built in 1154.83 sec.
[09/16/2022-10:51:10] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 1423, GPU 5404 (MiB)
[09/16/2022-10:51:10] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +0, now: CPU 1423, GPU 5404 (MiB)
[09/16/2022-10:51:10] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +48, now: CPU 0, GPU 65 (MiB)
[09/16/2022-10:51:10] [I] Using random values for input input
[09/16/2022-10:51:10] [I] Created input binding for input with dimensions 1x3x640x640
[09/16/2022-10:51:10] [I] Using random values for output dets
[09/16/2022-10:51:10] [I] Created output binding for dets with dimensions 1x100x5
[09/16/2022-10:51:10] [I] Using random values for output labels
[09/16/2022-10:51:10] [I] Created output binding for labels with dimensions 1x100
[09/16/2022-10:51:10] [I] Starting inference
[09/16/2022-10:51:13] [I] Warmup completed 9 queries over 200 ms
[09/16/2022-10:51:13] [I] Timing trace has 133 queries over 3.03946 s
[09/16/2022-10:51:13] [I]
[09/16/2022-10:51:13] [I] === Trace details ===
[09/16/2022-10:51:13] [I] Trace averages of 10 runs:
[09/16/2022-10:51:13] [I] Average on 10 runs - GPU latency: 22.4672 ms - Host latency: 22.7111 ms (end to end 22.7184 ms, enqueue 5.77945 ms)
[09/16/2022-10:51:13] [I] Average on 10 runs - GPU latency: 22.4055 ms - Host latency: 22.6495 ms (end to end 22.6571 ms, enqueue 5.30824 ms)
[09/16/2022-10:51:13] [I] Average on 10 runs - GPU latency: 22.7795 ms - Host latency: 23.0231 ms (end to end 23.0305 ms, enqueue 5.49398 ms)
[09/16/2022-10:51:13] [I] Average on 10 runs - GPU latency: 22.492 ms - Host latency: 22.7369 ms (end to end 22.742 ms, enqueue 5.54563 ms)
[09/16/2022-10:51:13] [I] Average on 10 runs - GPU latency: 22.9473 ms - Host latency: 23.1935 ms (end to end 23.2 ms, enqueue 5.87103 ms)
[09/16/2022-10:51:13] [I] Average on 10 runs - GPU latency: 22.5017 ms - Host latency: 22.7449 ms (end to end 22.753 ms, enqueue 5.54083 ms)
[09/16/2022-10:51:13] [I] Average on 10 runs - GPU latency: 22.5179 ms - Host latency: 22.7629 ms (end to end 22.7712 ms, enqueue 5.78627 ms)
[09/16/2022-10:51:13] [I] Average on 10 runs - GPU latency: 22.3991 ms - Host latency: 22.6424 ms (end to end 22.6485 ms, enqueue 5.65007 ms)
[09/16/2022-10:51:13] [I] Average on 10 runs - GPU latency: 22.826 ms - Host latency: 23.0705 ms (end to end 23.0775 ms, enqueue 5.47402 ms)
[09/16/2022-10:51:13] [I] Average on 10 runs - GPU latency: 22.8326 ms - Host latency: 23.0772 ms (end to end 23.0843 ms, enqueue 5.81785 ms)
[09/16/2022-10:51:13] [I] Average on 10 runs - GPU latency: 22.47 ms - Host latency: 22.713 ms (end to end 22.7192 ms, enqueue 5.63416 ms)
[09/16/2022-10:51:13] [I] Average on 10 runs - GPU latency: 22.5972 ms - Host latency: 22.8415 ms (end to end 22.8483 ms, enqueue 5.98044 ms)
[09/16/2022-10:51:13] [I] Average on 10 runs - GPU latency: 22.6226 ms - Host latency: 22.8667 ms (end to end 22.8724 ms, enqueue 5.62722 ms)
[09/16/2022-10:51:13] [I]
[09/16/2022-10:51:13] [I] === Performance summary ===
[09/16/2022-10:51:13] [I] Throughput: 43.7578 qps
[09/16/2022-10:51:13] [I] Latency: min = 22.4496 ms, max = 25.3685 ms, mean = 22.8453 ms, median = 22.649 ms, percentile(99%) = 25.0625 ms
[09/16/2022-10:51:13] [I] End-to-End Host Latency: min = 22.4545 ms, max = 25.3785 ms, mean = 22.8522 ms, median = 22.6523 ms, percentile(99%) = 25.0696 ms
[09/16/2022-10:51:13] [I] Enqueue Time: min = 4.58716 ms, max = 8.62158 ms, mean = 5.66658 ms, median = 5.71181 ms, percentile(99%) = 7.85083 ms
[09/16/2022-10:51:13] [I] H2D Latency: min = 0.233643 ms, max = 0.246338 ms, mean = 0.240283 ms, median = 0.240234 ms, percentile(99%) = 0.246094 ms
[09/16/2022-10:51:13] [I] GPU Compute Time: min = 22.2075 ms, max = 25.1259 ms, mean = 22.6011 ms, median = 22.4021 ms, percentile(99%) = 24.825 ms
[09/16/2022-10:51:13] [I] D2H Latency: min = 0.00341797 ms, max = 0.00463867 ms, mean = 0.00392048 ms, median = 0.00390625 ms, percentile(99%) = 0.00463867 ms
[09/16/2022-10:51:13] [I] Total Host Walltime: 3.03946 s
[09/16/2022-10:51:13] [I] Total GPU Compute Time: 3.00595 s
[09/16/2022-10:51:13] [I] Explanations of the performance metrics are printed in the verbose logs.
[09/16/2022-10:51:13] [I]
&&&& PASSED TensorRT.trtexec [TensorRT v8201] # /usr/src/tensorrt/bin/trtexec --onnx=./end2end.onnx --plugins=../../mmdeploy/lib/libmmdeploy_tensorrt_ops.so --workspace=6000 --fp16 --saveEngine=end2end.engine

tongda avatar Sep 16 '22 14:09 tongda

I have the same problem, have you solved it?

tx19990922 avatar Sep 19 '22 02:09 tx19990922

From the comparison of the two logs, I'd say it may be caused by insufficient memory. Could you try changing the input resolution to 320x320 or smaller and run it again?
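To make the memory comparison concrete, the "Tactic Device request" warnings in a TensorRT log can be summarised with a few lines of Python. This is only a sketch: the function name `memory_shortfalls` is made up for illustration, and the sample lines are copied from the trtexec log above.

```python
import re

# Matches the warning format seen in the trtexec log in this thread.
PATTERN = re.compile(r"Tactic Device request: (\d+)MB Available: (\d+)MB")


def memory_shortfalls(log_text):
    """Return (requested, available, shortfall) in MB for each warning."""
    results = []
    for match in PATTERN.finditer(log_text):
        requested = int(match.group(1))
        available = int(match.group(2))
        results.append((requested, available, requested - available))
    return results


sample = (
    "[09/16/2022-10:45:54] [W] [TRT] Tactic Device request: 4364MB "
    "Available: 3789MB. Device memory is insufficient to use tactic.\n"
    "[09/16/2022-10:47:09] [W] [TRT] Tactic Device request: 4182MB "
    "Available: 3802MB. Device memory is insufficient to use tactic.\n"
)

for requested, available, shortfall in memory_shortfalls(sample):
    print(f"requested {requested} MB, available {available} MB, short by {shortfall} MB")
```

Note that these warnings come from the trtexec run that passed, so a skipped tactic on its own does not have to be fatal.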

AllentDan avatar Sep 19 '22 02:09 AllentDan

For me, it produced the same error: Error Code 10: Internal Error (Could not find any implementation for node MaxPool_102.)

tx19990922 avatar Sep 19 '22 02:09 tx19990922

From the comparison of the two logs, I'd say it may be caused by insufficient memory. Could you try changing the input resolution to 320x320 or smaller and run it again?

I changed the mmdeploy config as follows:

_base_ = ['../_base_/base_static.py', '../../_base_/backends/tensorrt.py']

onnx_config = dict(input_shape=(320, 320))

backend_config = dict(
    common_config=dict(max_workspace_size=1 << 30),
    model_inputs=[
        dict(
            input_shapes=dict(
                input=dict(
                    min_shape=[1, 3, 320, 320],
                    opt_shape=[1, 3, 320, 320],
                    max_shape=[1, 3, 320, 320])))
    ])

Running deploy.py with the same detection model config (yolox_s_8x8_300e_coco.py) throws the same exception.

tongda avatar Sep 19 '22 07:09 tongda

I have the same problem, have you solved it?

I use the engine generated by trtexec for now.

tongda avatar Sep 19 '22 07:09 tongda

Can you elaborate on that, because I haven't solved this problem yet, thanks.

tx19990922 avatar Sep 20 '22 01:09 tx19990922

Hi, would you try adding

config.set_tactic_sources(1<<int(trt.TacticSource.CUBLAS))

after the following line? https://github.com/open-mmlab/mmdeploy/blob/6f5161b2fad5006c4b101bfd2c3ea9486730fa00/mmdeploy/backend/tensorrt/utils.py#L172

This is just a guess, as my only lead so far is a possible TensorRT bug with CUDA 10.2.
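For context on what the suggested line does: `set_tactic_sources` takes an integer bitmask in which each `trt.TacticSource` member is a bit position, and every call replaces the previous mask. The sketch below shows how such a mask is built without needing TensorRT installed. The `TacticSource` class here is a hypothetical stand-in for `trt.TacticSource` and `tactic_mask` is an illustrative helper, not part of mmdeploy or TensorRT.

```python
from enum import IntEnum


class TacticSource(IntEnum):
    """Hypothetical stand-in for trt.TacticSource; values are bit positions."""
    CUBLAS = 0
    CUBLAS_LT = 1


def tactic_mask(*sources):
    """OR together one bit per requested tactic source."""
    mask = 0
    for source in sources:
        mask |= 1 << int(source)
    return mask


only_cublas = tactic_mask(TacticSource.CUBLAS)
both = tactic_mask(TacticSource.CUBLAS, TacticSource.CUBLAS_LT)
print(only_cublas, both)
```

With the real library, the result would be passed as `config.set_tactic_sources(mask)`. Because the mask is replaced as a whole, a later call in the same function overrides an earlier one.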

AllentDan avatar Sep 20 '22 05:09 AllentDan

Can you elaborate on that, because I haven't solved this problem yet, thanks.

Generate the engine file with a command like this:

/usr/src/tensorrt/bin/trtexec --onnx=./end2end.onnx --plugins=../../mmdeploy/lib/libmmdeploy_tensorrt_ops.so --workspace=6000 --fp16 --saveEngine=end2end.engine

Then you can use mmdeploy scripts, such as test.py, to load this engine file.
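If the conversion needs to be scripted, the same trtexec invocation can be assembled in Python. This is a minimal sketch: the helper `trtexec_command` and its parameter names are made up for illustration, and the paths and flags are the ones from the command above.

```python
import shlex


def trtexec_command(onnx_path, plugin_path, engine_path,
                    workspace_mb=6000, fp16=True,
                    trtexec="/usr/src/tensorrt/bin/trtexec"):
    """Build the trtexec argument list used in this thread."""
    args = [
        trtexec,
        f"--onnx={onnx_path}",
        f"--plugins={plugin_path}",
        f"--workspace={workspace_mb}",
    ]
    if fp16:
        args.append("--fp16")
    args.append(f"--saveEngine={engine_path}")
    return args


cmd = trtexec_command(
    "./end2end.onnx",
    "../../mmdeploy/lib/libmmdeploy_tensorrt_ops.so",
    "end2end.engine",
)
print(shlex.join(cmd))
```

On the Jetson itself, the list could then be run with `subprocess.run(cmd, check=True)`.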

tongda avatar Sep 20 '22 06:09 tongda

I am afraid this is not a valid modification, because later code calls set_tactic_sources again at these lines:

https://github.com/open-mmlab/mmdeploy/blob/6f5161b2fad5006c4b101bfd2c3ea9486730fa00/mmdeploy/backend/tensorrt/utils.py#L177-L181

Hi, would you try adding

config.set_tactic_sources(1<<int(trt.TacticSource.CUBLAS))

after the following line?

https://github.com/open-mmlab/mmdeploy/blob/6f5161b2fad5006c4b101bfd2c3ea9486730fa00/mmdeploy/backend/tensorrt/utils.py#L172

This is just a guess, as my only lead so far is a possible TensorRT bug with CUDA 10.2.

tongda avatar Sep 20 '22 07:09 tongda

I tried commenting out L176-182 and adding config.set_tactic_sources(1<<int(trt.TacticSource.CUBLAS)), but got the same exception.

tongda avatar Sep 20 '22 07:09 tongda

Can you elaborate on that, because I haven't solved this problem yet, thanks.

Generate the engine file with a command like this:

/usr/src/tensorrt/bin/trtexec --onnx=./end2end.onnx --plugins=../../mmdeploy/lib/libmmdeploy_tensorrt_ops.so --workspace=6000 --fp16 --saveEngine=end2end.engine

Then you can use mmdeploy scripts, such as test.py, to load this engine file.

Thanks a lot, it works. This problem really bothered me for a long time.

tx19990922 avatar Sep 20 '22 08:09 tx19990922

@grimoire Hi, do you have any clue about why our onnx2trt failed while trtexec worked?

AllentDan avatar Sep 20 '22 08:09 AllentDan

Errr, it is hard to say. Have you tried other CUDA + TensorRT combinations?

grimoire avatar Sep 20 '22 09:09 grimoire

I had the same problem, but when I ran this command, it failed:

image

Can you elaborate on that, because I haven't solved this problem yet, thanks.

Generate the engine file with a command like this:

/usr/src/tensorrt/bin/trtexec --onnx=./end2end.onnx --plugins=../../mmdeploy/lib/libmmdeploy_tensorrt_ops.so --workspace=6000 --fp16 --saveEngine=end2end.engine

Then you can use mmdeploy scripts, such as test.py, to load this engine file.

lijoe123 avatar Sep 22 '22 11:09 lijoe123