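After training PSPNet with mmsegmentation and converting it to TensorRT and ONNX models, inference is about twice as slow as with PyTorch. The inference results are correct. The inference times are: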

pytorch: 0.048 s, tensorrt: 0.08 s, tensorrt fp16: 0.08 s, onnxruntime-gpu: 0.09 s
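For reference, the relative slowdown implied by these numbers can be worked out directly. The snippet below only restates the timings quoted above; the dictionary keys and the `slowdown` helper are names made up for this illustration.

```python
# Latencies in seconds, copied from the measurements quoted above.
latency = {
    "pytorch": 0.048,
    "tensorrt": 0.08,
    "tensorrt fp16": 0.08,
    "onnxruntime-gpu": 0.09,
}


def slowdown(latency, baseline="pytorch"):
    """Return each backend's latency divided by the baseline latency."""
    base = latency[baseline]
    return {name: t / base for name, t in latency.items()}


ratios = slowdown(latency)
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x")
```

By these numbers the converted models are roughly 1.7x to 1.9x slower than PyTorch, which is a bit less than a full factor of two.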

Environment on the 1050 Ti:

2022-06-07 14:38:31,810 - mmdeploy - INFO -

2022-06-07 14:38:31,810 - mmdeploy - INFO - Environmental information
2022-06-07 14:38:33,111 - mmdeploy - INFO - sys.platform: linux
2022-06-07 14:38:33,111 - mmdeploy - INFO - Python: 3.7.0 (default, Oct 9 2018, 10:31:47) [GCC 7.3.0]
2022-06-07 14:38:33,111 - mmdeploy - INFO - CUDA available: True
2022-06-07 14:38:33,111 - mmdeploy - INFO - GPU 0: NVIDIA GeForce GTX 1050 Ti
2022-06-07 14:38:33,111 - mmdeploy - INFO - CUDA_HOME: /usr/local/cuda
2022-06-07 14:38:33,111 - mmdeploy - INFO - NVCC: Build cuda_11.1.TC455_06.29190527_0
2022-06-07 14:38:33,111 - mmdeploy - INFO - GCC: gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
2022-06-07 14:38:33,111 - mmdeploy - INFO - PyTorch: 1.9.0+cu111
2022-06-07 14:38:33,111 - mmdeploy - INFO - PyTorch compiling details: PyTorch built with:

GCC 7.3
C++ Version: 201402
Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
Intel(R) MKL-DNN v2.1.2 (Git Hash 98be7e8afa711dc9b66c8ff3504129cb82013cdb)
OpenMP 201511 (a.k.a. OpenMP 4.5)
NNPACK is enabled
CPU capability usage: AVX2
CUDA Runtime 11.1
NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86
CuDNN 8.0.5
Magma 2.5.2
Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.1, CUDNN_VERSION=8.0.5, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.9.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON,

2022-06-07 14:38:33,111 - mmdeploy - INFO - TorchVision: 0.10.0+cu111
2022-06-07 14:38:33,111 - mmdeploy - INFO - OpenCV: 4.5.5
2022-06-07 14:38:33,111 - mmdeploy - INFO - MMCV: 1.4.7
2022-06-07 14:38:33,111 - mmdeploy - INFO - MMCV Compiler: GCC 7.3
2022-06-07 14:38:33,111 - mmdeploy - INFO - MMCV CUDA Compiler: 11.1
2022-06-07 14:38:33,111 - mmdeploy - INFO - MMDeployment: 0.1.0+668fb16
2022-06-07 14:38:33,111 - mmdeploy - INFO -

2022-06-07 14:38:33,111 - mmdeploy - INFO - Backend information
2022-06-07 14:38:34,351 - mmdeploy - INFO - onnxruntime: 1.8.0 ops_is_avaliable : True
2022-06-07 14:38:34,356 - mmdeploy - INFO - tensorrt: 8.2.5.1 ops_is_avaliable : True
2022-06-07 14:38:34,357 - mmdeploy - INFO - ncnn: None ops_is_avaliable : False
2022-06-07 14:38:34,357 - mmdeploy - INFO - pplnn_is_avaliable: False
2022-06-07 14:38:34,369 - mmdeploy - INFO - openvino_is_avaliable: False
2022-06-07 14:38:34,369 - mmdeploy - INFO -

2022-06-07 14:38:34,369 - mmdeploy - INFO - Codebase information
2022-06-07 14:38:34,370 - mmdeploy - INFO - mmcls: 0.23.0
2022-06-07 14:38:34,441 - mmdeploy - INFO - mmdet: 2.20.0
2022-06-07 14:38:34,441 - mmdeploy - INFO - mmedit: None
2022-06-07 14:38:34,474 - mmdeploy - INFO - mmocr: 0.4.1
2022-06-07 14:38:34,581 - mmdeploy - INFO - mmseg: 0.21.1

Inference code:

    # Imports other than build_task_processor were missing from the post.
    import time

    import cv2
    import numpy as np
    from mmdeploy.apis.utils import build_task_processor
    from mmdeploy.utils import get_backend, get_input_shape, load_config

    model_cfg_path = 'mmsegmentation/work_dirs/pspnet_r50-d8_352x352_20k_voc12_liquidoment.py'
    deploy_cfg_path = '/media/ymw/DATA2/instrument_recognition/MMDeploy/configs/mmseg/segmentation_tensorrt_static-352x352.py'
    backend_files = '/media/ymw/DATA2/instrument_recognition/MMDeploy/work_dir/end2end_liquidoment.engine'
    # ONNX Runtime variant (commented out):
    # deploy_cfg_path = '/media/ymw/DATA2/instrument_recognition/MMDeploy/configs/mmseg/segmentation_onnxruntime_static.py'
    # backend_files = '/media/ymw/DATA2/instrument_recognition/MMDeploy/work_dir/end2end_liquidoment.onnx'

    device = 'cuda:0'
    deploy_cfg, model_cfg = load_config(deploy_cfg_path, model_cfg_path)
    backend = get_backend(deploy_cfg)
    task_processor = build_task_processor(model_cfg, deploy_cfg, device)
    model = task_processor.init_backend_model([backend_files])
    input_shape = get_input_shape(deploy_cfg)
    cap = cv2.VideoCapture(0)
    # NOTE: `predictor` is the YOLOX detector; its construction is not shown in this post.
    while True:
        ret, frame = cap.read()
        if not ret:
            continue
        # The same image is loaded on every iteration.
        img_origin = cv2.imread("/media/ymw/DATA2/ubuntu20.04_backup_20220218/open-mmlab/MMDeploy/demo/liquidomet_22_0.jpg")
        cv2.imshow("img_detect", img_origin)
        cv2.waitKey(1)
        img_h, img_w = img_origin.shape[:2]
        outputs, img_info = predictor.inference(img_origin)
        # print(outputs)
        if outputs[0] is None:
            continue
        ratio = img_info["ratio"]
        outputs = outputs[0][:, 0:4] / ratio

        img_2 = img_origin.copy()
        detect_result = {}
        for i, pos in enumerate(outputs[:, :4]):
            pos = np.int16(pos.cpu())
            x1, y1, x2, y2 = pos[0], pos[1], pos[2], pos[3]
            # Clip the box to the image bounds.
            x1, x2, y1, y2 = max(0, x1), min(img_w, x2), max(0, y1), min(img_h, y2)
            w, h = np.int32(x2 - x1), np.int32(y2 - y1)
            detect_box_area = w * h

            arclength = 2 * w + 2 * h
            print("arclength", arclength)
            if w < 30 or h < 30:
                continue
            cv2.rectangle(img_2, pt1=(x1, y1), pt2=(x2, y2), color=(0, 0, 255), thickness=2)
            img_crop = img_origin.copy()[y1:y2, x1:x2]
            # Fixed: the key was the literal string "i", so every box overwrote the previous one.
            detect_result[f"{i}"] = {"img": img_crop, "area": detect_box_area}

        # mosac = np.hstack([img_origin, img_2])
        cv2.imshow("img_detect", img_2)
        cv2.waitKey(1)
        if len(detect_result) == 0:
            continue

        for box in detect_result.values():
            print("Starting segmentation...")
            img = box["img"]
            # The timed span covers both pre-processing and inference.
            start_time = time.time()
            model_inputs, _ = task_processor.create_input(img, input_shape)
            result = task_processor.run_inference(model, model_inputs)
            print(result)
            print("time:", time.time() - start_time)

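The pipeline first uses YOLOX to detect and crop the object, then runs segmentation on the crop. The reported time is for a single image. Every iteration uses the same image, and the time is taken from the middle and later part of the loop, so it does not include the initial GPU loading time.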
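A common way to make this kind of measurement less noisy is to discard a fixed number of warm-up runs, synchronize with the GPU before reading the clock, and average over many iterations. The sketch below is only an illustration of that idea: `benchmark` and its `sync` parameter are hypothetical names, and the workload is a stand-in rather than the real model.

```python
import statistics
import time


def benchmark(fn, warmup=10, iters=50, sync=None):
    """Time fn() after warm-up runs and return (mean, stdev) in seconds.

    sync is an optional callable that blocks until queued GPU work has
    finished, for example torch.cuda.synchronize when PyTorch is used.
    """
    for _ in range(warmup):      # warm-up runs are executed but not timed
        fn()
    if sync is not None:
        sync()
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        if sync is not None:     # wait for the GPU before stopping the clock
            sync()
        samples.append(time.perf_counter() - start)
    return statistics.mean(samples), statistics.stdev(samples)


# Stand-in workload; in practice this would wrap the real inference call.
mean, std = benchmark(lambda: sum(range(10_000)), warmup=3, iters=20)
print(f"mean {mean * 1e3:.3f} ms, stdev {std * 1e3:.3f} ms")
```

With the code from this post, `fn` could be something like `lambda: task_processor.run_inference(model, model_inputs)`, using the same harness for the PyTorch model so both numbers are measured the same way.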

TensorRT conversion log:

2022-06-07 15:00:51,993 - mmdeploy - INFO - torch2onnx start.
load checkpoint from local path: mmlab/mmsegmentation/work_dirs/iter_10000_liquidoment_20220325.pth
/media/ymw/DATA2/ubuntu20.04_backup_20220218/open-mmlab/MMDeploy/mmdeploy/codebase/mmseg/models/segmentors/base.py:39: TracerWarning: Converting a tensor to a Python integer might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  img_shape = [int(val) for val in img_shape]
/home/ymw/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/nn/functional.py:718: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at /pytorch/c10/core/TensorImpl.h:1156.)
  return torch.max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode)
/media/ymw/DATA2/ubuntu20.04_backup_20220218/open-mmlab/MMDeploy/mmdeploy/codebase/mmseg/models/decode_heads/psp_head.py:29: TracerWarning: Converting a tensor to a Python integer might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  size = [int(val) for val in size]
/media/ymw/DATA2/ubuntu20.04_backup_20220218/open-mmlab/MMDeploy/mmdeploy/codebase/mmseg/models/segmentors/encoder_decoder.py:39: TracerWarning: Converting a tensor to a Python integer might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  shape = [int(_) for _ in shape]
2022-06-07 15:01:04,728 - mmdeploy - INFO - torch2onnx success.
2022-06-07 15:01:05,469 - mmdeploy - INFO - onnx2tensorrt of work_dir/end2end.onnx start.
2022-06-07 15:01:07,339 - mmdeploy - INFO - Successfully loaded tensorrt plugins from /media/ymw/DATA2/ubuntu20.04_backup_20220218/open-mmlab/MMDeploy/build/lib/libmmdeploy_tensorrt_ops.so
[06/07/2022-15:01:07] [TRT] [I] [MemUsageChange] Init CUDA: CPU +192, GPU +0, now: CPU 265, GPU 1467 (MiB)
[06/07/2022-15:01:08] [TRT] [I] [MemUsageSnapshot] Begin constructing builder kernel library: CPU 265 MiB, GPU 1467 MiB
[06/07/2022-15:01:08] [TRT] [I] [MemUsageSnapshot] End constructing builder kernel library: CPU 328 MiB, GPU 1467 MiB
[06/07/2022-15:01:08] [TRT] [W] onnx2trt_utils.cpp:366: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[06/07/2022-15:01:08] [TRT] [W] Tensor DataType is determined at build time for tensors not marked as input or output.
[06/07/2022-15:01:09] [TRT] [W] Half2 support requested on hardware without native FP16 support, performance will be negatively affected.
[06/07/2022-15:01:10] [TRT] [W] TensorRT was linked against cuBLAS/cuBLASLt 11.6.5 but loaded cuBLAS/cuBLASLt 11.3.0
[06/07/2022-15:01:10] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +61, GPU +76, now: CPU 2493, GPU 2215 (MiB)
[06/07/2022-15:01:10] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +111, GPU +48, now: CPU 2604, GPU 2263 (MiB)
[06/07/2022-15:01:10] [TRT] [I] Local timing cache in use. Profiling results in this builder pass will not be stored.
[06/07/2022-15:01:12] [TRT] [I] Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
[06/07/2022-15:01:34] [TRT] [I] Detected 1 inputs and 1 output network tensors.
[06/07/2022-15:01:34] [TRT] [I] Total Host Persistent Memory: 121072
[06/07/2022-15:01:34] [TRT] [I] Total Device Persistent Memory: 313039872
[06/07/2022-15:01:34] [TRT] [I] Total Scratch Memory: 3964928
[06/07/2022-15:01:34] [TRT] [I] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 72 MiB, GPU 541 MiB
[06/07/2022-15:01:34] [TRT] [I] [BlockAssignment] Algorithm ShiftNTopDown took 3.82345ms to assign 7 blocks to 86 nodes requiring 67405824 bytes.
[06/07/2022-15:01:34] [TRT] [I] Total Activation Memory: 67405824
[06/07/2022-15:01:34] [TRT] [W] TensorRT was linked against cuBLAS/cuBLASLt 11.6.5 but loaded cuBLAS/cuBLASLt 11.3.0
[06/07/2022-15:01:34] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 2906, GPU 2705 (MiB)
[06/07/2022-15:01:34] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 2906, GPU 2713 (MiB)
[06/07/2022-15:01:34] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in building engine: CPU +0, GPU +315, now: CPU 0, GPU 315 (MiB)
2022-06-07 15:01:36,831 - mmdeploy - INFO - onnx2tensorrt of work_dir/end2end.onnx success.
2022-06-07 15:01:36,831 - mmdeploy - INFO - visualize tensorrt model start.
2022-06-07 15:01:41,730 - mmdeploy - INFO - Successfully loaded tensorrt plugins from /media/ymw/DATA2/ubuntu20.04_backup_20220218/open-mmlab/MMDeploy/build/lib/libmmdeploy_tensorrt_ops.so
2022-06-07 15:01:41,731 - mmdeploy - INFO - Successfully loaded tensorrt plugins from /media/ymw/DATA2/ubuntu20.04_backup_20220218/open-mmlab/MMDeploy/build/lib/libmmdeploy_tensorrt_ops.so
[06/07/2022-15:01:42] [TRT] [W] TensorRT was linked against cuBLAS/cuBLASLt 11.6.5 but loaded cuBLAS/cuBLASLt 11.3.0
[06/07/2022-15:01:42] [TRT] [W] TensorRT was linked against cuBLAS/cuBLASLt 11.6.5 but loaded cuBLAS/cuBLASLt 11.3.0
2022-06-07 15:01:49,378 - mmdeploy - INFO - visualize tensorrt model success.
2022-06-07 15:01:49,378 - mmdeploy - INFO - visualize pytorch model start.
load checkpoint from local path: mmlab/mmsegmentation/work_dirs/iter_10000_liquidoment_20220325.pth
/home/ymw/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/nn/functional.py:718: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at /pytorch/c10/core/TensorImpl.h:1156.)
  return torch.max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode)
2022-06-07 15:02:00,170 - mmdeploy - INFO - visualize pytorch model success.
2022-06-07 15:02:00,170 - mmdeploy - INFO - All process success.
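The log contains the TensorRT warning "Half2 support requested on hardware without native FP16 support", which would be consistent with the FP16 engine being no faster than the FP32 one on this card. A small check along these lines can tell in advance whether FP16 is likely to help. The set of compute capabilities below is an assumption based on NVIDIA's published hardware support matrix, and `has_fast_fp16` is a hypothetical helper, not part of TensorRT or mmdeploy.

```python
# Pre-Volta compute capabilities assumed to have fast native FP16.
# Assumption: taken from NVIDIA's hardware support matrix; verify it
# against the matrix for the TensorRT version in use.
FAST_FP16_BEFORE_VOLTA = {(5, 3), (6, 0), (6, 2)}


def has_fast_fp16(major, minor):
    """Return True if the compute capability is assumed to have fast FP16."""
    if major >= 7:  # Volta and newer
        return True
    return (major, minor) in FAST_FP16_BEFORE_VOLTA


# GTX 1050 Ti has compute capability 6.1 (Pascal).
print("GTX 1050 Ti fast FP16:", has_fast_fp16(6, 1))
# Jetson AGX Xavier has compute capability 7.2 (Volta).
print("Jetson AGX Xavier fast FP16:", has_fast_fp16(7, 2))
```

When PyTorch is installed, `torch.cuda.get_device_capability(0)` returns the `(major, minor)` pair for the first GPU, so the check can be fed from the running system instead of hard-coded values.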

Jun 07 '22 06:06 ymw123

@ymw123 Hi,

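1. Please refer to [here](https://github.com/open-mmlab/mmdeploy/blob/master/docs/en/tutorials/how_to_measure_performance_of_models.md) for how we test the latency of backend models. The latency [benchmark](https://github.com/open-mmlab/mmdeploy/blob/master/docs/en/benchmark.md) can be used as a reference.

2. Could you share how you tested the speed of the PyTorch model?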

BTW, please use English for the benefit of the whole community.

Jun 08 '22 02:06 RunningLeon

@ymw123 Hi,

1. please refer to [here](https://github.com/open-mmlab/mmdeploy/blob/master/docs/en/tutorials/how_to_measure_performance_of_models.md) for how we test latency of backend models. The latency [benchmark](https://github.com/open-mmlab/mmdeploy/blob/master/docs/en/benchmark.md) could be viewed as reference.

2. Could you share how you test the speed of PyTorch model?

BTW, please use English for the benefit of the whole community.

The link you sent can't be opened

Jun 15 '22 10:06 ymw123

@ymw123 Hi, sorry for the change.

link for how to profile model: https://mmdeploy.readthedocs.io/zh_CN/latest/02-how-to-run/profile_model.html
link for benchmark : https://mmdeploy.readthedocs.io/zh_CN/latest/03-benchmark/benchmark.html

Jun 15 '22 14:06 RunningLeon

@RunningLeon

python tools/test.py \
    configs/mmseg/segmentation_tensorrt_static-352x352.py \
    mmlab/mmsegmentation/work_dirs/pspnet_r50-d8_352x352_20k_voc12_liquidoment.py \
    --model work_dir/end2end_liquidoment.engine \
    --speed-test \
    --device cuda:0

With this command, are the dataset images not automatically resized to the width and height the engine expects? I ask because I get the following error:

Jun 22 '22 04:06 ymw123

@RunningLeon

Is there any problem using the code below to test the speed of a single image? The speed does not include the time to load the model and the cuda cache.

Jun 22 '22 04:06 ymw123

Hi, it depends on the test pipeline in the model config. Clearly, the input image is not being resized to 352x352 during preprocessing. You may need to set the Resize scale to (352, 352) with keep_ratio=False in the config.
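As a rough sketch of what such a change might look like, assuming the usual mmsegmentation 0.x test pipeline layout (the actual config for this model is not shown in the thread, and the normalization values below are the common ImageNet defaults, not values taken from it):

```python
# Assumed normalization settings; replace with the values from the real config.
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)

test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(352, 352),  # match the static 352x352 engine input
        flip=False,
        transforms=[
            # keep_ratio=False forces an exact 352x352 resize
            dict(type='Resize', keep_ratio=False),
            dict(type='RandomFlip'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img']),
        ]),
]
```

With keep_ratio=True the resized image keeps its aspect ratio and will generally not be exactly 352x352, which a static-shape engine cannot accept.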

Jun 22 '22 05:06 RunningLeon

@RunningLeon

Is there any problem using the code below to test the speed of a single image? The speed does not include the time to load the model and the cuda cache.

Hi, measured this way, the pre- and post-processing are included in the timing, which can significantly affect the measured latency.

Jun 22 '22 05:06 RunningLeon

The PyTorch test includes the same processing steps. The commented-out code `result = inference_segmentor(model, img)` produces the same output as `model_inputs, _ = task_processor.create_input(img, input_shape); result = task_processor.run_inference(model, model_inputs)`, and both include post-processing. This is how I compared the speed.

Jun 22 '22 09:06 ymw123

@ymw123 Hi, you could exclude the create_input step from the timing. For the PyTorch model, did you also test on the Jetson?
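One way to see how much each step contributes is to time the stages separately instead of the whole call. The following is a minimal sketch: `StageTimer` is a hypothetical helper, and the two functions are stand-ins for `task_processor.create_input` and `task_processor.run_inference` so that the example runs without mmdeploy.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class StageTimer:
    """Accumulate wall-clock time per named stage."""

    def __init__(self):
        self.totals = defaultdict(float)
        self.counts = defaultdict(int)

    @contextmanager
    def stage(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            self.totals[name] += time.perf_counter() - start
            self.counts[name] += 1

    def mean(self, name):
        return self.totals[name] / self.counts[name]


# Hypothetical stand-ins for the real pre-processing and inference calls.
def create_input(img):
    return [px / 255.0 for px in img]


def run_inference(model_inputs):
    return sum(model_inputs)


timer = StageTimer()
img = list(range(1000))
for _ in range(20):
    with timer.stage("create_input"):
        model_inputs = create_input(img)
    with timer.stage("run_inference"):
        result = run_inference(model_inputs)

for name in ("create_input", "run_inference"):
    print(f"{name}: {timer.mean(name) * 1e3:.4f} ms")
```

For GPU backends the inference stage should also wait for the device to finish before the timer stops, otherwise asynchronous work can be attributed to the wrong stage.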

Jun 23 '22 10:06 RunningLeon

Hello, have you solved this problem? I have run into the same issue.

Jul 26 '22 06:07 jiaqizhang123-stack

This issue is marked as stale because it has been marked as invalid or awaiting response for 7 days without any further response. It will be closed in 5 days if the stale label is not removed or if there is no further response.

Apr 02 '23 01:04 github-actions[bot]

This issue is closed because it has been stale for 5 days. Please open a new issue if you have similar issues or you have any new updates now.

May 13 '23 01:05 github-actions[bot]


Unreasonable inference latency of PSPNet with tensorrt, onnxruntime-gpu, pytorch on 1050ti and jetson agx xavier
