Predictions from the model converted to TensorRT do not match PyTorch

Open · Bonheur96 opened this issue 1 year ago · 0 comments

Prerequisite

[X] I have searched Issues and Discussions but cannot get the expected help.
[X] The bug has not been fixed in the latest version(https://github.com/open-mmlab/mmpose).

Environment

sys.platform: win32
Python: 3.8.19 (default, Mar 20 2024, 19:55:45) [MSC v.1916 64 bit (AMD64)]
CUDA available: True
MUSA available: False
numpy_random_seed: 2147483648
GPU 0: NVIDIA GeForce RTX 3080
CUDA_HOME: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6
NVCC: Cuda compilation tools, release 11.6, V11.6.55
MSVC: Microsoft (R) C/C++ Optimizing Compiler Version 19.37.32705 for x64
GCC: n/a
PyTorch: 2.3.1+cu118
PyTorch compiling details: PyTorch built with:
  - C++ Version: 201703
  - MSVC 192930154
  - Intel(R) oneAPI Math Kernel Library Version 2021.4-Product Build 20210904 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v3.3.6 (Git Hash 86e6af5974177e513fd3fee58425e1063e7f1361)
  - OpenMP 2019
  - LAPACK is enabled (usually provided by MKL)
  - CPU capability usage: AVX2
  - CUDA Runtime 11.8
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_90,code=sm_90;-gencode;arch=compute_37,code=compute_37
  - CuDNN 8.7
  - Magma 2.5.4
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.8, CUDNN_VERSION=8.7.0, CXX_COMPILER=C:/actions-runner/_work/pytorch/pytorch/builder/windows/tmp_bin/sccache-cl.exe, CXX_FLAGS=/DWIN32 /D_WINDOWS /GR /EHsc /Zc:__cplusplus /bigobj /FS /utf-8 -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOCUPTI -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE /wd4624 /wd4068 /wd4067 /wd4267 /wd4661 /wd4717 /wd4244 /wd4804 /wd4273, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=2.3.1, USE_CUDA=ON, USE_CUDNN=ON, USE_CUSPARSELT=OFF, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_GLOO=ON, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=OFF, USE_NNPACK=OFF, USE_OPENMP=ON, USE_ROCM=OFF, USE_ROCM_KERNEL_ASSERT=OFF
TorchVision: 0.18.1+cu118
OpenCV: 4.10.0
MMEngine: 0.10.4
MMPose: 1.3.1+

Related packages:
mmcv 2.1.0
mmdeploy 1.3.1
mmdeploy-runtime-gpu 1.3.1
mmdet 3.3.0
mmengine 0.10.4
mmpose 1.3.1 (d:\binocular_camera\mmpose-main)
mmpretrain 1.2.0 (d:\binocular_camera\mmpose-main\mmpretrain)

Reproduces the problem - code sample

import os
import time

import cv2
import numpy as np
from mmdeploy.apis import torch2onnx
from mmdeploy.apis.tensorrt import onnx2tensorrt
from mmdeploy.backend.sdk.export_info import export2SDK
from mmdeploy_runtime import PoseDetector

img = r'C:\Users\HYGEA\Desktop\needle-label\240513_label\240513\HD2K_SN28988284_16-19-01_left_5persecond/HD2K_SN28988284_16-19-01_left_5persecond_000000.jpg'
work_dir = 'work_dir/trt/hrnet'
save_file = 'end2end.onnx'
deploy_cfg = r'D:\binocular_camera\mmpose-main\mmdeploy\configs\mmpose\pose-detection_tensorrt-fp16_static-256x256.py'
model_cfg = r'D:\binocular_camera\mmpose-main\configs\needle_10_keypoint\td-hm_hrnetv2-w18_dark-8xb64-60e_wflw-256x256.py'
model_checkpoint = r'D:\binocular_camera\mmpose-main\work_dirs\071624\best_coco_AP_epoch_98.pth'
device = 'cuda'

1. convert model to IR(onnx)

torch2onnx(img, work_dir, save_file, deploy_cfg, model_cfg, model_checkpoint, device)

2. convert IR to tensorrt

onnx_model = os.path.join(work_dir, save_file)
save_file = 'end2end.engine'
model_id = 0
device = 'cuda'
onnx2tensorrt(work_dir, save_file, model_id, deploy_cfg, onnx_model, device)

3. extract pipeline info for sdk use (dump-info)

export2SDK(deploy_cfg, model_cfg, work_dir, pth=model_checkpoint, device=device)

image_path = r'D:\binocular_camera\mmpose-main\dataset\needle_10\images\val\HD2K_SN28988284_10-04-05_left_2persecond_000075.jpg'
img = cv2.imread(image_path)

model_path = r'D:/binocular_camera/mmpose-main/work_dir\trt\hrnet'
detector = PoseDetector(model_path=model_path, device_name='cuda', device_id=0)

bbox = [939.4366197183099, 408.33333333333337, 906.2300469483569, 213.37089201877927]
if bbox is None:
    result = detector(img)
else:
    # convert (x, y, w, h) -> (left, top, right, bottom)
    start_time = time.time()
    print(bbox)
    # dtype=int truncates the fractional part of every coordinate
    bbox = np.array(bbox, dtype=int)
    bbox[2:] += bbox[:2]
    result = detector(img, bbox)
    end_time = time.time()

# keep only the x, y of each keypoint (single box) and draw them
_, point_num, _ = result.shape
points = result[:, :, :2].reshape(point_num, 2)
for [x, y] in points.astype(int):
    cv2.circle(img, (x, y), 1, (0, 255, 0), 2)

cv2.imwrite('output_pose.png', img)
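To put a number on "inconsistent", the TensorRT keypoints in `result` can be compared point by point with the PyTorch predictions for the same image and the same box. The sketch below assumes both are NumPy arrays shaped (num_boxes, num_keypoints, 2 or 3) with x, y in the first two columns; `keypoint_discrepancy` is a hypothetical helper, and producing the PyTorch keypoints (for example with mmpose's `inference_topdown` on the same box) is left out. Because the code sample converts the box with `dtype=int`, the PyTorch side should receive the same truncated box for a like-for-like comparison.

```python
import numpy as np


def keypoint_discrepancy(ref, test):
    """Per-keypoint Euclidean distance in pixels between two predictions.

    Hypothetical helper: `ref` and `test` are arrays shaped (N, K, 2) or
    (N, K, 3); only the x, y columns are compared, so a score column is ignored.
    """
    ref = np.asarray(ref, dtype=np.float64)[..., :2]
    test = np.asarray(test, dtype=np.float64)[..., :2]
    if ref.shape != test.shape:
        raise ValueError(f"shape mismatch: {ref.shape} vs {test.shape}")
    dist = np.linalg.norm(ref - test, axis=-1)  # shape (N, K)
    return {
        "per_keypoint": dist,
        "mean_px": float(dist.mean()),
        "max_px": float(dist.max()),
    }


if __name__ == "__main__":
    # Toy values for illustration only, not results from this issue.
    pytorch_kpts = np.array([[[10.0, 20.0, 0.9], [30.0, 40.0, 0.8]]])
    trt_kpts = np.array([[[13.0, 24.0, 0.9], [30.0, 40.0, 0.7]]])
    report = keypoint_discrepancy(pytorch_kpts, trt_kpts)
    print(report["mean_px"], report["max_px"])  # 2.5 5.0
```

Looking at `per_keypoint` rather than only the mean shows whether the error is spread evenly or concentrated on a few keypoints.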

Reproduces the problem - command or script

N/A

Reproduces the problem - error message

loading mmdeploy_trt_net.dll ...
loading mmdeploy_ort_net.dll ...
07/16 15:47:25 - mmengine - WARNING - Failed to search registry with scope "mmpose" in the "Codebases" registry tree. As a workaround, the current "Codebases" registry in "mmdeploy" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmpose" is a correct scope, or whether the registry is initialized.
07/16 15:47:25 - mmengine - WARNING - Failed to search registry with scope "mmpose" in the "mmpose_tasks" registry tree. As a workaround, the current "mmpose_tasks" registry in "mmdeploy" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmpose" is a correct scope, or whether the registry is initialized.
Loads checkpoint by local backend from path: D:\binocular_camera\mmpose-main\work_dirs\071624\best_coco_AP_epoch_98.pth
07/16 15:47:26 - mmengine - WARNING - DeprecationWarning: get_onnx_config will be deprecated in the future.
07/16 15:47:26 - mmengine - INFO - Export PyTorch model to ONNX: work_dir/trt/hrnet\end2end.onnx.
07/16 15:47:26 - mmengine - WARNING - Can not find torch.nn.functional._scaled_dot_product_attention, function rewrite will not be applied
07/16 15:47:26 - mmengine - WARNING - Can not find mmdet.models.utils.transformer.PatchMerging.forward, function rewrite will not be applied
D:\binocular_camera\mmpose-main\mmpose\models\utils\ops.py:52: TracerWarning: Converting a tensor to a Python integer might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  size = tuple(int(x) for x in size)
07/16 15:47:41 - mmengine - INFO - Successfully loaded tensorrt plugins from C:\Users\HYGEA\anaconda3\envs\mmdeploy\lib\site-packages\mmdeploy\lib\mmdeploy_tensorrt_ops.dll
[07/16/2024-15:47:41] [TRT] [I] [MemUsageChange] Init CUDA: CPU +409, GPU +0, now: CPU 19242, GPU 1382 (MiB)
[07/16/2024-15:47:43] [TRT] [I] [MemUsageChange] Init builder kernel library: CPU +389, GPU +102, now: CPU 19799, GPU 1484 (MiB)
[07/16/2024-15:47:43] [TRT] [I] ----------------------------------------------------------------
[07/16/2024-15:47:43] [TRT] [I] Input filename: work_dir/trt/hrnet\end2end.onnx
[07/16/2024-15:47:43] [TRT] [I] ONNX IR version: 0.0.6
[07/16/2024-15:47:43] [TRT] [I] Opset version: 11
[07/16/2024-15:47:43] [TRT] [I] Producer name: pytorch
[07/16/2024-15:47:43] [TRT] [I] Producer version: 2.3.1
[07/16/2024-15:47:43] [TRT] [I] Domain:
[07/16/2024-15:47:43] [TRT] [I] Model version: 0
[07/16/2024-15:47:43] [TRT] [I] Doc string:
[07/16/2024-15:47:43] [TRT] [I] ----------------------------------------------------------------
[07/16/2024-15:47:43] [TRT] [W] onnx2trt_utils.cpp:369: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[07/16/2024-15:47:46] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +5, GPU +10, now: CPU 19731, GPU 1494 (MiB)
[07/16/2024-15:47:46] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +11, GPU +8, now: CPU 19742, GPU 1502 (MiB)
[07/16/2024-15:47:46] [TRT] [I] Local timing cache in use. Profiling results in this builder pass will not be stored.
[07/16/2024-15:47:59] [TRT] [W] Weights [name=/backbone/conv1/Conv + /backbone/relu/Relu.weight] had the following issues when converted to FP16:
[07/16/2024-15:47:59] [TRT] [W] - Subnormal FP16 values detected.
[07/16/2024-15:47:59] [TRT] [W] If this is not the desired behavior, please modify the weights or retrain with regularization to reduce the magnitude of the weights.
(the three lines above repeat for the same layer)
[07/16/2024-15:48:00] [TRT] [W] Weights [name=/backbone/conv2/Conv + /backbone/relu_1/Relu.weight] had the following issues when converted to FP16:
[07/16/2024-15:48:00] [TRT] [W] - Subnormal FP16 values detected.
[07/16/2024-15:48:00] [TRT] [W] If this is not the desired behavior, please modify the weights or retrain with regularization to reduce the magnitude of the weights.
(the three lines above repeat for the same layer)
[07/16/2024-15:48:02] [TRT] [W] Weights [name=/backbone/layer1/layer1.0/conv2/Conv + /backbone/layer1/layer1.0/relu_1/Relu.weight] had the following issues when converted to FP16:
[07/16/2024-15:48:02] [TRT] [W] - Subnormal FP16 values detected.
[07/16/2024-15:48:02] [TRT] [W] If this is not the desired behavior, please modify the weights or retrain with regularization to reduce the magnitude of the weights.
(the three lines above repeat for the same layer; the pasted log ends here)
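These warnings come from TensorRT casting convolution weights to FP16: non-zero values smaller in magnitude than the smallest normal FP16 number (2^-14, about 6.1e-5) are stored as subnormals, and the tiniest ones round to zero. The sketch below reproduces that cast with NumPy to count how many values of a weight array are affected. It is only an approximation: `fp16_cast_report` is a hypothetical helper, NumPy's round-to-nearest cast stands in for TensorRT's conversion, and the weights TensorRT converts may already have BatchNorm folded in, so they need not match the checkpoint tensors exactly.

```python
import numpy as np

# Smallest positive *normal* float16 value: 2**-14 (about 6.1e-05).
FP16_MIN_NORMAL = float(np.finfo(np.float16).tiny)


def fp16_cast_report(weights):
    """Count non-zero values that become subnormal or zero when cast to FP16.

    Hypothetical helper; NumPy's cast is used as a stand-in for TensorRT's.
    """
    w = np.asarray(weights, dtype=np.float32).ravel()
    h = w.astype(np.float16)  # round-to-nearest cast to half precision
    nonzero = w != 0
    flushed = nonzero & (h == 0)  # too small even for a subnormal
    subnormal = (h != 0) & (np.abs(h) < FP16_MIN_NORMAL)
    return {
        "total": int(w.size),
        "subnormal": int(subnormal.sum()),
        "flushed_to_zero": int(flushed.sum()),
    }


if __name__ == "__main__":
    # Toy values chosen for illustration, not taken from the checkpoint.
    print(fp16_cast_report([1.0, 1e-5, 1e-9, 0.0, -3e-5]))
    # {'total': 5, 'subnormal': 2, 'flushed_to_zero': 1}
```

To apply it to the checkpoint, one could loop over `torch.load(path, map_location='cpu')['state_dict']` and pass each tensor as `t.float().numpy()`; the `'state_dict'` key is what MMEngine checkpoints normally use, but that is an assumption worth checking for this file.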

Additional information

The model was trained and tested on Linux, then converted to TensorRT on Windows. Could this affect the results?
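One way to separate a platform effect from a precision effect is to rebuild the engine on the same Windows machine without FP16 and compare again. A minimal sketch, assuming the FP16 deploy config enables half precision through `backend_config.common_config.fp16_mode`, as mmdeploy's TensorRT FP16 base config does: the hypothetical file below inherits the FP16 config and switches only that flag off, relying on MMEngine's recursive merge of config dicts. Pointing `deploy_cfg` at it and re-running the three conversion steps into a separate `work_dir` would produce an FP32 engine.

```python
# Hypothetical deploy config, e.g. saved as
# pose-detection_tensorrt_static-256x256-fp32.py in the same folder as
# pose-detection_tensorrt-fp16_static-256x256.py.
# Input shape, codebase settings and TensorRT input shapes are inherited from
# the FP16 config; only fp16_mode is overridden (MMEngine merges dicts).
_base_ = ['./pose-detection_tensorrt-fp16_static-256x256.py']

backend_config = dict(common_config=dict(fp16_mode=False))
```

If the FP32 engine built on Windows agrees with PyTorch while the FP16 one does not, reduced precision is a likelier cause than the move from Linux to Windows.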

Jul 16 '24 08:07 Bonheur96
