PaddleOCR
PaddleOCR C++ deployment on Windows: error when enabling TensorRT acceleration
Please provide the following information to quickly locate the problem
- System Environment: Windows 10 / GTX 1060
- Version: Paddle: PaddleOCR: v2.5.0
- Related components:
- Problem description: I downloaded avx_mkl_cuda11.0_cudnn8.0_avx_mkl-trt7.2.1.6 from the Windows inference libraries and followed the tutorial for C++ inference deployment on Windows based on the inference library. The error occurs once --use_tensorrt=true --use_gpu=true are added to the command.
- Command Code: D:\projects\cpp\PaddleOCR\deploy\cpp_infer\build\Release\ppocr.exe system --det_model_dir=D:\projects\cpp\ch_PP-OCRv3_det_infer --rec_model_dir=D:\projects\cpp\ch_PP-OCRv3_rec_infer --use_tensorrt=true --use_gpu=true --image_dir=D:\projects\cpp\PaddleOCR\doc\imgs\11.jpg
- Complete Error Message:
total images num: 1
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0803 16:02:48.757169 23200 analysis_predictor.cc:881] TensorRT subgraph engine is enabled
--- Running analysis [ir_graph_build_pass]
--- Running analysis [ir_graph_clean_pass]
--- Running analysis [ir_analysis_pass]
--- Running IR pass [adaptive_pool2d_convert_global_pass]
I0803 16:02:49.023375 23200 fuse_pass_base.cc:57] --- detected 8 subgraphs
--- Running IR pass [shuffle_channel_detect_pass]
--- Running IR pass [quant_conv2d_dequant_fuse_pass]
--- Running IR pass [delete_quant_dequant_op_pass]
--- Running IR pass [delete_quant_dequant_filter_op_pass]
--- Running IR pass [delete_weight_dequant_linear_op_pass]
--- Running IR pass [delete_quant_dequant_linear_op_pass]
--- Running IR pass [add_support_int8_pass]
I0803 16:02:49.214921 23200 fuse_pass_base.cc:57] --- detected 237 subgraphs
--- Running IR pass [simplify_with_basic_ops_pass]
--- Running IR pass [embedding_eltwise_layernorm_fuse_pass]
--- Running IR pass [preln_embedding_eltwise_layernorm_fuse_pass]
--- Running IR pass [multihead_matmul_fuse_pass_v2]
--- Running IR pass [multihead_matmul_fuse_pass_v3]
--- Running IR pass [skip_layernorm_fuse_pass]
--- Running IR pass [preln_skip_layernorm_fuse_pass]
--- Running IR pass [conv_bn_fuse_pass]
I0803 16:02:49.435047 23200 fuse_pass_base.cc:57] --- detected 33 subgraphs
--- Running IR pass [unsqueeze2_eltwise_fuse_pass]
--- Running IR pass [trt_squeeze2_matmul_fuse_pass]
--- Running IR pass [trt_reshape2_matmul_fuse_pass]
--- Running IR pass [trt_flatten2_matmul_fuse_pass]
--- Running IR pass [trt_map_matmul_v2_to_mul_pass]
--- Running IR pass [trt_map_matmul_v2_to_matmul_pass]
--- Running IR pass [trt_map_matmul_to_mul_pass]
--- Running IR pass [fc_fuse_pass]
--- Running IR pass [conv_elementwise_add_fuse_pass]
I0803 16:02:49.498874 23200 fuse_pass_base.cc:57] --- detected 49 subgraphs
--- Running IR pass [tensorrt_subgraph_pass]
I0803 16:02:49.561825 23200 tensorrt_subgraph_pass.cc:141] --- detect a sub-graph with 187 nodes
I0803 16:02:49.886066 23200 tensorrt_subgraph_pass.cc:403] Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
I0803 16:02:50.663269 23200 engine.cc:203] Run Paddle-TRT Dynamic Shape mode.
I0803 16:03:43.932397 23200 engine.cc:424] Inspector needs TensorRT version 8.2 and after.
--- Running IR pass [conv_bn_fuse_pass]
--- Running IR pass [transpose_flatten_concat_fuse_pass]
--- Running analysis [ir_params_sync_among_devices_pass]
I0803 16:03:43.954339 23200 ir_params_sync_among_devices_pass.cc:100] Sync params from CPU to GPU
--- Running analysis [adjust_cudnn_workspace_size_pass]
--- Running analysis [inference_op_replace_pass]
--- Running analysis [memory_optimize_pass]
--- Running analysis [ir_graph_to_program_pass]
I0803 16:03:44.204703 23200 analysis_predictor.cc:1035] ======= optimize end =======
I0803 16:03:44.204703 23200 naive_executor.cc:102] --- skip [feed], feed -> x
I0803 16:03:44.205667 23200 naive_executor.cc:102] --- skip [sigmoid_0.tmp_0], fetch -> fetch
In PP-OCRv3, default rec_img_h is 48,if you use other model, you should set the param rec_img_h=32
I0803 16:03:44.311573 23200 analysis_predictor.cc:881] TensorRT subgraph engine is enabled
--- Running analysis [ir_graph_build_pass]
--- Running analysis [ir_graph_clean_pass]
--- Running analysis [ir_analysis_pass]
--- Running IR pass [adaptive_pool2d_convert_global_pass]
I0803 16:03:44.496981 23200 fuse_pass_base.cc:57] --- detected 2 subgraphs
--- Running IR pass [shuffle_channel_detect_pass]
--- Running IR pass [quant_conv2d_dequant_fuse_pass]
--- Running IR pass [delete_quant_dequant_op_pass]
--- Running IR pass [delete_quant_dequant_filter_op_pass]
--- Running IR pass [delete_weight_dequant_linear_op_pass]
--- Running IR pass [delete_quant_dequant_linear_op_pass]
--- Running IR pass [add_support_int8_pass]
I0803 16:03:44.572777 23200 fuse_pass_base.cc:57] --- detected 184 subgraphs
--- Running IR pass [simplify_with_basic_ops_pass]
--- Running IR pass [embedding_eltwise_layernorm_fuse_pass]
--- Running IR pass [preln_embedding_eltwise_layernorm_fuse_pass]
--- Running IR pass [multihead_matmul_fuse_pass_v2]
--- Running IR pass [multihead_matmul_fuse_pass_v3]
--- Running IR pass [skip_layernorm_fuse_pass]
I0803 16:03:44.640208 23200 fuse_pass_base.cc:57] --- detected 1 subgraphs
--- Running IR pass [preln_skip_layernorm_fuse_pass]
--- Running IR pass [conv_bn_fuse_pass]
I0803 16:03:44.700013 23200 fuse_pass_base.cc:57] --- detected 19 subgraphs
--- Running IR pass [unsqueeze2_eltwise_fuse_pass]
--- Running IR pass [trt_squeeze2_matmul_fuse_pass]
--- Running IR pass [trt_reshape2_matmul_fuse_pass]
--- Running IR pass [trt_flatten2_matmul_fuse_pass]
--- Running IR pass [trt_map_matmul_v2_to_mul_pass]
I0803 16:03:44.708014 23200 fuse_pass_base.cc:57] --- detected 9 subgraphs
--- Running IR pass [trt_map_matmul_v2_to_matmul_pass]
I0803 16:03:44.710013 23200 fuse_pass_base.cc:57] --- detected 4 subgraphs
--- Running IR pass [trt_map_matmul_to_mul_pass]
--- Running IR pass [fc_fuse_pass]
I0803 16:03:44.718986 23200 fuse_pass_base.cc:57] --- detected 9 subgraphs
--- Running IR pass [conv_elementwise_add_fuse_pass]
I0803 16:03:44.735950 23200 fuse_pass_base.cc:57] --- detected 23 subgraphs
--- Running IR pass [tensorrt_subgraph_pass]
I0803 16:03:44.746917 23200 tensorrt_subgraph_pass.cc:141] --- detect a sub-graph with 80 nodes
I0803 16:03:44.760020 23200 tensorrt_subgraph_pass.cc:403] Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
I0803 16:03:44.773983 23200 engine.cc:203] Run Paddle-TRT Dynamic Shape mode.
I0803 16:04:04.398634 23200 engine.cc:424] Inspector needs TensorRT version 8.2 and after.
I0803 16:04:04.404646 23200 tensorrt_subgraph_pass.cc:141] --- detect a sub-graph with 57 nodes
I0803 16:04:04.410634 23200 tensorrt_subgraph_pass.cc:403] Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
I0803 16:04:04.420575 23200 engine.cc:203] Run Paddle-TRT Dynamic Shape mode.
I0803 16:04:06.677593 23200 engine.cc:424] Inspector needs TensorRT version 8.2 and after.
--- Running IR pass [conv_bn_fuse_pass]
--- Running IR pass [transpose_flatten_concat_fuse_pass]
--- Running analysis [ir_params_sync_among_devices_pass]
I0803 16:04:06.695578 23200 ir_params_sync_among_devices_pass.cc:100] Sync params from CPU to GPU
--- Running analysis [adjust_cudnn_workspace_size_pass]
--- Running analysis [inference_op_replace_pass]
--- Running analysis [memory_optimize_pass]
I0803 16:04:06.703523 23200 memory_optimize_pass.cc:216] Cluster name : linear_43.tmp_1 size: 26500
I0803 16:04:06.703523 23200 memory_optimize_pass.cc:216] Cluster name : transpose_17.tmp_0 size: 256
I0803 16:04:06.703523 23200 memory_optimize_pass.cc:216] Cluster name : transpose_17.tmp_1 size: 0
--- Running analysis [ir_graph_to_program_pass]
I0803 16:04:06.770915 23200 analysis_predictor.cc:1035] ======= optimize end =======
I0803 16:04:06.771910 23200 naive_executor.cc:102] --- skip [feed], feed -> x
I0803 16:04:06.771910 23200 naive_executor.cc:102] --- skip [softmax_5.tmp_0], fetch -> fetch
predict img: D:\projects\cpp\PaddleOCR\doc\imgs\11.jpg
W0803 16:04:07.114495 23200 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 6.1, Driver API Version: 11.5, Runtime API Version: 11.0
W0803 16:04:07.114495 23200 gpu_resources.cc:91] device: 0, cuDNN Version: 8.0.
C++ Traceback (most recent call last):
Error Message Summary:
PreconditionNotMetError: The Tensor's element number must be equal or greater than zero. The Tensor's shape is [-1] now [Hint: Expected numel() >= 0, but received numel():-1 < 0:0.] (at ..\paddle\phi\core\dense_tensor_impl.cc:108)
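
To make the error message easier to read: it says the element count of a tensor came out negative because its shape was still [-1]. The sketch below is not Paddle's implementation; `Numel` and `HasUnresolvedDim` are hypothetical helpers written only to illustrate how a dynamic-dimension placeholder of -1 produces exactly the `numel():-1 < 0` condition quoted above. It assumes the element count is the plain product of the dimensions.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper (not a Paddle API): element count as the product
// of all dimensions. A leftover -1 placeholder makes the product negative.
std::int64_t Numel(const std::vector<std::int64_t>& shape) {
  std::int64_t n = 1;
  for (std::int64_t d : shape) {
    n *= d;
  }
  return n;
}

// Hypothetical helper: true if any dimension is still a negative
// placeholder, i.e. the shape was never resolved to a concrete size.
bool HasUnresolvedDim(const std::vector<std::int64_t>& shape) {
  for (std::int64_t d : shape) {
    if (d < 0) {
      return true;
    }
  }
  return false;
}
```

With these helpers, `Numel({-1})` evaluates to -1 and `HasUnresolvedDim({-1})` is true, whereas a fully specified shape such as `{1, 3, 48, 320}` gives 46080 elements. One plausible reading of the report, stated as an assumption rather than a diagnosis, is that some tensor reached execution in Dynamic Shape mode without its -1 dimension ever being replaced by a concrete value.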