PaddleOCR
Prediction time is almost the same with use_gpu=True and use_gpu=False, about 3 s per image in both cases, which is very slow. How can I fix this?
Please provide the following information to quickly locate the problem
- System Environment: Windows 10 / Tesla P4 / NVIDIA-SMI 516.94 / Driver Version: 516.94 / CUDA Version: 11.7
- Version: Paddle: PaddleOCR: release2.5 Related components:
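- Problem description: I followed the "Compiling on Windows" tutorial and built the deployment myself with VS2022 + CMake. The inference library I used is avx_mkl_cuda11.6_cudnn8.4_avx_mkl-trt8.4.1.5. After running the command, every image takes about 3 seconds, which is roughly the same speed I get with --use_gpu=False. I also tried increasing the number of images in the folder to 4, and each image still takes about 3 s to predict. I would also like to know how to display the exact prediction time for each image. Is there a parameter for this? I looked through the documentation and could not find one.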
- Command Code: .\ppocr.exe system --det_model_dir=D:\ppocr\ch_PP-OCRv3_det_infer --rec_model_dir=D:\ppocr\ch_PP-OCRv3_rec_infer use_gpu=True --use_tensorrt=True --image_dir=img/
- Complete Error Message:
D:\ppocr>.\ppocr.exe system --det_model_dir=D:\ppocr\ch_PP-OCRv3_det_infer --rec_model_dir=D:\ppocr\ch_PP-OCRv3_rec_infer use_gpu=True --use_tensorrt=True --image_dir=img/
total images num: 1
--- Running analysis [ir_graph_build_pass]
--- Running analysis [ir_graph_clean_pass]
--- Running analysis [ir_analysis_pass]
--- Running IR pass [simplify_with_basic_ops_pass]
--- Running IR pass [layer_norm_fuse_pass]
--- Fused 0 subgraphs into layer_norm op.
--- Running IR pass [attention_lstm_fuse_pass]
--- Running IR pass [seqconv_eltadd_relu_fuse_pass]
--- Running IR pass [seqpool_cvm_concat_fuse_pass]
--- Running IR pass [mul_lstm_fuse_pass]
--- Running IR pass [fc_gru_fuse_pass]
--- fused 0 pairs of fc gru patterns
--- Running IR pass [mul_gru_fuse_pass]
--- Running IR pass [seq_concat_fc_fuse_pass]
--- Running IR pass [gpu_cpu_squeeze2_matmul_fuse_pass]
--- Running IR pass [gpu_cpu_reshape2_matmul_fuse_pass]
--- Running IR pass [gpu_cpu_flatten2_matmul_fuse_pass]
--- Running IR pass [matmul_v2_scale_fuse_pass]
--- Running IR pass [gpu_cpu_map_matmul_v2_to_mul_pass]
--- Running IR pass [gpu_cpu_map_matmul_v2_to_matmul_pass]
--- Running IR pass [matmul_scale_fuse_pass]
--- Running IR pass [gpu_cpu_map_matmul_to_mul_pass]
--- Running IR pass [fc_fuse_pass]
--- Running IR pass [repeated_fc_relu_fuse_pass]
--- Running IR pass [squared_mat_sub_fuse_pass]
--- Running IR pass [conv_bn_fuse_pass]
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0811 15:30:09.269706 10932 fuse_pass_base.cc:57] --- detected 33 subgraphs
--- Running IR pass [conv_eltwiseadd_bn_fuse_pass]
--- Running IR pass [conv_transpose_bn_fuse_pass]
--- Running IR pass [conv_transpose_eltwiseadd_bn_fuse_pass]
I0811 15:30:09.316879 10932 fuse_pass_base.cc:57] --- detected 1 subgraphs
--- Running IR pass [is_test_pass]
--- Running IR pass [runtime_context_cache_pass]
--- Running analysis [ir_params_sync_among_devices_pass]
--- Running analysis [adjust_cudnn_workspace_size_pass]
--- Running analysis [inference_op_replace_pass]
--- Running analysis [memory_optimize_pass]
I0811 15:30:09.332502 10932 memory_optimize_pass.cc:216] Cluster name : hardswish_18.tmp_0 size: 1920
I0811 15:30:09.332502 10932 memory_optimize_pass.cc:216] Cluster name : tmp_5 size: 384
I0811 15:30:09.332502 10932 memory_optimize_pass.cc:216] Cluster name : batch_norm_44.tmp_3 size: 1920
I0811 15:30:09.332502 10932 memory_optimize_pass.cc:216] Cluster name : batch_norm_44.tmp_0 size: 1920
I0811 15:30:09.332502 10932 memory_optimize_pass.cc:216] Cluster name : elementwise_add_3 size: 96
I0811 15:30:09.332502 10932 memory_optimize_pass.cc:216] Cluster name : depthwise_conv2d_14.tmp_0 size: 1920
I0811 15:30:09.332502 10932 memory_optimize_pass.cc:216] Cluster name : tmp_1 size: 384
I0811 15:30:09.332502 10932 memory_optimize_pass.cc:216] Cluster name : tmp_9 size: 384
I0811 15:30:09.332502 10932 memory_optimize_pass.cc:216] Cluster name : elementwise_add_7 size: 224
I0811 15:30:09.332502 10932 memory_optimize_pass.cc:216] Cluster name : elementwise_add_1 size: 64
I0811 15:30:09.332502 10932 memory_optimize_pass.cc:216] Cluster name : x size: 12
--- Running analysis [ir_graph_to_program_pass]
I0811 15:30:09.426035 10932 analysis_predictor.cc:1035] ======= optimize end =======
I0811 15:30:09.426035 10932 naive_executor.cc:102] --- skip [feed], feed -> x
I0811 15:30:09.426035 10932 naive_executor.cc:102] --- skip [sigmoid_0.tmp_0], fetch -> fetch
In PP-OCRv3, default rec_img_h is 48,if you use other model, you should set the param rec_img_h=32
--- Running analysis [ir_graph_build_pass]
--- Running analysis [ir_graph_clean_pass]
--- Running analysis [ir_analysis_pass]
--- Running IR pass [simplify_with_basic_ops_pass]
--- Running IR pass [layer_norm_fuse_pass]
--- Fused 0 subgraphs into layer_norm op.
--- Running IR pass [attention_lstm_fuse_pass]
--- Running IR pass [seqconv_eltadd_relu_fuse_pass]
--- Running IR pass [seqpool_cvm_concat_fuse_pass]
--- Running IR pass [mul_lstm_fuse_pass]
--- Running IR pass [fc_gru_fuse_pass]
--- fused 0 pairs of fc gru patterns
--- Running IR pass [mul_gru_fuse_pass]
--- Running IR pass [seq_concat_fc_fuse_pass]
--- Running IR pass [gpu_cpu_squeeze2_matmul_fuse_pass]
--- Running IR pass [gpu_cpu_reshape2_matmul_fuse_pass]
--- Running IR pass [gpu_cpu_flatten2_matmul_fuse_pass]
--- Running IR pass [matmul_v2_scale_fuse_pass]
--- Running IR pass [gpu_cpu_map_matmul_v2_to_mul_pass]
I0811 15:30:09.504212 10932 fuse_pass_base.cc:57] --- detected 9 subgraphs
--- Running IR pass [gpu_cpu_map_matmul_v2_to_matmul_pass]
I0811 15:30:09.504212 10932 fuse_pass_base.cc:57] --- detected 4 subgraphs
--- Running IR pass [matmul_scale_fuse_pass]
--- Running IR pass [gpu_cpu_map_matmul_to_mul_pass]
--- Running IR pass [fc_fuse_pass]
I0811 15:30:09.504212 10932 fuse_pass_base.cc:57] --- detected 9 subgraphs
--- Running IR pass [repeated_fc_relu_fuse_pass]
--- Running IR pass [squared_mat_sub_fuse_pass]
--- Running IR pass [conv_bn_fuse_pass]
I0811 15:30:09.566637 10932 fuse_pass_base.cc:57] --- detected 19 subgraphs
--- Running IR pass [conv_eltwiseadd_bn_fuse_pass]
--- Running IR pass [conv_transpose_bn_fuse_pass]
--- Running IR pass [conv_transpose_eltwiseadd_bn_fuse_pass]
--- Running IR pass [is_test_pass]
--- Running IR pass [runtime_context_cache_pass]
--- Running analysis [ir_params_sync_among_devices_pass]
--- Running analysis [adjust_cudnn_workspace_size_pass]
--- Running analysis [inference_op_replace_pass]
--- Running analysis [memory_optimize_pass]
I0811 15:30:09.597853 10932 memory_optimize_pass.cc:216] Cluster name : transpose_14.tmp_1 size: 0
I0811 15:30:09.597853 10932 memory_optimize_pass.cc:216] Cluster name : linear_43.tmp_1 size: 26500
I0811 15:30:09.597853 10932 memory_optimize_pass.cc:216] Cluster name : conv2d_113.tmp_1 size: 2048
I0811 15:30:09.597853 10932 memory_optimize_pass.cc:216] Cluster name : batch_norm_35.tmp_3 size: 6144
I0811 15:30:09.597853 10932 memory_optimize_pass.cc:216] Cluster name : x size: 576
I0811 15:30:09.597853 10932 memory_optimize_pass.cc:216] Cluster name : batch_norm_47.tmp_3 size: 6144
I0811 15:30:09.597853 10932 memory_optimize_pass.cc:216] Cluster name : transpose_13.tmp_0_slice_2 size: 480
I0811 15:30:09.597853 10932 memory_optimize_pass.cc:216] Cluster name : transpose_10.tmp_0_slice_2 size: 480
--- Running analysis [ir_graph_to_program_pass]
I0811 15:30:09.660368 10932 analysis_predictor.cc:1035] ======= optimize end =======
I0811 15:30:09.660368 10932 naive_executor.cc:102] --- skip [feed], feed -> x
I0811 15:30:09.660368 10932 naive_executor.cc:102] --- skip [softmax_5.tmp_0], fetch -> fetch
predict img: img\1.jpg img\1.jpg
The detection visualized image saved in ./output//1.jpg
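
On the question of how to see an exact time for each image: independent of any built-in option, one way is to wrap the per-image call in a wall-clock timer. The following is only a minimal sketch using std::chrono. The names elapsed_ms, time_each_image, and predict are hypothetical and introduced here for illustration; predict stands in for whatever call runs detection and recognition on a single image in your own build and is not a PaddleOCR function.

```cpp
#include <chrono>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Runs `work` once and returns the elapsed wall-clock time in milliseconds.
double elapsed_ms(const std::function<void()>& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Times `predict` separately for every path and returns (path, milliseconds)
// pairs in input order. `predict` is a hypothetical placeholder for the
// per-image inference call; it is an assumption, not part of PaddleOCR.
std::vector<std::pair<std::string, double>> time_each_image(
    const std::vector<std::string>& image_paths,
    const std::function<void(const std::string&)>& predict) {
    std::vector<std::pair<std::string, double>> timings;
    timings.reserve(image_paths.size());
    for (const auto& path : image_paths) {
        timings.emplace_back(path, elapsed_ms([&] { predict(path); }));
    }
    return timings;
}
```

Printing each returned pair gives one line per image. The first image may come out slower than the rest if the predictor does one-time initialization on its first run, so comparing the later entries is probably the fairer way to judge GPU versus CPU speed.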