
Problem converting picodet_m_416 into a model for OpenCL

Open · linghusmile opened this issue 11 months ago · 2 comments

Describe the Bug

I am converting the picodet_m_416 model into a model.nb that can be deployed on a Qualcomm platform. Running on the CPU alone was too slow, so I wanted to accelerate inference with OpenCL. I built a Paddle-Lite inference library with OpenCL support using:

```shell
./lite/tools/build_android.sh --arch=armv8 --toolchain=gcc --with_cv=ON --with_extra=ON --with_opencl=ON
```

but inference was still slow. The README says the model also has to be converted for OpenCL, so I ran:

```shell
./opt --model_file=/home/xxxxx/PaddleDetection-release-2.7/output_inference/picodet_m_416_coco_lcnet/model.pdmodel --param_file=/home/xxxxx/PaddleDetection-release-2.7/output_inference/picodet_m_416_coco_lcnet/model.pdiparams --optimize_out=/home/xxxxx/PaddleDetection-release-2.7/output_inference/picodet_m_416_coco_lcnet/model --valid_targets=opencl
```

and got the following output:

```
Loading topology data from /home/xxxxx/PaddleDetection-release-2.7/output_inference/picodet_m_416_coco_lcnet/model.pdmodel
Loading params data from /home/xxxxx/PaddleDetection-release-2.7/output_inference/picodet_m_416_coco_lcnet/model.pdiparams
Model is successfully loaded!
[W 3/ 7 19:55:39. 44 ...zer/mir/fusion/conv_elementwise_fuser.cc:92 InsertNewNode] elementwise_add_bias_dims not equal to 1, fusion failed
... (the same warning is repeated many times, timestamps 19:55:39. 44 through 19:55:39.145) ...
[W 3/ 7 19:55:39.145 ...zer/mir/fusion/conv_elementwise_fuser.cc:92 InsertNewNode] elementwise_add_bias_dims not equal to 1, fusion failed
Model is optimized and saved into /home/xxxxx/PaddleDetection-release-2.7/output_inference/picodet_m_416_coco_lcnet/model.nb successfully
```

Judging from this output, some layers in the network were not converted correctly. What is the cause, and how can I get PP-PicoDet to run fast on a Qualcomm platform using the GPU?

Environment

- OS: ubuntu20.04
- python 3.8.18
- Paddle-Lite: v2.13-rc
- Paddle-Detection: 2.7
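The opt output above is dominated by a single warning from conv_elementwise_fuser.cc:92 repeated many times. When posting logs like this, it can help to collapse them into counts. The following is a minimal Python sketch of such a summarizer; `summarize_log` is a hypothetical helper, not part of Paddle-Lite, and it assumes each entry follows the `[LEVEL date time file.cc:line Function] message` layout seen in this thread.

```python
import re
from collections import Counter

# Hypothetical helper, not a Paddle-Lite tool. Assumes entries look like
# "[W 3/ 7 19:55:39. 44 ...conv_elementwise_fuser.cc:92 InsertNewNode] message"
# and captures (level, source location, function, message). The timestamp is
# skipped, so entries differing only in time are counted together.
# Limitation: a message containing "[" is truncated at that character.
ENTRY_RE = re.compile(r"\[([IWEF]) [^\]]*?(\S+\.(?:cc|h):\d+) ([~\w]+)\] ([^\[]*)")


def summarize_log(log: str) -> Counter:
    """Count identical log entries, ignoring their timestamps."""
    counts: Counter = Counter()
    for level, location, func, message in ENTRY_RE.findall(log):
        counts[(level, location, func, message.strip())] += 1
    return counts


if __name__ == "__main__":
    sample = (
        "Model is successfully loaded! "
        "[W 3/ 7 19:55:39. 44 ...zer/mir/fusion/conv_elementwise_fuser.cc:92 "
        "InsertNewNode] elementwise_add_bias_dims not equal to 1, fusion failed "
        "[W 3/ 7 19:55:39.145 ...zer/mir/fusion/conv_elementwise_fuser.cc:92 "
        "InsertNewNode] elementwise_add_bias_dims not equal to 1, fusion failed "
    )
    for (level, location, func, message), n in summarize_log(sample).items():
        print(f"{n}x [{level}] {location} {func}: {message}")
```

Only the log format is assumed here; the helper says nothing about why the fusion pass was skipped.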

linghusmile · Mar 07 '24 12:03

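I found that the problem only occurs with the picodet_m_416 model I exported myself with PaddleDetection. The model downloaded directly from https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.7/configs/picodet converts fine, but after conversion inference still takes 300+ ms?! This is FP32, because building the FP16 variant of the OpenCL-enabled inference library fails with an error. The platform is a Qualcomm 845.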

```shell
./main det_runtime_config.json
```

```
Usage: ./main [config_path] image_dir
config created before object detector
[I 3/ 7 23:38:27. 92 ...oks/Paddle-Lite/lite/core/device_info.cc:1308 Setup] ARM multiprocessors name: HARDWARE : QUALCOMM TECHNOLOGIES, INC SDA845 SDM845_SDM845
[I 3/ 7 23:38:27. 92 ...oks/Paddle-Lite/lite/core/device_info.cc:1309 Setup] ARM multiprocessors number: 8
[I 3/ 7 23:38:27. 92 ...oks/Paddle-Lite/lite/core/device_info.cc:1311 Setup] ARM multiprocessors ID: 0, max freq: 1766, min freq: 1766, cluster ID: 1, CPU ARCH: A55
[I 3/ 7 23:38:27. 92 ...oks/Paddle-Lite/lite/core/device_info.cc:1311 Setup] ARM multiprocessors ID: 1, max freq: 1766, min freq: 1766, cluster ID: 1, CPU ARCH: A55
[I 3/ 7 23:38:27. 92 ...oks/Paddle-Lite/lite/core/device_info.cc:1311 Setup] ARM multiprocessors ID: 2, max freq: 1766, min freq: 1766, cluster ID: 1, CPU ARCH: A55
[I 3/ 7 23:38:27. 92 ...oks/Paddle-Lite/lite/core/device_info.cc:1311 Setup] ARM multiprocessors ID: 3, max freq: 1766, min freq: 1766, cluster ID: 1, CPU ARCH: A55
[I 3/ 7 23:38:27. 93 ...oks/Paddle-Lite/lite/core/device_info.cc:1311 Setup] ARM multiprocessors ID: 4, max freq: 2803, min freq: 2803, cluster ID: 0, CPU ARCH: A75
[I 3/ 7 23:38:27. 93 ...oks/Paddle-Lite/lite/core/device_info.cc:1311 Setup] ARM multiprocessors ID: 5, max freq: 2803, min freq: 2803, cluster ID: 0, CPU ARCH: A75
[I 3/ 7 23:38:27. 93 ...oks/Paddle-Lite/lite/core/device_info.cc:1311 Setup] ARM multiprocessors ID: 6, max freq: 2803, min freq: 2803, cluster ID: 0, CPU ARCH: A75
[I 3/ 7 23:38:27. 93 ...oks/Paddle-Lite/lite/core/device_info.cc:1311 Setup] ARM multiprocessors ID: 7, max freq: 2803, min freq: 2803, cluster ID: 0, CPU ARCH: A75
[I 3/ 7 23:38:27. 93 ...oks/Paddle-Lite/lite/core/device_info.cc:1317 Setup] L1 DataCache size is:
[I 3/ 7 23:38:27. 93 ...oks/Paddle-Lite/lite/core/device_info.cc:1319 Setup] 32 KB
[I 3/ 7 23:38:27. 93 ...oks/Paddle-Lite/lite/core/device_info.cc:1319 Setup] 32 KB
[I 3/ 7 23:38:27. 93 ...oks/Paddle-Lite/lite/core/device_info.cc:1319 Setup] 32 KB
[I 3/ 7 23:38:27. 93 ...oks/Paddle-Lite/lite/core/device_info.cc:1319 Setup] 32 KB
[I 3/ 7 23:38:27. 93 ...oks/Paddle-Lite/lite/core/device_info.cc:1319 Setup] 64 KB
[I 3/ 7 23:38:27. 93 ...oks/Paddle-Lite/lite/core/device_info.cc:1319 Setup] 64 KB
[I 3/ 7 23:38:27. 93 ...oks/Paddle-Lite/lite/core/device_info.cc:1319 Setup] 64 KB
[I 3/ 7 23:38:27. 93 ...oks/Paddle-Lite/lite/core/device_info.cc:1319 Setup] 64 KB
[I 3/ 7 23:38:27. 93 ...oks/Paddle-Lite/lite/core/device_info.cc:1321 Setup] L2 Cache size is:
[I 3/ 7 23:38:27. 93 ...oks/Paddle-Lite/lite/core/device_info.cc:1323 Setup] 128 KB
[I 3/ 7 23:38:27. 93 ...oks/Paddle-Lite/lite/core/device_info.cc:1323 Setup] 128 KB
[I 3/ 7 23:38:27. 93 ...oks/Paddle-Lite/lite/core/device_info.cc:1323 Setup] 128 KB
[I 3/ 7 23:38:27. 93 ...oks/Paddle-Lite/lite/core/device_info.cc:1323 Setup] 128 KB
[I 3/ 7 23:38:27. 93 ...oks/Paddle-Lite/lite/core/device_info.cc:1323 Setup] 256 KB
[I 3/ 7 23:38:27. 93 ...oks/Paddle-Lite/lite/core/device_info.cc:1323 Setup] 256 KB
[I 3/ 7 23:38:27. 93 ...oks/Paddle-Lite/lite/core/device_info.cc:1323 Setup] 256 KB
[I 3/ 7 23:38:27. 93 ...oks/Paddle-Lite/lite/core/device_info.cc:1323 Setup] 256 KB
[I 3/ 7 23:38:27. 93 ...oks/Paddle-Lite/lite/core/device_info.cc:1325 Setup] L3 Cache size is:
[I 3/ 7 23:38:27. 93 ...oks/Paddle-Lite/lite/core/device_info.cc:1327 Setup] 2048 KB
[I 3/ 7 23:38:27. 93 ...oks/Paddle-Lite/lite/core/device_info.cc:1327 Setup] 2048 KB
[I 3/ 7 23:38:27. 93 ...oks/Paddle-Lite/lite/core/device_info.cc:1327 Setup] 2048 KB
[I 3/ 7 23:38:27. 93 ...oks/Paddle-Lite/lite/core/device_info.cc:1327 Setup] 2048 KB
[I 3/ 7 23:38:27. 93 ...oks/Paddle-Lite/lite/core/device_info.cc:1327 Setup] 2048 KB
[I 3/ 7 23:38:27. 93 ...oks/Paddle-Lite/lite/core/device_info.cc:1327 Setup] 2048 KB
[I 3/ 7 23:38:27. 93 ...oks/Paddle-Lite/lite/core/device_info.cc:1327 Setup] 2048 KB
[I 3/ 7 23:38:27. 93 ...oks/Paddle-Lite/lite/core/device_info.cc:1327 Setup] 2048 KB
[I 3/ 7 23:38:27. 93 ...oks/Paddle-Lite/lite/core/device_info.cc:1329 Setup] Total memory: 7875584KB
[I 3/ 7 23:38:27. 93 ...oks/Paddle-Lite/lite/core/device_info.cc:1330 Setup] SVE2 support: 0
[I 3/ 7 23:38:27. 93 ...oks/Paddle-Lite/lite/core/device_info.cc:1331 Setup] SVE2 f32mm support: 0
[I 3/ 7 23:38:27. 93 ...oks/Paddle-Lite/lite/core/device_info.cc:1332 Setup] SVE2 i8mm support: 0
[I 3/ 7 23:38:27.118 ...-Lite/lite/backends/opencl/cl_runtime.cc:69 Init] opencl_lib_found:1
[I 3/ 7 23:38:27.118 ...-Lite/lite/backends/opencl/cl_runtime.cc:77 Init] dlsym_success:1
[I 3/ 7 23:38:27.120 ...-Lite/lite/backends/opencl/cl_runtime.cc:538 InitializePlatform] Platform extension:
[I 3/ 7 23:38:27.120 ...-Lite/lite/backends/opencl/cl_runtime.cc:85 Init] is_platform_init:1
[I 3/ 7 23:38:27.120 ...-Lite/lite/backends/opencl/cl_runtime.cc:624 InitializeDevice] Using device: QUALCOMM Adreno(TM)
[I 3/ 7 23:38:27.120 ...-Lite/lite/backends/opencl/cl_runtime.cc:650 InitializeDevice] CL_DEVICE_VERSION:OpenCL 2.0 Adreno(TM) 630
[I 3/ 7 23:38:27.120 ...-Lite/lite/backends/opencl/cl_runtime.cc:657 InitializeDevice] device_type:GPU
[I 3/ 7 23:38:27.120 ...-Lite/lite/backends/opencl/cl_runtime.cc:661 InitializeDevice] The chosen device has 2 compute units.
[I 3/ 7 23:38:27.120 ...-Lite/lite/backends/opencl/cl_runtime.cc:665 InitializeDevice] CL_DEVICE_MAX_CLOCK_FREQUENCY:1
[I 3/ 7 23:38:27.120 ...-Lite/lite/backends/opencl/cl_runtime.cc:675 InitializeDevice] The local memory size of the chosen device is 32.000000 KB.
[I 3/ 7 23:38:27.120 ...-Lite/lite/backends/opencl/cl_runtime.cc:682 InitializeDevice] CL_DEVICE_GLOBAL_MEM_CACHE_SIZE(KB):128.000000 KB.
[I 3/ 7 23:38:27.120 ...-Lite/lite/backends/opencl/cl_runtime.cc:690 InitializeDevice] CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE(KB):0.062500 KB.
[I 3/ 7 23:38:27.120 ...-Lite/lite/backends/opencl/cl_runtime.cc:697 InitializeDevice] CL_DEVICE_GLOBAL_MEM_SIZE(KB):3937792.000000 KB.
[I 3/ 7 23:38:27.121 ...-Lite/lite/backends/opencl/cl_runtime.cc:705 InitializeDevice] CL_DEVICE_MAX_WORK_GROUP_SIZE:1024
[I 3/ 7 23:38:27.121 ...-Lite/lite/backends/opencl/cl_runtime.cc:709 InitializeDevice] CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS:3
[I 3/ 7 23:38:27.121 ...-Lite/lite/backends/opencl/cl_runtime.cc:714 InitializeDevice] max_work_item_sizes[0]:1024
[I 3/ 7 23:38:27.121 ...-Lite/lite/backends/opencl/cl_runtime.cc:714 InitializeDevice] max_work_item_sizes[1]:1024
[I 3/ 7 23:38:27.121 ...-Lite/lite/backends/opencl/cl_runtime.cc:714 InitializeDevice] max_work_item_sizes[2]:1024
[I 3/ 7 23:38:27.121 ...-Lite/lite/backends/opencl/cl_runtime.cc:725 InitializeDevice] CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE:64.000000
[I 3/ 7 23:38:27.121 ...-Lite/lite/backends/opencl/cl_runtime.cc:736 InitializeDevice] The chosen device supports image processing.
[I 3/ 7 23:38:27.121 ...-Lite/lite/backends/opencl/cl_runtime.cc:740 InitializeDevice] CL_DEVICE_IMAGE2D_MAX_HEIGHT:16384
[I 3/ 7 23:38:27.121 ...-Lite/lite/backends/opencl/cl_runtime.cc:744 InitializeDevice] CL_DEVICE_IMAGE2D_MAX_WIDTH:16384
[I 3/ 7 23:38:27.121 ...-Lite/lite/backends/opencl/cl_runtime.cc:758 InitializeDevice] The chosen device supports the half data type.
[I 3/ 7 23:38:27.121 ...-Lite/lite/backends/opencl/cl_runtime.cc:766 InitializeDevice] CL_DEVICE_ADDRESS_BITS:64
[I 3/ 7 23:38:27.124 ...-Lite/lite/backends/opencl/cl_runtime.cc:770 InitializeDevice] CL_DRIVER_VERSION:OpenCL 2.0 QUALCOMM build: commit #cc2119f changeid #I5790043375 Date: 04/10/19 Wed Local Branch: mybranche95b5ccd-1450-f15f-981b-5cb90179c3d8 Remote Branch: quic/gfx-adreno.lnx.1.0.r47-rel Compiler E031.36.02.00
[I 3/ 7 23:38:27.124 ...-Lite/lite/backends/opencl/cl_runtime.cc:93 Init] is_device_init:1
[I 3/ 7 23:38:27.124 ...-Lite/lite/backends/opencl/cl_runtime.cc:810 GetAdrenoContextProperties] GPUPerfMode::PERF_HIGH
[I 3/ 7 23:38:27.124 ...-Lite/lite/backends/opencl/cl_runtime.cc:829 GetAdrenoContextProperties] GPUPriorityLevel::PRIORITY_HIGH
[I 3/ 7 23:38:27.127 ...-Lite/lite/backends/opencl/cl_runtime.cc:105 Init] set is_cl_runtime_initialized_ = true
[I 3/ 7 23:38:27.132 ...e-Lite/lite/backends/opencl/cl_runtime.h:93 OpenCLAvaliableForDevice] need to check fp16 valid:0
create object detector
[I 3/ 7 23:38:29.752 ...nels/opencl/elementwise_image_compute.cc:102 PrepareForRun] with y->persistable
[I 3/ 7 23:38:29.816 ...nels/opencl/elementwise_image_compute.cc:102 PrepareForRun] with y->persistable
[I 3/ 7 23:38:29.820 ...nels/opencl/elementwise_image_compute.cc:102 PrepareForRun] with y->persistable
class=0 confidence=0.7554 rect=[1 9 182 371]
./pictures/0000001.png The number of detected box: 1
Visualized output saved as result_0000001.png
class=0 confidence=0.8296 rect=[1 1 183 372]
./pictures/0000002.png The number of detected box: 1
Visualized output saved as result_0000002.png
class=0 confidence=0.8535 rect=[1 3 182 371]
./pictures/0000003.png The number of detected box: 1
Visualized output saved as result_0000003.png
class=0 confidence=0.8354 rect=[1 1 183 371]
./pictures/0000004.png The number of detected box: 1
Visualized output saved as result_0000004.png
class=0 confidence=0.8574 rect=[1 3 182 371]
./pictures/0000005.png The number of detected box: 1
Visualized output saved as result_0000005.png
class=0 confidence=0.8579 rect=[2 3 182 371]
./pictures/0000006.png The number of detected box: 1
Visualized output saved as result_0000006.png
class=0 confidence=0.8306 rect=[1 5 182 370]
./pictures/0000007.png The number of detected box: 1
Visualized output saved as result_0000007.png
class=0 confidence=0.8193 rect=[1 2 183 370]
./pictures/0000008.png The number of detected box: 1
Visualized output saved as result_0000008.png
class=0 confidence=0.8389 rect=[1 3 182 370]
./pictures/0000009.png The number of detected box: 1
Visualized output saved as result_0000009.png
class=0 confidence=0.8774 rect=[1 3 182 370]
./pictures/0000010.png The number of detected box: 1
Visualized output saved as result_0000010.png
----------------------- Config info -----------------------
num_threads: 4
----------------------- Data info -----------------------
batch_size_det: 1
----------------------- Model info -----------------------
detection model_name: ./model_det/
----------------------- Perf info ------------------------
Total number of predicted data: 10 and total time spent(ms): 3428.99
preproce_time(ms): 12.7428, inference_time(ms): 330.121, postprocess_time(ms): 0.034401
[I 3/ 7 23:38:31. 58 ...-Lite/lite/backends/opencl/cl_runtime.cc:41 ~CLRuntime] is_cl_runtime_initialized_:1
```
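As a sanity check on the perf summary above: assuming the three stage figures are per-image averages (which is consistent with the total time divided by the 10 images), roughly 96% of the ~343 ms per image is spent in inference, so pre- and post-processing are not the bottleneck. A minimal Python sketch of that arithmetic, using only the numbers printed in the log:

```python
# Figures copied from the "Perf info" block of the log above.
total_ms = 3428.99      # "total time spent(ms)" for all images
num_images = 10         # "Total number of predicted data"
preprocess_ms = 12.7428
inference_ms = 330.121
postprocess_ms = 0.034401

# Assumption: the stage times are per-image averages; if so, their sum
# should match the total time divided by the number of images.
per_image_ms = total_ms / num_images
stage_sum_ms = preprocess_ms + inference_ms + postprocess_ms
inference_share = inference_ms / stage_sum_ms

print(f"per image (total / N): {per_image_ms:.2f} ms")
print(f"sum of stage averages: {stage_sum_ms:.2f} ms")
print(f"inference share:       {inference_share:.1%}")
```

This only splits the end-to-end time into stages; it does not show which operators dominate the inference time itself.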

linghusmile · Mar 08 '24 03:03

You can refer to the Advanced Features section of this document to analyze the performance bottleneck: https://www.paddlepaddle.org.cn/lite/develop/demo_guides/opencl.html

shentanyue · Mar 11 '24 05:03