Hardware platform is Adreno(TM) 640 with the OpenCL backend: does GPU inference also consume a lot of CPU resources?
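On the Qualcomm 8155 platform with the OpenCL backend, a loop that calls only runSession and copyToHostTensor consumes a lot of CPU. How can I pinpoint which part is responsible, and how can it be optimized? Thanks!

Here are the measured CPU load statistics:

====== CPU Load Analysis ======
[MAX LOAD] Single Core: 30.70% Multi Core: 3.83% KDMIPS: 4.02
[AVG LOAD] Single Core: 26.10% Multi Core: 3.26% KDMIPS: 3.42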

Here is the MNNV2Basic.out test output:

precision:0, memory: 0, Run 10 time:
input [Convolution] run 10 average cost 0.000000 ms, 0.000 %, FlopsRate: 3.114 %
input.104 [Convolution] run 10 average cost 0.000000 ms, 0.000 %, FlopsRate: 4.152 %
input.112 [Convolution] run 10 average cost 0.000000 ms, 0.000 %, FlopsRate: 3.690 %
input.116 [Convolution] run 10 average cost 0.000000 ms, 0.000 %, FlopsRate: 0.923 %
input.12 [Convolution] run 10 average cost 0.000000 ms, 0.000 %, FlopsRate: 0.923 %
input.128 [Convolution] run 10 average cost 0.000000 ms, 0.000 %, FlopsRate: 1.845 %
input.128_raster_0 [Raster] run 10 average cost 0.000000 ms, 0.000 %, FlopsRate: 0.014 %
input.136 [Convolution] run 10 average cost 0.000000 ms, 0.000 %, FlopsRate: 0.923 %
input.136_raster_0 [Raster] run 10 average cost 0.000000 ms, 0.000 %, FlopsRate: 0.007 %
input.144 [Convolution] run 10 average cost 0.000000 ms, 0.000 %, FlopsRate: 0.923 %
input.156 [Convolution] run 10 average cost 0.000000 ms, 0.000 %, FlopsRate: 0.461 %
input.156_raster_0 [Raster] run 10 average cost 0.000000 ms, 0.000 %, FlopsRate: 0.014 %
input.16 [Convolution] run 10 average cost 0.000000 ms, 0.000 %, FlopsRate: 4.152 %
input.172 [Convolution] run 10 average cost 0.000000 ms, 0.000 %, FlopsRate: 0.923 %
input.176 [Convolution] run 10 average cost 0.000000 ms, 0.000 %, FlopsRate: 0.231 %
input.180 [Convolution] run 10 average cost 0.000000 ms, 0.000 %, FlopsRate: 0.923 %
input.188 [Convolution] run 10 average cost 0.000000 ms, 0.000 %, FlopsRate: 0.461 %
input.192 [Convolution] run 10 average cost 0.000000 ms, 0.000 %, FlopsRate: 0.461 %
input.192_raster_0 [Raster] run 10 average cost 0.000000 ms, 0.000 %, FlopsRate: 0.027 %
input.20 [Convolution] run 10 average cost 0.000000 ms, 0.000 %, FlopsRate: 4.152 %
input.200 [Convolution] run 10 average cost 0.000000 ms, 0.000 %, FlopsRate: 1.038 %
input.208 [Convolution] run 10 average cost 0.000000 ms, 0.000 %, FlopsRate: 0.923 %
input.212 [Convolution] run 10 average cost 0.000000 ms, 0.000 %, FlopsRate: 2.076 %
input.220 [Convolution] run 10 average cost 0.000000 ms, 0.000 %, FlopsRate: 0.461 %
input.224 [Convolution] run 10 average cost 0.000000 ms, 0.000 %, FlopsRate: 0.461 %
input.228 [Convolution] run 10 average cost 0.000000 ms, 0.000 %, FlopsRate: 1.038 %
input.232 [Convolution] run 10 average cost 0.000000 ms, 0.000 %, FlopsRate: 1.038 %
input.240 [Convolution] run 10 average cost 0.000000 ms, 0.000 %, FlopsRate: 0.923 %
input.244 [Convolution] run 10 average cost 0.000000 ms, 0.000 %, FlopsRate: 2.076 %
input.252 [Convolution] run 10 average cost 0.000000 ms, 0.000 %, FlopsRate: 0.461 %
input.256 [Convolution] run 10 average cost 0.000000 ms, 0.000 %, FlopsRate: 0.461 %
input.260 [Convolution] run 10 average cost 0.000000 ms, 0.000 %, FlopsRate: 1.038 %
input.272 [Convolution] run 10 average cost 0.000000 ms, 0.000 %, FlopsRate: 0.923 %
input.272_raster_0 [Raster] run 10 average cost 0.000000 ms, 0.000 %, FlopsRate: 0.007 %
input.36 [Convolution] run 10 average cost 0.000000 ms, 0.000 %, FlopsRate: 0.461 %
input.4 [Convolution] run 10 average cost 0.000000 ms, 0.000 %, FlopsRate: 8.303 %
input.48 [Convolution] run 10 average cost 0.000000 ms, 0.000 %, FlopsRate: 4.152 %
input.56 [Convolution] run 10 average cost 0.000000 ms, 0.000 %, FlopsRate: 3.690 %
input.56_raster_0 [Raster] run 10 average cost 0.000000 ms, 0.000 %, FlopsRate: 0.055 %
input.60 [Pooling] run 10 average cost 0.000000 ms, 0.000 %, FlopsRate: 0.027 %
input.64 [Convolution] run 10 average cost 0.000000 ms, 0.000 %, FlopsRate: 0.461 %
input.76 [Convolution] run 10 average cost 0.000000 ms, 0.000 %, FlopsRate: 4.152 %
input.8 [Convolution] run 10 average cost 0.000000 ms, 0.000 %, FlopsRate: 0.923 %
input.84 [Convolution] run 10 average cost 0.000000 ms, 0.000 %, FlopsRate: 3.690 %
input.96 [Convolution] run 10 average cost 0.000000 ms, 0.000 %, FlopsRate: 0.461 %
input_raster_0 [Raster] run 10 average cost 0.000000 ms, 0.000 %, FlopsRate: 0.110 %
onnx::Concat_167 [Pooling] run 10 average cost 0.000000 ms, 0.000 %, FlopsRate: 0.031 %
onnx::Concat_171 [Pooling] run 10 average cost 0.000000 ms, 0.000 %, FlopsRate: 0.031 %
onnx::MaxPool_166 [Pooling] run 10 average cost 0.000000 ms, 0.000 %, FlopsRate: 0.031 %
onnx::MaxPool_176 [Pooling] run 10 average cost 0.000000 ms, 0.000 %, FlopsRate: 0.031 %
output1 [UnaryOp] run 10 average cost 0.000000 ms, 0.000 %, FlopsRate: 0.001 %
input.100 [Convolution] run 10 average cost 0.100000 ms, 0.205 %, FlopsRate: 4.152 %
input.112_raster_0 [Raster] run 10 average cost 0.100000 ms, 0.205 %, FlopsRate: 0.014 %
input.120 [Convolution] run 10 average cost 0.100000 ms, 0.205 %, FlopsRate: 0.923 %
input.140 [Convolution] run 10 average cost 0.100000 ms, 0.205 %, FlopsRate: 0.231 %
input.152 [Convolution] run 10 average cost 0.100000 ms, 0.205 %, FlopsRate: 0.461 %
input.164 [Convolution] run 10 average cost 0.100000 ms, 0.205 %, FlopsRate: 1.038 %
input.172_raster_0 [Raster] run 10 average cost 0.100000 ms, 0.205 %, FlopsRate: 0.014 %
input.196 [Convolution] run 10 average cost 0.100000 ms, 0.205 %, FlopsRate: 1.038 %
input.256_raster_0 [Raster] run 10 average cost 0.100000 ms, 0.205 %, FlopsRate: 0.007 %
input.264 [Convolution] run 10 average cost 0.100000 ms, 0.205 %, FlopsRate: 1.038 %
input.276 [Convolution] run 10 average cost 0.100000 ms, 0.205 %, FlopsRate: 8.303 %
input.28 [Convolution] run 10 average cost 0.100000 ms, 0.205 %, FlopsRate: 3.690 %
input.32 [Pooling] run 10 average cost 0.100000 ms, 0.205 %, FlopsRate: 0.055 %
input.40 [Convolution] run 10 average cost 0.100000 ms, 0.205 %, FlopsRate: 0.461 %
input.72 [Convolution] run 10 average cost 0.100000 ms, 0.205 %, FlopsRate: 4.152 %
input.88 [Pooling] run 10 average cost 0.100000 ms, 0.205 %, FlopsRate: 0.014 %
input.92 [Convolution] run 10 average cost 0.100000 ms, 0.205 %, FlopsRate: 0.461 %
onnx::Concat_177 [Pooling] run 10 average cost 0.100000 ms, 0.205 %, FlopsRate: 0.031 %
onnx::Concat_190 [Interp] run 10 average cost 0.100000 ms, 0.205 %, FlopsRate: 0.007 %
onnx::Concat_211 [Interp] run 10 average cost 0.100000 ms, 0.205 %, FlopsRate: 0.014 %
onnx::MaxPool_170 [Pooling] run 10 average cost 0.100000 ms, 0.205 %, FlopsRate: 0.031 %
onnx::Sigmoid_256 [Convolution] run 10 average cost 0.100000 ms, 0.205 %, FlopsRate: 0.173 %
output1__before_tr_raster_0 [Raster] run 10 average cost 0.100000 ms, 0.205 %, FlopsRate: 0.001 %
input.160 [Convolution] run 10 average cost 0.200000 ms, 0.411 %, FlopsRate: 1.038 %
input.208_raster_0 [Raster] run 10 average cost 0.200000 ms, 0.411 %, FlopsRate: 0.027 %
input.224_raster_0 [Raster] run 10 average cost 0.200000 ms, 0.411 %, FlopsRate: 0.014 %
input.240_raster_0 [Raster] run 10 average cost 0.200000 ms, 0.411 %, FlopsRate: 0.014 %
input.44 [Convolution] run 10 average cost 0.200000 ms, 0.411 %, FlopsRate: 4.152 %
input.68 [Convolution] run 10 average cost 0.200000 ms, 0.411 %, FlopsRate: 0.461 %
input.28_raster_0 [Raster] run 10 average cost 0.300000 ms, 0.616 %, FlopsRate: 0.110 %
input.84_raster_0 [Raster] run 10 average cost 0.300000 ms, 0.616 %, FlopsRate: 0.027 %
Avg= 48.700001 ms, OpSum = 4.100000 ms min= 44.000000 ms, max= 55.000000 ms
Update cache to .tempcache, from size:768060 -> size:769448
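
As a rough aid for reading the MNNV2Basic.out listing above, the per-op costs can be summed and compared with the reported average. In that listing the non-zero entries are 23 ops at 0.1 ms, 6 at 0.2 ms and 2 at 0.3 ms, which adds up to the reported OpSum of 4.1 ms, so about 44.6 ms of the 48.7 ms average is not attributed to any single operator. The log alone does not say where that remaining time goes. The sketch below is only a log-parsing helper in Python; the regular expression and the names `cost_by_op_type` and `unattributed_ms` are hypothetical and not part of MNN, and the sample is a five-line excerpt of the listing.

```python
import re

# Hypothetical pattern for one per-op line of the listing, e.g.
# "input.160 [Convolution] run 10 average cost 0.200000 ms, 0.411 %, FlopsRate: 1.038 %"
OP_LINE = re.compile(
    r"(?P<name>\S+)\s+\[(?P<type>\w+)\]\s+run\s+\d+\s+average cost\s+"
    r"(?P<ms>[\d.]+)\s+ms,\s+(?P<pct>[\d.]+)\s+%,\s+FlopsRate:\s+(?P<flops>[\d.]+)\s+%"
)

def cost_by_op_type(log_text):
    """Sum the reported average cost (ms) per operator type."""
    totals = {}
    for match in OP_LINE.finditer(log_text):
        op_type = match.group("type")
        totals[op_type] = totals.get(op_type, 0.0) + float(match.group("ms"))
    return totals

def unattributed_ms(avg_ms, log_text):
    """Average run time minus the sum of all per-op costs found in the log."""
    return avg_ms - sum(cost_by_op_type(log_text).values())

# Five lines copied from the listing; paste the full output for real use.
sample = """
input.4 [Convolution] run 10 average cost 0.000000 ms, 0.000 %, FlopsRate: 8.303 %
input.32 [Pooling] run 10 average cost 0.100000 ms, 0.205 %, FlopsRate: 0.055 %
input.160 [Convolution] run 10 average cost 0.200000 ms, 0.411 %, FlopsRate: 1.038 %
input.208_raster_0 [Raster] run 10 average cost 0.200000 ms, 0.411 %, FlopsRate: 0.027 %
input.28_raster_0 [Raster] run 10 average cost 0.300000 ms, 0.616 %, FlopsRate: 0.110 %
"""

if __name__ == "__main__":
    for op_type, total in sorted(cost_by_op_type(sample).items()):
        print(f"{op_type}: {total:.1f} ms")
    print(f"unattributed: {unattributed_ms(48.7, sample):.1f} ms")
```

Because the pattern matches entries anywhere in the text, it also works on the output when it has been pasted as one long line.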