PoseEstimationForMobile
Problem converting the trained model to a MACE model
I trained a model with the PoseEstimation training framework and converted it to a frozen pb model with the following command:
```
# Convert to frozen pb.
cd training
python3 src/gen_frozen_pb.py \
  --checkpoint=<your_training_model_path>/model-xxx \
  --output_graph=<your_output_model_path>/model-xxx.pb \
  --size=192 \
  --model=mv2_cpm_2
```
However, after converting the frozen pb model to a MACE model, I validated the MACE model with the following command:
```
python tools/converter.py run --config=mace_new/builds/downloads/test.yml --validate
```
Validation printed the following warnings:
```
Generate input file: builds/testSrcFrozen/_tmp/frozen_src/99da42816155dfdcf80ef3ba3a67028b/general/arm64-v8a/model_input_image
Generate input file done.
- Run 'frozen_src' with round=1, restart_round=1, tuning=False, out_of_range_check=False, omp_num_threads=(-1,), cpu_affinity_policy=(1,), gpu_perf_hint=(3,), gpu_priority_hint=(3,)
Push builds/testSrcFrozen/_tmp/frozen_src/99da42816155dfdcf80ef3ba3a67028b/general/arm64-v8a/model_input_image to /data/local/tmp/mace_run
Push third_party/nnlib/libhexagon_controller.so to /data/local/tmp/mace_run
Push builds/testSrcFrozen/_tmp/arm64-v8a/mace_run_static to /data/local/tmp/mace_run
Push /tmp/cmd_file-frozen_src-1533296953.13 to /data/local/tmp/mace_run/cmd_file-frozen_src-1533296953.13
WARNING: linker: "/data/local/tmp/mace_run/mace_run_static" unused DT entry: type 0xf arg 0x676
I mace_run.cc:428 model name: frozen_src
I mace_run.cc:429 mace version: v0.8.1-117-g1855fe1-20180803
I mace_run.cc:430 input node: image
I mace_run.cc:431 input shape: 1,224,224,3
I mace_run.cc:432 output node: Convolutional_Pose_Machine/stage_1_out
I mace_run.cc:433 output shape: 1,112,112,10
I mace_run.cc:434 input_file: /data/local/tmp/mace_run/model_input
I mace_run.cc:435 output_file: /data/local/tmp/mace_run/model_out
I mace_run.cc:436 model_data_file: /data/local/tmp/mace_run/frozen_src.data
I mace_run.cc:437 model_file:
I mace_run.cc:438 device: CPU
I mace_run.cc:439 round: 1
I mace_run.cc:440 restart_round: 1
I mace_run.cc:441 gpu_perf_hint: 3
I mace_run.cc:442 gpu_priority_hint: 3
I mace_run.cc:443 omp_num_threads: -1
I mace_run.cc:444 cpu_affinity_policy: 1
I mace_run.cc:467 restart round 0
I mace.cc:165 Creating MaceEngine, MACE version: v0.8.1-117-g1855fe1-20180803
I mace.cc:173 Initializing MaceEngine
W arg_helper.cc:26 Duplicated argument activation found in operator MobilenetV2/MobilenetV2_part_0/inverted_bottleneck_MobilenetV2_part_0_1/MobilenetV2_part_0_1_up_pointwise/Relu
W arg_helper.cc:26 Duplicated argument activation found in operator MobilenetV2/MobilenetV2_part_0/inverted_bottleneck_MobilenetV2_part_0_2/MobilenetV2_part_0_2_up_pointwise/Relu
W arg_helper.cc:26 Duplicated argument activation found in operator MobilenetV2/MobilenetV2_part_1/inverted_bottleneck_MobilenetV2_part_1_1/MobilenetV2_part_1_1_up_pointwise/Relu
W arg_helper.cc:26 Duplicated argument activation found in operator MobilenetV2/MobilenetV2_part_1/inverted_bottleneck_MobilenetV2_part_1_2/MobilenetV2_part_1_2_up_pointwise/Relu
W arg_helper.cc:26 Duplicated argument activation found in operator MobilenetV2/MobilenetV2_part_1/inverted_bottleneck_MobilenetV2_part_1_3/MobilenetV2_part_1_3_up_pointwise/Relu
```
In GPU mode, it also fails with the following error:
```
I mace_run.cc:313 Warm up run
I concat.cc:266 test/bai start
I concat.cc:267 inputs_count is : 5
I tensor.h:326 Tensor MobilenetV2/mv2_0_max_pool size: [1, 28, 28, 12, ], content:
I tensor.h:326 Tensor MobilenetV2/mv2_1_max_pool size: [1, 28, 28, 18, ], content:
I tensor.h:326 Tensor MobilenetV2/MobilenetV2_part_2/inverted_bottleneck_MobilenetV2_part_2_5/Add size: [1, 28, 28, 24, ], content:
I tensor.h:326 Tensor MobilenetV2/mv2_3_upsample size: [1, 28, 28, 48, ], content:
I tensor.h:326 Tensor MobilenetV2/mv2_4_upsample size: [1, 28, 28, 72, ], content:
I concat.cc:271 test/bai end
F concat.cc:275 Check failed: inputs_count == 2 || divisible_four Dimensions of inputs should be divisible by 4 when inputs_count > 2.
Aborted
ERROR: [Mace Run] Mace run failed.
```
Full MACE validation logs:
src_frozen2.txt
GPU错误输出.txt (GPU error output)
I didn't run into this problem when I did the conversion. Did you change the network structure?
@edvardHua I changed the network from 6 stages to 2 stages and retrained it. That's all.
By printing the error log, I found that the channel count of "MobilenetV2/mv2_1_max_pool" is 18, which is not divisible by 4. That is the source of the MACE GPU error. The detailed error is:

```
I concat.cc:266 test/bai start
I concat.cc:267 inputs_count is : 5
I tensor.h:328 Tensor MobilenetV2/mv2_0_max_pool size: [1, 24, 24, 12, ]
I concat.cc:270 dim(axis_) is :12
I tensor.h:328 Tensor MobilenetV2/mv2_1_max_pool size: [1, 24, 24, 18, ]
I concat.cc:270 dim(axis_) is :18
I tensor.h:328 Tensor MobilenetV2/MobilenetV2_part_2/inverted_bottleneck_MobilenetV2_part_2_5/Add size: [1, 24, 24, 24, ]
I concat.cc:270 dim(axis_) is :24
I tensor.h:328 Tensor MobilenetV2/mv2_3_upsample size: [1, 24, 24, 48, ]
I concat.cc:270 dim(axis_) is :48
I tensor.h:328 Tensor MobilenetV2/mv2_4_upsample size: [1, 24, 24, 72, ]
I concat.cc:270 dim(axis_) is :72
I concat.cc:272 test/bai end
F concat.cc:276 Check failed: inputs_count == 2 || divisible_four Dimensions of inputs should be divisible by 4 when inputs_count > 2.
Aborted
ERROR: [Mace Run] Mace run failed.
```
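The failing check can be reproduced offline from the logged shapes. The sketch below is a hypothetical restatement of the condition printed in the log message (`inputs_count == 2 || divisible_four`), not MACE's actual `concat.cc` code. It flags every input whose channel count (NHWC axis 3) is not a multiple of 4 when a Concat has more than two inputs.

```python
# Hypothetical restatement of the rule in the MACE log message, not MACE source:
#   "inputs_count == 2 || divisible_four"
# A GPU Concat with more than two inputs needs every input's channel
# dimension (NHWC axis 3) to be a multiple of 4.
from typing import Dict, List


def concat_violations(channels: Dict[str, int]) -> List[str]:
    """Return the names of Concat inputs that break the divisible-by-4 rule."""
    if len(channels) == 2:
        return []  # a two-input Concat passes regardless of channel counts
    return [name for name, c in channels.items() if c % 4 != 0]


# Channel counts copied from the GPU error log above (inputs_count is : 5).
concat_inputs = {
    "MobilenetV2/mv2_0_max_pool": 12,
    "MobilenetV2/mv2_1_max_pool": 18,
    "MobilenetV2/MobilenetV2_part_2/inverted_bottleneck_MobilenetV2_part_2_5/Add": 24,
    "MobilenetV2/mv2_3_upsample": 48,
    "MobilenetV2/mv2_4_upsample": 72,
}

if __name__ == "__main__":
    print(concat_violations(concat_inputs))
```

Run on the five logged inputs, it reports only `MobilenetV2/mv2_1_max_pool` (18 channels), which matches the diagnosis.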
I suggest changing the following in `training/src/network_mv2_cpm.py`:
```python
mv2_branch_1 = slim.stack(mv2_branch_0, inverted_bottleneck, [
    (up_channel_ratio(6), out_channel_ratio(24), 1, 3),
    (up_channel_ratio(6), out_channel_ratio(24), 0, 3),
    (up_channel_ratio(6), out_channel_ratio(24), 0, 3),
    (up_channel_ratio(6), out_channel_ratio(24), 0, 3),
    (up_channel_ratio(6), out_channel_ratio(24), 0, 3),
], scope="MobilenetV2_part_1")
```
to:
```python
mv2_branch_1 = slim.stack(mv2_branch_0, inverted_bottleneck, [
    (up_channel_ratio(6), 20, 1, 3),
    (up_channel_ratio(6), 20, 0, 3),
    (up_channel_ratio(6), 20, 0, 3),
    (up_channel_ratio(6), 20, 0, 3),
    (up_channel_ratio(6), 20, 0, 3),
], scope="MobilenetV2_part_1")
```
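The logged channel counts (12, 18, 24, 48, 72) are consistent with a 0.75 width multiplier applied to 16, 24, 32, 64 and 96. That would explain how `out_channel_ratio(24)` ends up as 18. This multiplier is an inference from the log, not something read from `network_mv2_cpm.py`. As an alternative to hard-coding 20, the sketch below shows a hypothetical helper, `out_channel_ratio_div4`, that rounds each scaled width up to the next multiple of 4:

```python
# Hypothetical helper, not part of network_mv2_cpm.py.
# ASSUMPTION: out_channel_ratio scales channels by 0.75; this is inferred
# from the logged channel counts, not read from the repository source.
WIDTH_MULTIPLIER = 0.75


def out_channel_ratio_div4(d: int, multiplier: float = WIDTH_MULTIPLIER) -> int:
    """Scale a channel count, then round it up to the next multiple of 4."""
    scaled = int(d * multiplier)
    return -(-scaled // 4) * 4  # ceiling division by 4, then back to a width


if __name__ == "__main__":
    # Base widths that would produce the channel counts seen in the log.
    for base in (16, 24, 32, 64, 96):
        print(base, "->", int(base * WIDTH_MULTIPLIER), "->", out_channel_ratio_div4(base))
```

For these widths, only 24 changes: 18 becomes 20, the same value hard-coded in the suggested fix. The others are already multiples of 4. Changing a layer width changes the weight shapes, so existing checkpoints will no longer load. The model has to be retrained before running `gen_frozen_pb.py` again.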
Oh, I see. I'll fix it later.
@edvardHua Has this been solved? I also get a crash on the GPU with the same error as yours. Do I have to modify the model?
@qiaowei1214 The problem is solved. Please see the suggested fix: modify `training/src/network_mv2_cpm.py`.
@lvchigo I'm using the OpenPose model, not this project's CPM model.
Won't the error still occur if only that part (`mv2_branch_1`) is changed? We are training on hands, with a 21-point output layer.