PoseEstimationForMobile: problem converting a trained model to a MACE model

I trained a model with the PoseEstimation training framework and converted it to a frozen pb model with the following commands:

    # Convert to frozen pb.
    cd training
    python3 src/gen_frozen_pb.py \
    --checkpoint=<you_training_model_path>/model-xxx --output_graph=<you_output_model_path>/model-xxx.pb \
    --size=192 --model=mv2_cpm_2

However, after converting the frozen pb model to a MACE model, I validated the MACE model with the following command:

    python tools/converter.py run --config=mace_new/builds/downloads/test.yml --validate

During validation, the following warnings are reported:
    Generate input file: builds/testSrcFrozen/_tmp/frozen_src/99da42816155dfdcf80ef3ba3a67028b/general/arm64-v8a/model_input_image
    Generate input file done.

    Run 'frozen_src' with round=1, restart_round=1, tuning=False, out_of_range_check=False, omp_num_threads=(-1,), cpu_affinity_policy=(1,), gpu_perf_hint=(3,), gpu_priority_hint=(3,)
    Push builds/testSrcFrozen/_tmp/frozen_src/99da42816155dfdcf80ef3ba3a67028b/general/arm64-v8a/model_input_image to /data/local/tmp/mace_run
    Push third_party/nnlib/libhexagon_controller.so to /data/local/tmp/mace_run
    Push builds/testSrcFrozen/_tmp/arm64-v8a/mace_run_static to /data/local/tmp/mace_run
    Push /tmp/cmd_file-frozen_src-1533296953.13 to /data/local/tmp/mace_run/cmd_file-frozen_src-1533296953.13
    WARNING: linker: "/data/local/tmp/mace_run/mace_run_static" unused DT entry: type 0xf arg 0x676
    I mace_run.cc:428 model name: frozen_src
    I mace_run.cc:429 mace version: v0.8.1-117-g1855fe1-20180803
    I mace_run.cc:430 input node: image
    I mace_run.cc:431 input shape: 1,224,224,3
    I mace_run.cc:432 output node: Convolutional_Pose_Machine/stage_1_out
    I mace_run.cc:433 output shape: 1,112,112,10
    I mace_run.cc:434 input_file: /data/local/tmp/mace_run/model_input
    I mace_run.cc:435 output_file: /data/local/tmp/mace_run/model_out
    I mace_run.cc:436 model_data_file: /data/local/tmp/mace_run/frozen_src.data
    I mace_run.cc:437 model_file:
    I mace_run.cc:438 device: CPU
    I mace_run.cc:439 round: 1
    I mace_run.cc:440 restart_round: 1
    I mace_run.cc:441 gpu_perf_hint: 3
    I mace_run.cc:442 gpu_priority_hint: 3
    I mace_run.cc:443 omp_num_threads: -1
    I mace_run.cc:444 cpu_affinity_policy: 1
    I mace_run.cc:467 restart round 0
    I mace.cc:165 Creating MaceEngine, MACE version: v0.8.1-117-g1855fe1-20180803
    I mace.cc:173 Initializing MaceEngine
    W arg_helper.cc:26 Duplicated argument activation found in operator MobilenetV2/MobilenetV2_part_0/inverted_bottleneck_MobilenetV2_part_0_1/MobilenetV2_part_0_1_up_pointwise/Relu
    W arg_helper.cc:26 Duplicated argument activation found in operator MobilenetV2/MobilenetV2_part_0/inverted_bottleneck_MobilenetV2_part_0_2/MobilenetV2_part_0_2_up_pointwise/Relu
    W arg_helper.cc:26 Duplicated argument activation found in operator MobilenetV2/MobilenetV2_part_1/inverted_bottleneck_MobilenetV2_part_1_1/MobilenetV2_part_1_1_up_pointwise/Relu
    W arg_helper.cc:26 Duplicated argument activation found in operator MobilenetV2/MobilenetV2_part_1/inverted_bottleneck_MobilenetV2_part_1_2/MobilenetV2_part_1_2_up_pointwise/Relu
    W arg_helper.cc:26 Duplicated argument activation found in operator MobilenetV2/MobilenetV2_part_1/inverted_bottleneck_MobilenetV2_part_1_3/MobilenetV2_part_1_3_up_pointwise/Relu

In GPU mode, the following error is reported:

    I mace_run.cc:313 Warm up run
    I concat.cc:266 test/bai start
    I concat.cc:267 inputs_count is : 5
    I tensor.h:326 Tensor MobilenetV2/mv2_0_max_pool size: [1, 28, 28, 12, ], content:
    I tensor.h:326 Tensor MobilenetV2/mv2_1_max_pool size: [1, 28, 28, 18, ], content:
    I tensor.h:326 Tensor MobilenetV2/MobilenetV2_part_2/inverted_bottleneck_MobilenetV2_part_2_5/Add size: [1, 28, 28, 24, ], content:
    I tensor.h:326 Tensor MobilenetV2/mv2_3_upsample size: [1, 28, 28, 48, ], content:
    I tensor.h:326 Tensor MobilenetV2/mv2_4_upsample size: [1, 28, 28, 72, ], content:
    I concat.cc:271 test/bai end
    F concat.cc:275 Check failed: inputs_count == 2 || divisible_four Dimensions of inputs should be divisible by 4 when inputs_count > 2.
    Aborted
    ERROR: [Mace Run] Mace run failed.
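To find which concat input trips the check, the tensor lines in the GPU log above can be scanned for a channel count that is not a multiple of 4. The following is a minimal sketch, not part of MACE or this project: the helper names `concat_input_channels` and `offending_inputs` are made up for illustration, and it assumes the last dimension in each `size: [...]` entry is the channel dimension (NHWC).

```python
import re

# Matches entries such as "Tensor <name> size: [1, 28, 28, 12, ]" in the log.
TENSOR_RE = re.compile(r"Tensor (\S+) size: \[([^\]]*)\]")


def concat_input_channels(log_text):
    """Return (tensor_name, channels) pairs; channels = last dimension (NHWC assumed)."""
    pairs = []
    for name, dims in TENSOR_RE.findall(log_text):
        sizes = [int(d) for d in dims.split(",") if d.strip()]
        pairs.append((name, sizes[-1]))
    return pairs


def offending_inputs(log_text):
    """Return the inputs whose channel count is not divisible by 4."""
    return [(n, c) for n, c in concat_input_channels(log_text) if c % 4 != 0]


# Tensor lines copied from the GPU log in this post.
LOG = """
I tensor.h:326 Tensor MobilenetV2/mv2_0_max_pool size: [1, 28, 28, 12, ], content:
I tensor.h:326 Tensor MobilenetV2/mv2_1_max_pool size: [1, 28, 28, 18, ], content:
I tensor.h:326 Tensor MobilenetV2/MobilenetV2_part_2/inverted_bottleneck_MobilenetV2_part_2_5/Add size: [1, 28, 28, 24, ], content:
I tensor.h:326 Tensor MobilenetV2/mv2_3_upsample size: [1, 28, 28, 48, ], content:
I tensor.h:326 Tensor MobilenetV2/mv2_4_upsample size: [1, 28, 28, 72, ], content:
"""

print(offending_inputs(LOG))
```

With the log lines shown here, only `MobilenetV2/mv2_1_max_pool` (18 channels) is flagged.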

Full MACE model validation logs:
src_frozen2.txt, GPU错误输出.txt (GPU error output)

Aug 03 '18 12:08 lvchigo

I didn't run into this problem when converting. Did you change the network structure?

Aug 03 '18 12:08 edvardHua

@edvardHua The network was changed from 6 stages to 2 stages and then retrained. Nothing else was changed.

Aug 06 '18 02:08 lvchigo

After printing the error log, I found that "MobilenetV2/mv2_1_max_pool" has 18 channels, which is not divisible by 4 (the source of the MACE GPU error). The detailed error is:

    I concat.cc:266 test/bai start
    I concat.cc:267 inputs_count is : 5
    I tensor.h:328 Tensor MobilenetV2/mv2_0_max_pool size: [1, 24, 24, 12, ]
    I concat.cc:270 dim(axis_) is :12
    I tensor.h:328 Tensor MobilenetV2/mv2_1_max_pool size: [1, 24, 24, 18, ]
    I concat.cc:270 dim(axis_) is :18
    I tensor.h:328 Tensor MobilenetV2/MobilenetV2_part_2/inverted_bottleneck_MobilenetV2_part_2_5/Add size: [1, 24, 24, 24, ]
    I concat.cc:270 dim(axis_) is :24
    I tensor.h:328 Tensor MobilenetV2/mv2_3_upsample size: [1, 24, 24, 48, ]
    I concat.cc:270 dim(axis_) is :48
    I tensor.h:328 Tensor MobilenetV2/mv2_4_upsample size: [1, 24, 24, 72, ]
    I concat.cc:270 dim(axis_) is :72
    I concat.cc:272 test/bai end
    F concat.cc:276 Check failed: inputs_count == 2 || divisible_four Dimensions of inputs should be divisible by 4 when inputs_count > 2.
    Aborted
    ERROR: [Mace Run] Mace run failed.

I suggest changing the following in "training/src/network_mv2_cpm.py":

    mv2_branch_1 = slim.stack(mv2_branch_0, inverted_bottleneck,
                              [
                                  (up_channel_ratio(6), out_channel_ratio(24), 1, 3),
                                  (up_channel_ratio(6), out_channel_ratio(24), 0, 3),
                                  (up_channel_ratio(6), out_channel_ratio(24), 0, 3),
                                  (up_channel_ratio(6), out_channel_ratio(24), 0, 3),
                                  (up_channel_ratio(6), out_channel_ratio(24), 0, 3),
                              ], scope="MobilenetV2_part_1")

to:

    mv2_branch_1 = slim.stack(mv2_branch_0, inverted_bottleneck,
                              [
                                  (up_channel_ratio(6), 20, 1, 3),
                                  (up_channel_ratio(6), 20, 0, 3),
                                  (up_channel_ratio(6), 20, 0, 3),
                                  (up_channel_ratio(6), 20, 0, 3),
                                  (up_channel_ratio(6), 20, 0, 3),
                              ], scope="MobilenetV2_part_1")
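The effect of this change can be checked against the condition printed in the failure message (`inputs_count == 2 || divisible_four`). The sketch below restates that condition in Python. It is an illustration based only on the logged message, not MACE source code, and the function name `gpu_concat_ok` is hypothetical. The "after" channel list assumes that hard-coding 20 makes `mv2_1_max_pool` carry 20 channels while the other four inputs keep their logged widths.

```python
def gpu_concat_ok(channels):
    """Restate the logged check: inputs_count == 2 || divisible_four."""
    divisible_four = all(c % 4 == 0 for c in channels)
    return len(channels) == 2 or divisible_four


# Channel counts of the five concat inputs, taken from the log above.
before = [12, 18, 24, 48, 72]
# Assumed result of the suggested edit: the second input becomes 20 channels.
after = [12, 20, 24, 48, 72]

print(gpu_concat_ok(before))  # 18 is not a multiple of 4
print(gpu_concat_ok(after))
```

Under these assumptions the original widths fail the check and the edited widths pass it, which matches the reported fix.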

Aug 07 '18 07:08 lvchigo

Oh, I see. I'll improve this later.

Aug 07 '18 09:08 edvardHua

@edvardHua Has this been resolved? I also hit a crash on the GPU, with the same error as yours. Does the model have to be modified?

Aug 31 '18 02:08 qiaowei1214

@qiaowei1214 The problem is solved. Please see the suggested change: modify "training/src/network_mv2_cpm.py".

Aug 31 '18 02:08 lvchigo

@lvchigo I'm using the OpenPose model, not this project's CPM model.

Aug 31 '18 06:08 qiaowei1214

Printing the error log shows that "MobilenetV2/mv2_1_max_pool" has 18 channels, which is not divisible by 4 (the cause of the MACE GPU error). The specific error is as follows:

    I concat.cc:266 test/bai start
    I concat.cc:267 inputs_count is : 5
    I tensor.h:328 Tensor MobilenetV2/mv2_0_max_pool size: [1, 24, 24, 12, ]
    I concat.cc:270 dim(axis_) is :12
    I tensor.h:328 Tensor MobilenetV2/mv2_1_max_pool size: [1, 24, 24, 18, ]
    I concat.cc:270 dim(axis_) is :18
    I tensor.h:328 Tensor MobilenetV2/MobilenetV2_part_2/inverted_bottleneck_MobilenetV2_part_2_5/Add size: [1, 24, 24, 24, ]
    I concat.cc:270 dim(axis_) is :24
    I tensor.h:328 Tensor MobilenetV2/mv2_3_upsample size: [1, 24, 24, 48, ]
    I concat.cc:270 dim(axis_) is :48
    I tensor.h:328 Tensor MobilenetV2/mv2_4_upsample size: [1, 24, 24, 72, ]
    I concat.cc:270 dim(axis_) is :72
    I concat.cc:272 test/bai end
    F concat.cc:276 Check failed: inputs_count == 2 || divisible_four Dimensions of inputs should be divisible by 4 when inputs_count > 2.
    Aborted
    ERROR: [Mace Run] Mace run failed.

It is recommended to change the following in "training/src/network_mv2_cpm.py":

    mv2_branch_1 = slim.stack(mv2_branch_0, inverted_bottleneck,
                              [
                                  (up_channel_ratio(6), out_channel_ratio(24), 1, 3),
                                  (up_channel_ratio(6), out_channel_ratio(24), 0, 3),
                                  (up_channel_ratio(6), out_channel_ratio(24), 0, 3),
                                  (up_channel_ratio(6), out_channel_ratio(24), 0, 3),
                                  (up_channel_ratio(6), out_channel_ratio(24), 0, 3),
                              ], scope="MobilenetV2_part_1")

to:

    mv2_branch_1 = slim.stack(mv2_branch_0, inverted_bottleneck,
                              [
                                  (up_channel_ratio(6), 20, 1, 3),
                                  (up_channel_ratio(6), 20, 0, 3),
                                  (up_channel_ratio(6), 20, 0, 3),
                                  (up_channel_ratio(6), 20, 0, 3),
                                  (up_channel_ratio(6), 20, 0, 3),
                              ], scope="MobilenetV2_part_1")

Won't an error still occur if only that part is corrected (only mv2_branch_1)? This is for training on hands (output layer with 21 points).
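One way to reason about whether only `mv2_branch_1` needs changing is to round every scaled width up to a multiple of 4 and see which ones move. The sketch below does that under loudly stated assumptions: the 0.75 scaling ratio and the base widths 16, 24, 32, 64, 96 are inferred from the logged channel counts (for example 24 scaled to 18) and are not read from the repository, and `round_up` and `scaled_channels` are hypothetical helpers, not functions in `network_mv2_cpm.py`.

```python
def round_up(value, multiple=4):
    """Round value up to the nearest multiple of `multiple`."""
    return ((value + multiple - 1) // multiple) * multiple


def scaled_channels(base, ratio=0.75, multiple=4):
    """Scale a base width by ratio (assumed 0.75), then round up to a multiple of 4."""
    return round_up(int(base * ratio), multiple)


# Hypothetical base widths chosen so that plain scaling reproduces the log.
BASES = [16, 24, 32, 64, 96]

plain = [int(b * 0.75) for b in BASES]
aligned = [scaled_channels(b) for b in BASES]

print(plain)    # widths as seen in the log
print(aligned)  # widths after rounding up to a multiple of 4
```

Under these assumptions only the 18-channel width changes (to 20), which is consistent with the fix touching only `mv2_branch_1`. A model with a different output layer or different widths would need its own concat inputs checked the same way.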

Nov 30 '20 01:11 chartores
