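Converting a self-trained classifier model to RKNN fails
Environment
- 【FastDeploy version】: develop branch, last updated Fri Jul 7 17:43:08 2023 +0800, commit 4c1e80b7231a81e898b2bbaf1df07cce136de38f
- 【System platform】: Linux x64 (Ubuntu 22.04)
- 【Hardware】: WSL
- 【Language】: Python 3.10.6
Problem log and steps to reproduce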
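【Workflow for converting the self-trained classifier model to RKNN】
- Trained a ResNet50_vd model with PaddleClas Release 2.5 using the config file ./ppcls/configs/ImageNet/ResNet/ResNet50_vd.yaml, with the dataset changed to CIFAR100.tar for quick verification.
- Exported the trained model as an inference model.
- Converted it to an ONNX model with paddle2onnx (1.0.6) and fixed the input shape.
- Converting to an RKNN model failed with the following error log: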
(rknn2_1.5) xxx@xxxxxx:~/PaddlePadle/FastDeploy$ python ./tools/rknpu2/export.py --config_path ./tools/rknpu2/config/ResNet50_vd_infer_rknn.yaml --target_platform rk3568
{'model_path': './ResNet50_vd_infer/ResNet50_vd_infer.onnx', 'output_folder': './ResNet50_vd_infer', 'mean': [[123.675, 116.28, 103.53]], 'std': [[58.395, 57.12, 57.375]], 'outputs_nodes': None, 'do_quantization': False, 'dataset': './ResNet50_vd_infer/dataset.txt'}
W init: rknn-toolkit2 version: 1.5.0+1fa95b5c
E load_onnx: Catch exception when loading onnx model: /home/xxx/PaddlePadle/FastDeploy/ResNet50_vd_infer/ResNet50_vd_infer.onnx!
E load_onnx: Traceback (most recent call last):
E load_onnx: File "rknn/api/rknn_base.py", line 1382, in rknn.api.rknn_base.RKNNBase.load_onnx
E load_onnx: File "rknn/api/rknn_base.py", line 658, in rknn.api.rknn_base.RKNNBase._create_ir_and_inputs_meta
E load_onnx: File "rknn/api/ir_graph.py", line 58, in rknn.api.ir_graph.IRGraph.init
E load_onnx: File "rknn/api/ir_graph.py", line 503, in rknn.api.ir_graph.IRGraph.rebuild
E load_onnx: File "/home/xxx/.local/lib/python3.10/site-packages/onnx/checker.py", line 119, in check_model
E load_onnx: C.check_model(protobuf_string, full_check)
E load_onnx: onnx.onnx_cpp2py_export.checker.ValidationError: Field 'shape' of 'type' is required but missing.
W If you can't handle this error, please try updating to the latest version of the toolkit2 and runtime from:
https://eyun.baidu.com/s/3eTDMk6Y (Pwd: rknn)
Path: RK_NPU_SDK / RK_NPU_SDK_1.X.0 / develop /
If the error still exists in the latest version, please collect the corresponding error logs and the model, convert script, and input data that can reproduce the problem, and then submit an issue on:
https://redmine.rock-chips.com (Please consult our sales or FAE for the redmine account)
Traceback (most recent call last):
File "/home/xxx/PaddlePadle/FastDeploy/./tools/rknpu2/export.py", line 52, in <module>
assert ret == 0, "Load model failed!"
AssertionError: Load model failed!
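The ValidationError in the log above is raised by the ONNX checker, not by RKNN-specific code, so it may be reproducible without rknn-toolkit2. The following is a minimal sketch, assuming the failure comes from a graph input or output whose tensor type has no shape recorded. The helper names `missing_shape_names` and `report` are hypothetical; only `onnx.load` and `onnx.checker.check_model` are real onnx calls.

```python
def missing_shape_names(graph):
    """Return names of graph inputs/outputs whose tensor type lacks a 'shape' field."""
    missing = []
    for value_info in list(graph.input) + list(graph.output):
        value_type = value_info.type
        # Only tensor-typed values carry a 'shape' field.
        if value_type.HasField("tensor_type") and not value_type.tensor_type.HasField("shape"):
            missing.append(value_info.name)
    return missing


def report(model_path):
    """Load an ONNX file, list shapeless inputs/outputs, then run the full checker."""
    import onnx  # imported here so the helper above has no hard dependency on onnx

    model = onnx.load(model_path)
    for name in missing_shape_names(model.graph):
        print(f"no shape recorded for graph input/output: {name}")
    try:
        # Same kind of check that appears in the traceback (full_check enabled).
        onnx.checker.check_model(model, full_check=True)
        print("onnx.checker: model is valid")
    except onnx.checker.ValidationError as exc:
        print(f"onnx.checker: {exc}")
```

With the onnx package installed, calling `report("./ResNet50_vd_infer/ResNet50_vd_infer.onnx")` on the converted model should show whether the checker fails before rknn-toolkit2 is involved at all.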
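【Analysis and verification already done】
- Confirmed the environment is working. Training with PaddleClas Release 2.5 shows no anomalies, and exporting the trained model as an inference model completes without errors. When following https://github.com/PaddlePaddle/PaddleClas/tree/develop/deploy/fastdeploy/rockchip/rknpu2 exactly with the official model, every step works, from converting to ONNX and fixing the shape through converting to RKNN.
- Ruled out the rknn-toolkit2 version. Switched to rknn_toolkit2-1.4.2b3+0bdd72ff-cp36-cp36m-linux_x86_64.whl for testing; the same error is reported.
- Ruled out the custom-trained model. Downloaded the pretrained ResNet50_vd model, then exported and converted it; the same error is reported at the final RKNN conversion step.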
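【Analysis and guesses】 The following is my own analysis and guesswork; it may not be correct and is for reference only.
- I suspect the problem is triggered by a change in the model definition used by PaddleClas 2.5.
I compared the working model https://bj.bcebos.com/paddlehub/fastdeploy/ResNet50_vd_infer.tgz with the trained model. Both are ResNet50_vd, yet the last few nodes differ: one ends with pool2d --> reshape2 --> matmul --> elementwise_add --> softmax --> scale --> 0, while the other ends with pool2d --> flatten_contiguous_range --> matmul_v2 --> elementwise_add --> softmax --> 0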
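To make the difference between the two operator tails explicit, here is a small sketch using Python's standard `difflib`. The two lists are copied from the comparison above; the names `first_tail`, `second_tail`, and `diff_ops` are hypothetical, and which tail belongs to which model is not asserted here.

```python
from difflib import SequenceMatcher

# Operator tails as listed in the comparison above (the trailing "0" output is omitted).
first_tail = ["pool2d", "reshape2", "matmul", "elementwise_add", "softmax", "scale"]
second_tail = ["pool2d", "flatten_contiguous_range", "matmul_v2", "elementwise_add", "softmax"]


def diff_ops(a, b):
    """Return (tag, ops_in_a, ops_in_b) for every region where the two sequences differ."""
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        (tag, a[i1:i2], b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


for tag, ops_a, ops_b in diff_ops(first_tail, second_tail):
    print(f"{tag}: {ops_a} -> {ops_b}")
```

The diff reports two changes: reshape2/matmul are replaced by flatten_contiguous_range/matmul_v2, and the final scale operator is absent from the second tail, so softmax feeds the output directly there.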
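ResNet50_vd_CIFAR.zip: this is the test model. It was trained for only a few epochs, just to verify the format conversion.
I also tried the development version, rknn-toolkit2 version: 1.5.1b17+7ca6b722, and the result is the same. I don't know what else to try.
You could ask the PaddleClas team about this.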