Model loading fails when training yolov7 on the dataset provided by the mindyolo tutorial.

1. Environment: ModelArts (image mindspore_1.10.0-cann_6.0.1-py_3.7-euler_2.8.3), EulerOS 2.0 (SP8), CANN-6.0.1, mindspore 1.10, mindyolo r0.1.
2. Dataset preparation and training procedure: https://github.com/mindspore-lab/mindyolo/blob/master/examples/finetune_SHWD/README.md
3. The following error occurs during training:

    RuntimeError: For 'load_param_into_net', model.model.77.m.0.weight in the argument 'net' should have the same shape as model.model.77.m.0.weight in the argument 'parameter_dict'. But got its shape (21, 128, 1, 1) in the argument 'net' and shape (255, 128, 1, 1) in the argument 'parameter_dict'.May you need to check whether the checkpoint you loaded is correct or the batch size and so on in the 'net' and 'parameter_dict' are same.

How can I resolve this? (Note: the log file is attached as outputlog.txt.)

Mar 01 '24 06:03 Living190711

1. Environment: ModelArts (mindspore_1.9.0-cann_6.0.0-py_3.7-euler_2.8.3), EulerOS 2.0 (SP8), CANN-6.0.RC1, mindyolo r0.1.

2. I built a custom dataset following the approach on the master branch (https://github.com/mindspore-lab/mindyolo/tree/master/examples/finetune_SHWD) and trained the model with the mindyolo r0.1 branch.

3. Configuration file:

    __BASE__: [
      '/home/ma-user/work/mindyolo-r0.1/configs/yolov8/yolov8n.yaml',
    ]

    per_batch_size: 16 # 16 * 8 = 128
    img_size: 640 # image sizes
    weight: /home/ma-user/work/mindyolo-r0.1/pre-ckpt/yolov8-n_500e_mAP372-cc07f5bd.ckpt
    strict_load: False

    data:
      dataset_name: shwd
      train_set: /home/ma-user/work/mindyolo-r0.1/dataset-test/SHWD/train.txt
      val_set: /home/ma-user/work/mindyolo-r0.1/dataset-test/SHWD/val.txt
      test_set: /home/ma-user/work/mindyolo-r0.1/dataset-test/SHWD/val.txt
      nc: 3

      names: [ 'helmet', 'gloves', 'shawl' ]

    optimizer:
      lr_init: 0.001 # initial learning rate

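3. Errors occur during training, at the model loading step. Training with the yolov8n, yolov7-tiny, and yolov5n pretrained models fails with a model loading error in every case: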

yolov8n:

    [CRITICAL] ME(22510:281472828627520,MainProcess):2024-03-02-14:35:39.167.353 [mindspore/train/serialization.py:112] Failed to combine the net and the parameters for param model.model.22.cv3.0.0.conv.weight.
    Traceback (most recent call last):
      File "train.py", line 290, in <module>
        train(args)
      File "train.py", line 128, in train
        load_pretrain(network, args.weight, ema, args.ema_weight) # load pretrain
      File "/home/ma-user/work/mindyolo-r0.1/mindyolo/utils/utils.py", line 91, in load_pretrain
        ms.load_param_into_net(network, param_dict)
      File "/home/ma-user/anaconda3/envs/MindSpore/lib/python3.7/site-packages/mindspore/train/serialization.py", line 703, in load_param_into_net
        _load_dismatch_prefix_params(net, parameter_dict, param_not_load, strict_load)
      File "/home/ma-user/anaconda3/envs/MindSpore/lib/python3.7/site-packages/mindspore/train/serialization.py", line 742, in _load_dismatch_prefix_params
        _update_param(param, new_param, strict_load)
      File "/home/ma-user/anaconda3/envs/MindSpore/lib/python3.7/site-packages/mindspore/train/serialization.py", line 118, in _update_param
        raise RuntimeError(msg)
    RuntimeError: For 'load_param_into_net', model.model.22.cv3.0.0.conv.weight in the argument 'net' should have the same shape as model.model.22.cv3.0.0.conv.weight in the argument 'parameter_dict'. But got its shape (64, 64, 3, 3) in the argument 'net' and shape (80, 64, 3, 3) in the argument 'parameter_dict'.May you need to check whether the checkpoint you loaded is correct or the batch size and so on in the 'net' and 'parameter_dict' are same.

yolov7-tiny:

    [CRITICAL] ME(44701:281473522788928,MainProcess):2024-03-02-14:51:05.733.431 [mindspore/train/serialization.py:112] Failed to combine the net and the parameters for param model.model.77.m.0.weight.
    Traceback (most recent call last):
      File "train.py", line 290, in <module>
        train(args)
      File "train.py", line 128, in train
        load_pretrain(network, args.weight, ema, args.ema_weight) # load pretrain
      File "/home/ma-user/work/mindyolo-r0.1/mindyolo/utils/utils.py", line 91, in load_pretrain
        ms.load_param_into_net(network, param_dict)
      File "/home/ma-user/anaconda3/envs/MindSpore/lib/python3.7/site-packages/mindspore/train/serialization.py", line 703, in load_param_into_net
        _load_dismatch_prefix_params(net, parameter_dict, param_not_load, strict_load)
      File "/home/ma-user/anaconda3/envs/MindSpore/lib/python3.7/site-packages/mindspore/train/serialization.py", line 742, in _load_dismatch_prefix_params
        _update_param(param, new_param, strict_load)
      File "/home/ma-user/anaconda3/envs/MindSpore/lib/python3.7/site-packages/mindspore/train/serialization.py", line 118, in _update_param
        raise RuntimeError(msg)
    RuntimeError: For 'load_param_into_net', model.model.77.m.0.weight in the argument 'net' should have the same shape as model.model.77.m.0.weight in the argument 'parameter_dict'. But got its shape (24, 128, 1, 1) in the argument 'net' and shape (255, 128, 1, 1) in the argument 'parameter_dict'.May you need to check whether the checkpoint you loaded is correct or the batch size and so on in the 'net' and 'parameter_dict' are same.

yolov5n:

    [CRITICAL] ME(49608:281473261058624,MainProcess):2024-03-02-14:53:49.428.28 [mindspore/train/serialization.py:112] Failed to combine the net and the parameters for param model.model.24.m.0.weight.
    Traceback (most recent call last):
      File "train.py", line 290, in <module>
        train(args)
      File "train.py", line 128, in train
        load_pretrain(network, args.weight, ema, args.ema_weight) # load pretrain
      File "/home/ma-user/work/mindyolo-r0.1/mindyolo/utils/utils.py", line 91, in load_pretrain
        ms.load_param_into_net(network, param_dict)
      File "/home/ma-user/anaconda3/envs/MindSpore/lib/python3.7/site-packages/mindspore/train/serialization.py", line 703, in load_param_into_net
        _load_dismatch_prefix_params(net, parameter_dict, param_not_load, strict_load)
      File "/home/ma-user/anaconda3/envs/MindSpore/lib/python3.7/site-packages/mindspore/train/serialization.py", line 742, in _load_dismatch_prefix_params
        _update_param(param, new_param, strict_load)
      File "/home/ma-user/anaconda3/envs/MindSpore/lib/python3.7/site-packages/mindspore/train/serialization.py", line 118, in _update_param
        raise RuntimeError(msg)
    RuntimeError: For 'load_param_into_net', model.model.24.m.0.weight in the argument 'net' should have the same shape as model.model.24.m.0.weight in the argument 'parameter_dict'. But got its shape (24, 64, 1, 1) in the argument 'net' and shape (255, 64, 1, 1) in the argument 'parameter_dict'.May you need to check whether the checkpoint you loaded is correct or the batch size and so on in the 'net' and 'parameter_dict' are same.

4. Without a pretrained model, the training program runs normally:

    2024-03-02 15:21:02,162 [INFO] Epoch 6/300, Step 39/39, step time: 1896.07 ms
    2024-03-02 15:21:02,871 [INFO] Saving model to ./runs/2024.03.02-15:10:50/weights/yolov5n_shwd-6_39.ckpt
    2024-03-02 15:21:02,872 [INFO] Epoch 6/300, epoch time: 1.24 min.
    2024-03-02 15:22:16,137 [WARNING] overflow, still update, loss scale adjust to 1024.0
    2024-03-02 15:22:16,147 [INFO] Epoch 7/300, Step 39/39, imgsize (640, 640), loss: 0.2346, lbox: 0.0723, lobj: 0.0548, lcls: 0.1075, cur_lr: 0.0009768999880179763
    2024-03-02 15:22:16,149 [INFO] Epoch 7/300, Step 39/39, step time: 1878.87 ms
    2024-03-02 15:22:16,761 [INFO] Saving model to ./runs/2024.03.02-15:10:50/weights/yolov5n_shwd-7_39.ckpt
    2024-03-02 15:22:16,762 [INFO] Epoch 7/300, epoch time: 1.23 min.
    2024-03-02 15:23:31,967 [WARNING] overflow, still update, loss scale adjust to 1024.0
    2024-03-02 15:23:31,977 [INFO] Epoch 8/300, Step 39/39, imgsize (640, 640), loss: 0.2195, lbox: 0.0681, lobj: 0.0481, lcls: 0.1034, cur_lr: 0.0009736000210978091
    2024-03-02 15:23:31,979 [INFO] Epoch 8/300, Step 39/39, step time: 1928.60 ms
    2024-03-02 15:23:32,630 [INFO] Saving model to ./runs/2024.03.02-15:10:50/weights/yolov5n_shwd-8_39.ckpt
    2024-03-02 15:23:32,631 [INFO] Epoch 8/300, epoch time: 1.26 min.

Could you advise how to resolve the failure to load the pretrained model?

Mar 02 '24 07:03 Living190711

yolov7-tiny:

    RuntimeError: For 'load_param_into_net', model.model.77.m.0.weight in the argument 'net' should have the same shape as model.model.77.m.0.weight in the argument 'parameter_dict'. But got its shape (24, 128, 1, 1) in the argument 'net' and shape (255, 128, 1, 1) in the argument 'parameter_dict'.May you need to check whether the checkpoint you loaded is correct or the batch size and so on in the 'net' and 'parameter_dict' are same.

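I am training with 3 classes, so the last layer has 3*(3+5)=24 output channels, while the pretrained model has 80 classes, giving 3*(80+5)=255. That is what causes the shape mismatch. However, I modified the configuration for training so that the weights of the last layer (the ones with the mismatched shape) can be dropped.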

The error still occurs, though. How should I handle this case?
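The channel arithmetic in this post can be checked with a short sketch. It assumes an anchor-based head (as in yolov5 and yolov7) with 3 anchors per scale, where each anchor predicts 4 box values, 1 objectness score, and one score per class. The function names are hypothetical and are not part of mindyolo. The anchor-free yolov8 head is laid out differently, so this formula does not apply to it.

```python
def head_out_channels(num_classes: int, num_anchors: int = 3) -> int:
    """Output channels of one anchor-based detection head convolution.

    Each anchor predicts 4 box values + 1 objectness score + one score per class.
    """
    return num_anchors * (num_classes + 5)


def classes_from_channels(out_channels: int, num_anchors: int = 3) -> int:
    """Invert the formula: recover the class count from a weight's first dimension."""
    per_anchor, remainder = divmod(out_channels, num_anchors)
    if remainder or per_anchor < 5:
        raise ValueError(f"{out_channels} channels do not fit {num_anchors} anchors")
    return per_anchor - 5


# First dimensions of the head weights reported in the error messages in this thread.
for channels in (21, 24, 255):
    print(channels, "->", classes_from_channels(channels), "classes")
```

Read this way, the 21-channel head in the first post corresponds to a 2-class network, the 24-channel head to the 3-class network here, and 255 to the 80-class pretrained checkpoint.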

Mar 02 '24 08:03 Living190711

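Judging from the error, the model structure and the checkpoint weight shapes are inconsistent, probably because the number of classes in the final layer was changed.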


Mar 12 '24 06:03 zhanghuiyao

The weight-loading logic lives in this function; you could try debugging inside it to see what happens: https://github.com/mindspore-lab/mindyolo/blob/master/mindyolo/utils/utils.py#L113
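One possible workaround while debugging that function is to drop checkpoint entries whose shape differs from the network before loading. The sketch below is a minimal illustration, not mindyolo's actual implementation. It assumes, as the traceback suggests (utils.py line 91), that the r0.1 `load_pretrain` passes the checkpoint dict straight to `ms.load_param_into_net`, and that the shape check raises even with `strict_load: False`. The helper names `filter_mismatched` and `load_pretrain_skipping_mismatches` are hypothetical, and the first toy shape `(16, 3, 3, 3)` is made up for the demo.

```python
from types import SimpleNamespace


def filter_mismatched(param_dict, net_shapes):
    """Split checkpoint entries into shape-compatible ones and mismatched names.

    param_dict: name -> object with a `.shape` attribute (e.g. a checkpoint Parameter).
    net_shapes: name -> shape tuple expected by the network.
    Entries absent from the network are kept and left to the loader to handle.
    """
    kept, dropped = {}, []
    for name, value in param_dict.items():
        target = net_shapes.get(name)
        if target is not None and tuple(value.shape) != tuple(target):
            dropped.append(name)
        else:
            kept[name] = value
    return kept, dropped


def load_pretrain_skipping_mismatches(network, ckpt_path):
    """Hypothetical wrapper: load only the shape-compatible weights into `network`."""
    import mindspore as ms  # imported lazily so the helper above stays framework-free

    param_dict = ms.load_checkpoint(ckpt_path)
    net_shapes = {name: p.shape for name, p in network.parameters_dict().items()}
    kept, dropped = filter_mismatched(param_dict, net_shapes)
    ms.load_param_into_net(network, kept)
    return dropped  # names left at their fresh initialization


# Toy stand-ins for checkpoint tensors; only `.shape` is needed for the filter.
ckpt = {
    "model.model.0.conv.weight": SimpleNamespace(shape=(16, 3, 3, 3)),
    "model.model.77.m.0.weight": SimpleNamespace(shape=(255, 128, 1, 1)),
}
net_shapes = {
    "model.model.0.conv.weight": (16, 3, 3, 3),
    "model.model.77.m.0.weight": (24, 128, 1, 1),
}
kept, dropped = filter_mismatched(ckpt, net_shapes)
print("skipped:", dropped)
```

The skipped layers keep their random initialization and are learned during fine-tuning, which is usually the intent when the class count changes. Logging the returned names is worthwhile, since an unexpectedly long list would point to a wrong checkpoint rather than a changed head.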

Mar 12 '24 06:03 zhanghuiyao
