mindcv
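Training vgg16 / vgg19 on a 5-class flowers dataset: loss does not converge and accuracy is wrong; also, how do I specify a local pretrained model?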
If this is your first time, please read our contributor guidelines: https://github.com/mindspore-lab/mindcv/blob/main/CONTRIBUTING.md
Describe the bug (Mandatory)
When training vgg16 or vgg19 on a 5-class flowers dataset, on both GPU and NPU, the loss does not converge and the accuracy is wrong.
- Hardware Environment (Ascend/GPU/CPU):
Please delete the backend not involved: /device ascend/GPU
- Software Environment (Mandatory):
-- MindSpore version (e.g., 2.2.11) :
-- Python version (e.g., Python 3.9.18) :
-- OS platform and distribution (e.g., Linux Ubuntu 22.04):
-- GCC/Compiler version (if compiled from source):
- Execute Mode (Mandatory) (PyNative/Graph):
Please delete the mode not involved: /mode pynative PYNATIVE_MODE(1) /mode graph
To Reproduce (Mandatory)
Steps to reproduce the behavior: train with the yaml config file.
Command: python train.py --config ./configs/vgg/vgg16_ascend.yaml
Expected behavior (Mandatory)
A clear and concise description of what you expected to happen.
Screenshots / Logs (Mandatory)
If applicable, add screenshots to help explain your problem.
Contents of the yaml file:
# system
mode: 1
distribute: False
num_parallel_workers: 8
val_while_train: True

# dataset
dataset: 'imagenet'
data_dir: './imageNet'
shuffle: True
dataset_download: False
batch_size: 32
drop_remainder: True

# augmentation
image_resize: 224
scale: [0.08, 1.0]
ratio: [0.75, 1.333]
hflip: 0.5
interpolation: 'bilinear'
crop_pct: 0.875

# model
model: 'vgg16'
num_classes: 5
pretrained: True
ckpt_path: ''
keep_checkpoint_max: 1
ckpt_save_dir: './ckpt3'
epoch_size: 20
dataset_sink_mode: True
amp_level: 'O0'

# loss
loss: 'CE'
label_smoothing: 0.1

# lr scheduler
scheduler: 'warmup_cosine_decay'
lr: 0.01
min_lr: 0.0001
decay_epochs: 198
warmup_epochs: 2

# optimizer
opt: 'momentum'
momentum: 0.9
weight_decay: 0.00004
loss_scale: 1024
use_nesterov: False
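To make the scheduler settings above easier to reason about, here is a minimal sketch of what a warmup-plus-cosine schedule would produce over the 20 epochs of this run. It is not MindCV's implementation: the per-epoch granularity, the linear warmup starting at `lr / warmup_epochs`, and the function name `warmup_cosine_lr` are all assumptions made for illustration. Only the numeric values come from the config.

```python
import math


def warmup_cosine_lr(epoch, lr=0.01, min_lr=0.0001, warmup_epochs=2, decay_epochs=198):
    """Learning rate for a 0-based epoch under an ASSUMED schedule shape."""
    if epoch < warmup_epochs:
        # Assumption: linear ramp that reaches `lr` on the last warmup epoch.
        return lr * (epoch + 1) / warmup_epochs
    # Cosine decay from `lr` down to `min_lr`, spread over `decay_epochs` epochs.
    progress = min((epoch - warmup_epochs) / decay_epochs, 1.0)
    return min_lr + 0.5 * (lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# epoch_size is 20 in the config, so only the first 20 epochs are ever used.
schedule = [warmup_cosine_lr(e) for e in range(20)]
for e, value in enumerate(schedule, start=1):
    print(f"epoch {e:2d}: lr = {value:.6f}")
```

Under these assumptions, because `decay_epochs: 198` is much larger than `epoch_size: 20`, the learning rate stays very close to 0.01 for the whole run after warmup, so the cosine decay has almost no effect in this experiment.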
Training results:
Epoch TrainLoss Top_1_Accuracy Top_5_Accuracy TrainTime EvalTime TotalTime
1 1.659075 25.2044% 100.0000% 22.04 0.99 27.67
2 1.790772 19.0736% 100.0000% 6.21 0.84 10.10
3 1.747301 19.0736% 100.0000% 6.46 0.84 10.10
4 1.628069 19.0736% 100.0000% 6.18 0.78 9.68
5 1.661704 19.0736% 100.0000% 6.33 0.85 10.33
6 1.725484 19.0736% 100.0000% 6.19 0.85 10.06
7 1.674596 18.9373% 100.0000% 6.40 0.89 10.36
8 1.607921 19.0736% 100.0000% 6.25 0.75 10.25
9 1.670359 19.0736% 100.0000% 6.17 0.80 10.14
10 1.685464 19.0736% 100.0000% 6.22 0.87 10.75
11 1.688051 19.0736% 100.0000% 6.41 0.83 10.23
12 1.720397 19.0736% 100.0000% 6.22 0.78 10.54
13 1.750791 19.0736% 100.0000% 6.29 0.79 10.29
14 1.598438 19.0736% 100.0000% 6.18 0.83 9.85
15 1.609399 19.0736% 100.0000% 6.14 0.84 9.81
16 1.617299 19.0736% 100.0000% 6.17 0.95 10.13
17 1.744891 19.0736% 100.0000% 6.23 0.86 10.30
18 1.776682 19.0736% 100.0000% 6.18 0.83 9.81
19 1.670697 19.0736% 100.0000% 6.12 0.93 10.03
20 1.782085 19.0736% 100.0000% 6.36 0.83 10.14
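As a sanity check on the numbers above, the sketch below computes the cross-entropy loss of a model that outputs a uniform distribution over 5 classes, using the `label_smoothing: 0.1` value from the config. The helper `cross_entropy` is written for illustration and assumes the common smoothing convention (1 - eps on the true class plus eps / n on every class). For a uniform prediction the result does not depend on that convention.

```python
import math


def cross_entropy(probs, target_index, label_smoothing=0.0):
    """Cross-entropy of predicted probabilities against a smoothed one-hot target."""
    n = len(probs)
    loss = 0.0
    for i, p in enumerate(probs):
        # Smoothed target: eps / n on every class, plus (1 - eps) on the true class.
        t = label_smoothing / n + ((1.0 - label_smoothing) if i == target_index else 0.0)
        loss -= t * math.log(p)
    return loss


num_classes = 5
uniform = [1.0 / num_classes] * num_classes
chance_loss = cross_entropy(uniform, target_index=0, label_smoothing=0.1)
print(f"loss of a uniform prediction: {chance_loss:.4f}")
print(f"chance-level top-1 accuracy: {100.0 / num_classes:.1f}%")
```

The uniform-prediction loss equals ln 5 (about 1.609), and the reported TrainLoss values (1.598 to 1.791) sit around that value for all 20 epochs. The Top-1 accuracy of about 19% is also close to the 20% chance level, which would be consistent with the network not having learned anything. A Top-5 accuracy of 100% is expected with only 5 classes, so that column carries no information here.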
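Additional context (Optional)
The loss does not converge and the accuracy is also wrong. Could you please take a look at what the problem is? In addition, I have already downloaded the pretrained model; how do I point the training config to that local file? At the moment, pretrained: True downloads the checkpoint automatically into a fixed location, and I would like to know how to specify the path myself.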
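One point related to the checkpoint question: an ImageNet-pretrained VGG checkpoint has a 1000-way classifier head, while this config sets `num_classes: 5`, so the head weights cannot be loaded as they are, whichever way the local file ends up being specified. The sketch below only illustrates the idea of dropping shape-mismatched entries before loading. The parameter names and the helper `drop_mismatched_params` are hypothetical, and it works on plain name-to-shape dicts rather than real tensors. With MindSpore the real dict would presumably come from `mindspore.load_checkpoint` and be applied with `mindspore.load_param_into_net`. Whether `ckpt_path` in the yaml already does this filtering is not something this sketch can confirm.

```python
def drop_mismatched_params(ckpt_shapes, model_shapes):
    """Split checkpoint entries into those matching the model (name and shape) and the rest."""
    kept, dropped = {}, []
    for name, shape in ckpt_shapes.items():
        if model_shapes.get(name) == shape:
            kept[name] = shape
        else:
            # Either the name is unknown to the model or the shape differs.
            dropped.append(name)
    return kept, dropped


# Hypothetical parameter names and shapes, for illustration only.
ckpt = {
    "features.0.weight": (64, 3, 3, 3),
    "classifier.6.weight": (1000, 4096),  # 1000-class ImageNet head
    "classifier.6.bias": (1000,),
}
model = {
    "features.0.weight": (64, 3, 3, 3),
    "classifier.6.weight": (5, 4096),  # 5-class flowers head
    "classifier.6.bias": (5,),
}

kept, dropped = drop_mismatched_params(ckpt, model)
print("loaded from checkpoint:", sorted(kept))
print("left at random init:", sorted(dropped))
```

In this toy example the backbone entry is kept and the two classifier entries are dropped, so the 5-class head would be trained from its random initialization.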