Training vgg16 and vgg19 on a 5-class flower dataset: the loss does not converge, the accuracy is wrong, and how do I specify a pretrained model?

Open LegendSun0 opened this issue 4 months ago • 0 comments

If this is your first time, please read our contributor guidelines: https://github.com/mindspore-lab/mindcv/blob/main/CONTRIBUTING.md

Describe the bug (Mandatory)
When training vgg16 and vgg19 on a 5-class flower dataset, on both GPU and NPU, the loss does not converge and the accuracy is wrong.

Hardware Environment (Ascend/GPU/CPU):

Please delete the backend not involved: /device ascend/GPU

Software Environment (Mandatory):
-- MindSpore version (e.g., 2.2.11):
-- Python version (e.g., Python 3.9.18):
-- OS platform and distribution (e.g., Linux Ubuntu 22.04):
-- GCC/Compiler version (if compiled from source):

Execute Mode (Mandatory) (PyNative/Graph):

Please delete the mode not involved: /mode pynative PYNATIVE_MODE(1) /mode graph

To Reproduce (Mandatory)
Steps to reproduce the behavior: train with the yaml config file using the command: python train.py --config ./configs/vgg/vgg16_ascend.yaml

Expected behavior (Mandatory)
A clear and concise description of what you expected to happen.

Screenshots / Logs (Mandatory)
If applicable, add screenshots to help explain your problem.

Contents of the yaml file:

# system
mode: 1
distribute: False
num_parallel_workers: 8
val_while_train: True

# dataset
dataset: 'imagenet'
data_dir: './imageNet'
shuffle: True
dataset_download: False
batch_size: 32
drop_remainder: True

# augmentation
image_resize: 224
scale: [0.08, 1.0]
ratio: [0.75, 1.333]
hflip: 0.5
interpolation: 'bilinear'
crop_pct: 0.875

# model
model: 'vgg16'
num_classes: 5
pretrained: True
ckpt_path: ''
keep_checkpoint_max: 1
ckpt_save_dir: './ckpt3'
epoch_size: 20
dataset_sink_mode: True
amp_level: 'O0'

# loss
loss: 'CE'
label_smoothing: 0.1

# lr scheduler
scheduler: 'warmup_cosine_decay'
lr: 0.01
min_lr: 0.0001
decay_epochs: 198
warmup_epochs: 2

# optimizer
opt: 'momentum'
momentum: 0.9
weight_decay: 0.00004
loss_scale: 1024
use_nesterov: False
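One thing the config above makes easy to check is how far the learning rate would actually decay over the 20 configured epochs, given decay_epochs: 198. The sketch below uses a generic linear-warmup plus cosine-decay formula evaluated once per epoch. It is an assumption for illustration only, not MindCV's own scheduler code, and the function name `warmup_cosine_lr` is hypothetical.

```python
import math

def warmup_cosine_lr(epoch, lr=0.01, min_lr=0.0001, warmup_epochs=2, decay_epochs=198):
    """Generic schedule (assumed form): linear warmup, then cosine decay to min_lr.

    `epoch` is zero-based. Defaults are copied from the yaml file above.
    """
    if epoch < warmup_epochs:
        # Ramp linearly up to the base learning rate during warmup.
        return lr * (epoch + 1) / warmup_epochs
    # Fraction of the cosine decay phase completed so far, capped at 1.
    progress = min((epoch - warmup_epochs) / decay_epochs, 1.0)
    return min_lr + 0.5 * (lr - min_lr) * (1.0 + math.cos(math.pi * progress))

# Learning rate for each of the 20 epochs set by epoch_size.
schedule = [warmup_cosine_lr(e) for e in range(20)]
for e, value in enumerate(schedule, start=1):
    print(f"epoch {e:2d}: lr = {value:.6f}")
```

Under this assumed formula, the rate is still close to the base value of 0.01 at epoch 20, because only a small fraction of the 198 decay epochs has elapsed. Whether that is related to the non-converging loss is not established here.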

Training results:

Epoch  TrainLoss  Top_1_Accuracy  Top_5_Accuracy  TrainTime  EvalTime  TotalTime
1      1.659075   25.2044%        100.0000%       22.04      0.99      27.67
2      1.790772   19.0736%        100.0000%       6.21       0.84      10.10
3      1.747301   19.0736%        100.0000%       6.46       0.84      10.10
4      1.628069   19.0736%        100.0000%       6.18       0.78      9.68
5      1.661704   19.0736%        100.0000%       6.33       0.85      10.33
6      1.725484   19.0736%        100.0000%       6.19       0.85      10.06
7      1.674596   18.9373%        100.0000%       6.40       0.89      10.36
8      1.607921   19.0736%        100.0000%       6.25       0.75      10.25
9      1.670359   19.0736%        100.0000%       6.17       0.80      10.14
10     1.685464   19.0736%        100.0000%       6.22       0.87      10.75
11     1.688051   19.0736%        100.0000%       6.41       0.83      10.23
12     1.720397   19.0736%        100.0000%       6.22       0.78      10.54
13     1.750791   19.0736%        100.0000%       6.29       0.79      10.29
14     1.598438   19.0736%        100.0000%       6.18       0.83      9.85
15     1.609399   19.0736%        100.0000%       6.14       0.84      9.81
16     1.617299   19.0736%        100.0000%       6.17       0.95      10.13
17     1.744891   19.0736%        100.0000%       6.23       0.86      10.30
18     1.776682   19.0736%        100.0000%       6.18       0.83      9.81
19     1.670697   19.0736%        100.0000%       6.12       0.93      10.03
20     1.782085   19.0736%        100.0000%       6.36       0.83      10.14
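As a sanity check on these numbers, the following sketch computes the cross-entropy loss a 5-class model would report if it predicted a uniform distribution, using label_smoothing: 0.1 from the config. The smoothing formula used here, (1 - eps) on the true class plus eps / K on every class, is the common definition and is assumed rather than taken from MindCV. The helper name `smoothed_cross_entropy` is hypothetical.

```python
import math

NUM_CLASSES = 5        # num_classes from the yaml file
LABEL_SMOOTHING = 0.1  # label_smoothing from the yaml file

def smoothed_cross_entropy(probs, target, smoothing):
    """Cross-entropy of `probs` against a label-smoothed one-hot target."""
    n = len(probs)
    loss = 0.0
    for i, p in enumerate(probs):
        # Smoothed target: eps / n on every class, plus (1 - eps) on the true class.
        q = smoothing / n + ((1.0 - smoothing) if i == target else 0.0)
        loss -= q * math.log(p)
    return loss

# A model that has learned nothing and outputs equal probability for each class.
uniform = [1.0 / NUM_CLASSES] * NUM_CLASSES
chance_loss = smoothed_cross_entropy(uniform, target=0, smoothing=LABEL_SMOOTHING)
chance_top1 = 1.0 / NUM_CLASSES  # assumes balanced classes

print(f"chance-level loss: {chance_loss:.4f}")
print(f"chance-level top-1 (balanced classes): {chance_top1:.1%}")
```

The chance-level loss equals ln(5), roughly 1.609, and the TrainLoss values above (about 1.60 to 1.79) stay near or above it, which is consistent with a model that is not learning. Top_5_Accuracy is 100% by construction when there are only 5 classes, so that column carries no information here.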

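Additional context (Optional)
The loss does not converge and the accuracy is also wrong. Could you please take a look at what the problem is? Also, I have downloaded the pretrained model myself; how do I point the training at it? At the moment, pretrained: True downloads the weights automatically to a fixed location, and I would like to know how to specify a local file instead.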

Sep 29 '24 09:09 LegendSun0
