
Trained a 3-class model with PaddleClas: top-1 on the test set reaches 1, but accuracy is worse in actual use and many samples are misclassified

Open · yaphet266 opened this issue 1 year ago · 1 comment

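I trained for 100 epochs with PaddleClas, and the exported best model has a low recognition rate in real-world use. Earlier I trained with the same parameters for a while (the run did not finish; I terminated the training program), and the best model exported from that run had a fairly high recognition rate in use, even though it never reached 100 epochs. Is setting 100 epochs causing overfitting? How should I choose the epochs value so that training does not overfit and the exported model is the best one? Environment details: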

  1. PaddleClas develop
  2. PaddleInference 2.4.2
  3. Training environment:
     a. OS: Linux Ubuntu 20.04
     b. Python 3.8
     c. CUDA/cuDNN version, e.g. CUDA 12.0
  4. Contents of the training YAML file:

Global:
  checkpoints: null
  pretrained_model: null
  output_dir: ./output/
  device: gpu
  save_interval: 1
  eval_during_train: True
  eval_interval: 1
  epochs: 100
  print_batch_step: 10
  use_visualdl: True
  image_shape: [3, 32, 32]
  save_inference_dir: ./inference
  to_static: False
  use_dali: False

Arch:
  name: PPHGNet_small
  class_num: 3

Loss:
  Train:
    - CELoss:
        weight: 1.0
        epsilon: 0.1
  Eval:
    - CELoss:
        weight: 1.0

Optimizer:
  name: Momentum
  momentum: 0.9
  lr:
    name: Cosine
    learning_rate: 0.5
    warmup_epoch: 5
  regularizer:
    name: 'L2'
    coeff: 0.00004

DataLoader:
  Train:
    dataset:
      name: ImageNetDataset
      image_root: /workspace2/yyc/OCR_data/score_marking/data
      cls_label_path: /workspace2/yyc/OCR_data/score_marking/label/train.txt
      transform_ops:
        - DecodeImage:
            to_rgb: True
            channel_first: False
        - ResizeImage:
            size: [32, 32]
        - NormalizeImage:
            scale: 1.0/255.0
            mean: [0.485, 0.456, 0.406]
            std: [0.229, 0.224, 0.225]
            order: ''
    sampler:
      name: DistributedBatchSampler
      batch_size: 128
      drop_last: False
      shuffle: True
    loader:
      num_workers: 16
      use_shared_memory: True

  Eval:
    dataset:
      name: ImageNetDataset
      image_root: /workspace2/yyc/OCR_data/score_marking/data
      cls_label_path: /workspace2/yyc/OCR_data/score_marking/label/val.txt
      transform_ops:
        - DecodeImage:
            to_rgb: True
            channel_first: False
        - ResizeImage:
            size: [32, 32]
        - NormalizeImage:
            scale: 1.0/255.0
            mean: [0.485, 0.456, 0.406]
            std: [0.229, 0.224, 0.225]
            order: ''
    sampler:
      name: DistributedBatchSampler
      batch_size: 128
      drop_last: False
      shuffle: False
    loader:
      num_workers: 16
      use_shared_memory: True

Infer:
  infer_imgs: /workspace2/yyc/OCR_data/score_marking/test
  batch_size: 10
  transforms:
    - DecodeImage:
        to_rgb: True
        channel_first: False
    - ResizeImage:
        size: [32, 32]
    - NormalizeImage:
        scale: 1.0/255.0
        mean: [0.485, 0.456, 0.406]
        std: [0.229, 0.224, 0.225]
        order: ''
    - ToCHWImage:
  PostProcess:
    name: Topk
    topk: 2
    class_id_map_file: ppcls/utils/PULC_label_list/marking_symbols_label_list.txt

Metric:
  Train:
    - TopkAcc:
        topk: [1, 2]
  Eval:
    - TopkAcc:
        topk: [1, 2]

  5. The task is 3-class classification of images like the following

yaphet266 · Dec 20 '23 07:12

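If the model that scores best in evaluation is not the best one in actual use, there are two possible causes.

  1. The distribution of your evaluation set differs substantially from the data seen in production; this gap has to be narrowed manually.
  2. The model really is overfitting. You can try the following:
  • Reduce the model size; use a smaller model where possible.
  • Reduce the number of training epochs.
  • Lower the learning rate.
  • Add mechanisms such as warmup.
  • Switch to a stronger pretrained model, such as the SSLD pretrained models provided by PaddleClas.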
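
One way to make "reduce the number of training epochs" concrete is a patience rule applied to the per-epoch evaluation accuracy. The following is only a minimal sketch under stated assumptions: `pick_stopping_epoch` is a hypothetical helper, not a PaddleClas API, and the accuracy values are made up for illustration. It also assumes the accuracies come from a held-out set that resembles the production data; if the evaluation set already scores a top-1 of 1, this rule cannot reveal overfitting on its own.

```python
from typing import List, Tuple


def pick_stopping_epoch(
    val_acc: List[float], patience: int = 5, min_delta: float = 0.0
) -> Tuple[int, int]:
    """Hypothetical helper (not part of PaddleClas).

    Scans per-epoch validation accuracy and returns (best_epoch, stop_epoch),
    both 1-based. Training would stop once `patience` epochs have passed
    without an improvement larger than `min_delta`.
    """
    best_epoch = 0
    best_acc = float("-inf")
    for epoch, acc in enumerate(val_acc, start=1):
        if acc > best_acc + min_delta:
            # New best: remember which epoch's checkpoint to keep.
            best_acc = acc
            best_epoch = epoch
        elif epoch - best_epoch >= patience:
            # No improvement for `patience` epochs: stop here.
            return best_epoch, epoch
    # Never triggered: the run used every available epoch.
    return best_epoch, len(val_acc)


if __name__ == "__main__":
    # Illustrative, made-up top-1 values for ten epochs.
    history = [0.80, 0.90, 0.95, 0.97, 0.97, 0.96, 0.97, 0.96, 0.96, 0.95]
    print(pick_stopping_epoch(history, patience=3))  # prints (4, 7)
```

With `save_interval: 1` and `eval_interval: 1`, as in the config posted in the question, a checkpoint and an evaluation result exist for every epoch, so the epoch chosen by such a rule can be exported afterwards without retraining.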

cuicheng01 · Jan 18 '24 14:01