
🧪 PP-OCRv3 recognition: after training on a custom dataset, validation accuracy is lower than CRNN

Open weishu20 opened this issue 2 years ago • 7 comments

The datasets used are SynthText, MJsynth, and Synthetic-Chinese-String-Dataset. With CRNN, validation accuracy reaches 94%, whereas PP-OCRv3 only reaches 90%. config.yml:

```yaml
Global:
  debug: false
  use_gpu: true
  epoch_num: 500
  log_smooth_window: 20
  print_batch_step: 10
  save_epoch_step: 1
  eval_batch_step:
  - 0
  - 2000
  cal_metric_during_train: true
  pretrained_model: null
  checkpoints: null
  save_inference_dir: null
  use_visualdl: false
  infer_img: doc/imgs_words/ch/word_1.jpg
  character_dict_path: /open-dataset/charset/charset_document_v1.txt
  max_text_length: 25
  infer_mode: false
  use_space_char: true
  distributed: true
  save_res_path: ./output/rec/predicts_ppocrv3.txt
Optimizer:
  name: Adam
  beta1: 0.9
  beta2: 0.999
  lr:
    name: Cosine
    learning_rate: 0.001
    warmup_epoch: 5
  regularizer:
    name: L2
    factor: 3.0e-05
Architecture:
  model_type: rec
  algorithm: SVTR
  Transform: null
  Backbone:
    name: MobileNetV1Enhance
    scale: 0.5
    last_conv_stride:
    - 1
    - 2
    last_pool_type: avg
  Head:
    name: MultiHead
    head_list:
    - CTCHead:
        Neck:
          name: svtr
          dims: 64
          depth: 2
          hidden_dims: 120
          use_guide: true
        Head:
          fc_decay: 1.0e-05
    - SARHead:
        enc_dim: 512
        max_text_length: 25
Loss:
  name: MultiLoss
  loss_config_list:
  - CTCLoss: null
  - SARLoss: null
PostProcess:
  name: CTCLabelDecode
Metric:
  name: RecMetric
  main_indicator: acc
  ignore_space: false
Train:
  dataset:
    name: LMDBDataSet
    data_dir:
    - /open-dataset/SynthText/lmdb
    - /open-dataset/MJsynth/lmdb
    - /open-dataset/Synthetic-Chinese-String-Dataset/lmdb
    ext_op_transform_idx: 1
    label_file_list:
    - ./train_data/train_list.txt
    transforms:
    - DecodeImage:
        img_mode: BGR
        channel_first: false
    - RecConAug:
        prob: 0.5
        ext_data_num: 2
        image_shape:
        - 48
        - 800
        - 3
    - RecAug: null
    - MultiLabelEncode: null
    - RecResizeImg:
        image_shape:
        - 3
        - 48
        - 800
    - KeepKeys:
        keep_keys:
        - image
        - label_ctc
        - label_sar
        - length
        - valid_ratio
  loader:
    shuffle: true
    batch_size_per_card: 32
    drop_last: true
    num_workers: 4
Eval:
  dataset:
    name: LMDBDataSet
    data_dir:
    - /open-dataset/SynthText/lmdb_eval
    - /open-dataset/MJsynth/lmdb_eval
    - /open-dataset/Synthetic-Chinese-String-Dataset/lmdb_eval
    label_file_list:
    - ./train_data/val_list.txt
    transforms:
    - DecodeImage:
        img_mode: BGR
        channel_first: false
    - MultiLabelEncode: null
    - RecResizeImg:
        image_shape:
        - 3
        - 48
        - 800
    - KeepKeys:
        keep_keys:
        - image
        - label_ctc
        - label_sar
        - length
        - valid_ratio
  loader:
    shuffle: false
    drop_last: false
    batch_size_per_card: 32
    num_workers: 4
profiler_options: null
```
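Before comparing the 94% and 90% figures, it is worth confirming that both numbers come from the same metric. The config above uses `RecMetric` with `ignore_space: false`, which, as far as we understand, counts a sample as correct only when the decoded string matches the label exactly. The sketch below is a hypothetical, dependency-free exact-match accuracy (not PaddleOCR's own code) that can score both models' predictions with identical rules; the function name `exact_match_accuracy` and the `fold_case` option are assumptions added for illustration, since some CRNN evaluation scripts compare case-insensitively or ignore spaces.

```python
from typing import Sequence


def exact_match_accuracy(
    preds: Sequence[str],
    labels: Sequence[str],
    ignore_space: bool = False,
    fold_case: bool = False,
) -> float:
    """Fraction of samples whose prediction equals the label exactly.

    ignore_space mirrors the intent of RecMetric's `ignore_space` option;
    fold_case is a hypothetical extra switch for case-insensitive scoring.
    """
    if len(preds) != len(labels):
        raise ValueError("preds and labels must have the same length")
    if not labels:
        return 0.0
    correct = 0
    for pred, label in zip(preds, labels):
        if ignore_space:
            # Drop all spaces before comparing.
            pred, label = pred.replace(" ", ""), label.replace(" ", "")
        if fold_case:
            # Compare case-insensitively.
            pred, label = pred.lower(), label.lower()
        correct += pred == label  # bool counts as 0 or 1
    return correct / len(labels)


if __name__ == "__main__":
    # Tiny made-up example: the same predictions score very differently
    # depending on the comparison rules.
    preds = ["Hello World", "中文 识别", "abc"]
    labels = ["hello world", "中文识别", "abd"]
    print(exact_match_accuracy(preds, labels))  # strict: 0.0
    print(exact_match_accuracy(preds, labels, ignore_space=True, fold_case=True))  # 0.6666666666666666
```

If the CRNN number was produced by a script with looser rules (for example, ignoring spaces while `use_space_char: true` is set here), part of the 4-point gap may come from the metric rather than the model.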

weishu20 · Sep 05 '22 03:09

Are the dictionary, img_shape, and training hyperparameters all the same?

tink2123 · Sep 05 '22 11:09
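One way to answer this question systematically is to load both training configs as nested dictionaries (for example with PyYAML's `yaml.safe_load`, left out here to keep the sketch dependency-free) and list every key path whose value differs. `diff_configs` is a hypothetical helper, the dictionaries below are a simplified subset of the fields asked about, and the CRNN values are invented purely for illustration.

```python
from typing import Any, List, Tuple

Diff = Tuple[str, Any, Any]
MISSING = "<missing>"  # placeholder shown when a key exists on only one side


def diff_configs(a: Any, b: Any, path: str = "") -> List[Diff]:
    """Recursively compare two parsed configs; return (path, a_value, b_value) per mismatch."""
    diffs: List[Diff] = []
    if isinstance(a, dict) and isinstance(b, dict):
        for key in sorted(set(a) | set(b), key=str):
            sub = f"{path}.{key}" if path else str(key)
            if key not in a or key not in b:
                diffs.append((sub, a.get(key, MISSING), b.get(key, MISSING)))
            else:
                diffs.extend(diff_configs(a[key], b[key], sub))
    elif a != b:
        # Leaves (including lists such as image_shape) are compared as whole values.
        diffs.append((path, a, b))
    return diffs


if __name__ == "__main__":
    ppocrv3 = {
        "Global": {"character_dict_path": "/open-dataset/charset/charset_document_v1.txt",
                   "max_text_length": 25, "use_space_char": True},
        "Optimizer": {"lr": {"name": "Cosine", "learning_rate": 0.001}},
        "Eval": {"image_shape": [3, 48, 800]},
    }
    # Hypothetical CRNN settings, for illustration only.
    crnn = {
        "Global": {"character_dict_path": "/open-dataset/charset/charset_document_v1.txt",
                   "max_text_length": 25, "use_space_char": True},
        "Optimizer": {"lr": {"name": "Cosine", "learning_rate": 0.001}},
        "Eval": {"image_shape": [3, 32, 800]},
    }
    for path, v3, other in diff_configs(ppocrv3, crnn):
        print(f"{path}: PP-OCRv3={v3!r} CRNN={other!r}")
```

An empty result means the compared fields really are identical; anything printed is a concrete candidate explanation for the accuracy gap.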

> Are the dictionary, img_shape, and training hyperparameters all the same?

Yes.

weishu20 · Sep 08 '22 02:09

If it's convenient, could you post the training log? We'll take a look.

tink2123 · Sep 08 '22 03:09

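> If it's convenient, could you post the training log? We'll take a look.

The log is very long, so here is a screenshot of the later part. [Screenshot: 2022-09-08, 3:00:06 PM]

weishu20 · Sep 08 '22 07:09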

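> If it's convenient, could you post the training log? We'll take a look.

What were the results of your official comparison against CRNN, and which dataset did you use?

weishu20 · Sep 08 '22 07:09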

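We evaluate on an internal Chinese evaluation set of real street-scene images; already with PP-OCRv1, accuracy on it was far higher than CRNN. You could compare the images each model misrecognizes on your evaluation set and see whether they share a common problem.

tink2123 · Sep 08 '22 07:09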
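A minimal sketch of the suggested error comparison, assuming each model's predictions were exported as text lines of the form `image_path<TAB>text[<TAB>score]` and that ground-truth labels are available as a `{image_path: label}` dictionary. The line format and all function names (`parse_predictions`, `wrong_images`, `compare_errors`) are assumptions for illustration; adjust the parser to however the prediction files were actually written.

```python
from typing import Dict, Iterable, List, Set


def parse_predictions(lines: Iterable[str]) -> Dict[str, str]:
    """Parse 'image_path<TAB>text[<TAB>score]' lines into {image_path: text} (assumed format)."""
    preds: Dict[str, str] = {}
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        parts = line.split("\t")
        preds[parts[0]] = parts[1] if len(parts) > 1 else ""
    return preds


def wrong_images(preds: Dict[str, str], labels: Dict[str, str]) -> Set[str]:
    """Images whose prediction is missing or differs from the label."""
    return {img for img, label in labels.items() if preds.get(img) != label}


def compare_errors(a: Dict[str, str], b: Dict[str, str],
                   labels: Dict[str, str]) -> Dict[str, List[str]]:
    """Split misrecognized images into: only model A wrong, only model B wrong, both wrong."""
    wrong_a, wrong_b = wrong_images(a, labels), wrong_images(b, labels)
    return {
        "only_a": sorted(wrong_a - wrong_b),
        "only_b": sorted(wrong_b - wrong_a),
        "both": sorted(wrong_a & wrong_b),
    }


if __name__ == "__main__":
    # Made-up toy data; model A plays PP-OCRv3, model B plays CRNN.
    labels = {"img1.jpg": "中文", "img2.jpg": "hello world", "img3.jpg": "2022"}
    ppocrv3 = parse_predictions(["img1.jpg\t中文\t0.99",
                                 "img2.jpg\thelloworld\t0.95",
                                 "img3.jpg\t2O22\t0.80"])
    crnn = parse_predictions(["img1.jpg\t中文\t0.98",
                              "img2.jpg\thello world\t0.97",
                              "img3.jpg\t2O22\t0.85"])
    for bucket, images in compare_errors(ppocrv3, crnn, labels).items():
        print(bucket, images)
```

The `only_a` bucket is the one to inspect by eye: if those images share a trait (for example, long text lines, missing spaces, or rare characters), that points to where PP-OCRv3 falls behind on this data.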

Is your CRNN backbone a ResNet? I ran into the same problem: PP-OCRv3 performs worse than ResNet + RNN.

xu-peng-7 · Sep 13 '22 00:09