PaddleOCR
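🧪 PP-OCRv3 recognition: after training on a custom dataset, validation accuracy is lower than CRNN
The datasets used are SynthText, MJsynth and Synthetic-Chinese-String-Dataset. CRNN reaches 94% accuracy on the validation set, while PP-OCRv3 only reaches 90%.
config.yml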
Global:
  debug: false
  use_gpu: true
  epoch_num: 500
  log_smooth_window: 20
  print_batch_step: 10
  save_epoch_step: 1
  eval_batch_step:
  - 0
  - 2000
  cal_metric_during_train: true
  pretrained_model: null
  checkpoints: null
  save_inference_dir: null
  use_visualdl: false
  infer_img: doc/imgs_words/ch/word_1.jpg
  character_dict_path: /open-dataset/charset/charset_document_v1.txt
  max_text_length: 25
  infer_mode: false
  use_space_char: true
  distributed: true
  save_res_path: ./output/rec/predicts_ppocrv3.txt
Optimizer:
  name: Adam
  beta1: 0.9
  beta2: 0.999
  lr:
    name: Cosine
    learning_rate: 0.001
    warmup_epoch: 5
  regularizer:
    name: L2
    factor: 3.0e-05
Architecture:
  model_type: rec
  algorithm: SVTR
  Transform: null
  Backbone:
    name: MobileNetV1Enhance
    scale: 0.5
    last_conv_stride:
    - 1
    - 2
    last_pool_type: avg
  Head:
    name: MultiHead
    head_list:
    - CTCHead:
        Neck:
          name: svtr
          dims: 64
          depth: 2
          hidden_dims: 120
          use_guide: true
        Head:
          fc_decay: 1.0e-05
    - SARHead:
        enc_dim: 512
        max_text_length: 25
Loss:
  name: MultiLoss
  loss_config_list:
  - CTCLoss: null
  - SARLoss: null
PostProcess:
  name: CTCLabelDecode
Metric:
  name: RecMetric
  main_indicator: acc
  ignore_space: false
Train:
  dataset:
    name: LMDBDataSet
    data_dir:
    - /open-dataset/SynthText/lmdb
    - /open-dataset/MJsynth/lmdb
    - /open-dataset/Synthetic-Chinese-String-Dataset/lmdb
    ext_op_transform_idx: 1
    label_file_list:
    - ./train_data/train_list.txt
    transforms:
    - DecodeImage:
        img_mode: BGR
        channel_first: false
    - RecConAug:
        prob: 0.5
        ext_data_num: 2
        image_shape:
        - 48
        - 800
        - 3
    - RecAug: null
    - MultiLabelEncode: null
    - RecResizeImg:
        image_shape:
        - 3
        - 48
        - 800
    - KeepKeys:
        keep_keys:
        - image
        - label_ctc
        - label_sar
        - length
        - valid_ratio
  loader:
    shuffle: true
    batch_size_per_card: 32
    drop_last: true
    num_workers: 4
Eval:
  dataset:
    name: LMDBDataSet
    data_dir:
    - /open-dataset/SynthText/lmdb_eval
    - /open-dataset/MJsynth/lmdb_eval
    - /open-dataset/Synthetic-Chinese-String-Dataset/lmdb_eval
    label_file_list:
    - ./train_data/val_list.txt
    transforms:
    - DecodeImage:
        img_mode: BGR
        channel_first: false
    - MultiLabelEncode: null
    - RecResizeImg:
        image_shape:
        - 3
        - 48
        - 800
    - KeepKeys:
        keep_keys:
        - image
        - label_ctc
        - label_sar
        - length
        - valid_ratio
  loader:
    shuffle: false
    drop_last: false
    batch_size_per_card: 32
    num_workers: 4
profiler_options: null
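Before comparing the 94% and 90% figures, it may be worth confirming that both models are scored the same way. The sketch below assumes that `acc` means whole-string exact match (my reading of `RecMetric`, not checked against the PaddleOCR source) and shows how a switch like `ignore_space` in the config above can move the number. `exact_match_accuracy` and the sample strings are hypothetical, not PaddleOCR code.

```python
def exact_match_accuracy(preds, labels, ignore_space=False):
    """Fraction of samples whose predicted string equals the label exactly.

    Assumption: this mirrors what the `acc` indicator measures. It is a
    stand-alone illustration, not the PaddleOCR implementation.
    """
    if len(preds) != len(labels):
        raise ValueError("preds and labels must have the same length")
    if not labels:
        return 0.0
    correct = 0
    for pred, label in zip(preds, labels):
        if ignore_space:
            # Drop spaces on both sides before comparing.
            pred = pred.replace(" ", "")
            label = label.replace(" ", "")
        if pred == label:
            correct += 1
    return correct / len(labels)


# Hypothetical predictions: the first one differs from its label only by a space.
sample_preds = ["ab c", "de"]
sample_labels = ["abc", "de"]

strict_acc = exact_match_accuracy(sample_preds, sample_labels, ignore_space=False)
loose_acc = exact_match_accuracy(sample_preds, sample_labels, ignore_space=True)
print("strict:", strict_acc, "ignoring spaces:", loose_acc)
```

If the CRNN run was evaluated with spaces ignored and this run was not, part of the gap could come from the metric rather than the model.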
Are the dictionary, img_shape, and training hyperparameters identical between the two runs?
Yes.
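One way to double-check that the two setups really match is to diff the two configs key by key. Below is a minimal sketch of that idea. The PP-OCRv3 values are copied from the config in this thread (with `image_shape` moved directly under `Train` for brevity), while the CRNN values are made up purely for illustration. `flatten` and `diff_configs` are hypothetical helpers, not part of PaddleOCR. In practice the two dicts could be loaded from the YAML files, for example with PyYAML's `yaml.safe_load`.

```python
import copy

MISSING = "<missing>"


def flatten(node, prefix=""):
    """Flatten nested dicts/lists into {dotted.path: leaf_value}."""
    if isinstance(node, dict) and node:
        items = node.items()
    elif isinstance(node, list) and node:
        items = enumerate(node)
    else:
        # Leaf value (or empty container): record it under its path.
        return {prefix: node}
    flat = {}
    for key, value in items:
        path = f"{prefix}.{key}" if prefix else str(key)
        flat.update(flatten(value, path))
    return flat


def diff_configs(cfg_a, cfg_b):
    """Return {path: (value_in_a, value_in_b)} for every differing leaf."""
    flat_a, flat_b = flatten(cfg_a), flatten(cfg_b)
    diff = {}
    for path in sorted(flat_a.keys() | flat_b.keys()):
        a, b = flat_a.get(path, MISSING), flat_b.get(path, MISSING)
        if a != b:
            diff[path] = (a, b)
    return diff


# Fragment of the PP-OCRv3 config from this thread (simplified nesting).
ppocrv3_cfg = {
    "Global": {
        "character_dict_path": "/open-dataset/charset/charset_document_v1.txt",
        "max_text_length": 25,
        "use_space_char": True,
    },
    "Optimizer": {"lr": {"learning_rate": 0.001}},
    "Train": {"image_shape": [3, 48, 800], "batch_size_per_card": 32},
}

# HYPOTHETICAL CRNN config: these differences are invented for the example.
crnn_cfg = copy.deepcopy(ppocrv3_cfg)
crnn_cfg["Train"]["image_shape"] = [3, 32, 800]
crnn_cfg["Optimizer"]["lr"]["learning_rate"] = 0.0005

diff = diff_configs(ppocrv3_cfg, crnn_cfg)
for path, (v3_value, crnn_value) in diff.items():
    print(f"{path}: ppocrv3={v3_value} crnn={crnn_value}")
```

An empty diff would support the "yes" above. Any remaining entries are the first places to look.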
If it's convenient, could you post the training log so we can take a look?
The log is very long, so I took a screenshot of a later part of it.
How did the official experiments comparing against CRNN turn out, and which dataset was used?
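The official evaluation uses an internal Chinese test set of real street-view images, and accuracy was already far ahead of CRNN as of PP-OCRv1. You could compare the images that each model misrecognizes on the evaluation set and see whether a common pattern shows up.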
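The comparison suggested here can be sketched roughly as follows, assuming each model's predictions have been exported as a mapping from image name to recognized text. All names and sample data are hypothetical; nothing here comes from PaddleOCR itself.

```python
from collections import Counter


def wrong_ids(preds, labels):
    """Image ids whose prediction is missing or differs from the label."""
    return {img for img, text in labels.items() if preds.get(img) != text}


def compare_errors(preds_a, preds_b, labels):
    """Split misrecognized images into shared and model-specific errors."""
    wrong_a = wrong_ids(preds_a, labels)
    wrong_b = wrong_ids(preds_b, labels)
    return {
        "both": sorted(wrong_a & wrong_b),
        "only_a": sorted(wrong_a - wrong_b),
        "only_b": sorted(wrong_b - wrong_a),
    }


def length_histogram(image_ids, labels):
    """Count errors per label length, to spot e.g. trouble with long text."""
    counts = Counter(len(labels[img]) for img in image_ids)
    return dict(sorted(counts.items()))


# Hypothetical ground truth and predictions for four evaluation images.
labels = {"a.jpg": "hello", "b.jpg": "world", "c.jpg": "PaddleOCR", "d.jpg": "识别"}
preds_crnn = {"a.jpg": "hello", "b.jpg": "w0rld", "c.jpg": "PaddleOCR", "d.jpg": "识别"}
preds_v3 = {"a.jpg": "hello", "b.jpg": "w0rld", "c.jpg": "PaddleOCB", "d.jpg": "识别"}

report = compare_errors(preds_crnn, preds_v3, labels)
v3_only_lengths = length_histogram(report["only_b"], labels)
print(report)
print(v3_only_lengths)
```

The `only_b` bucket (wrong for PP-OCRv3 but right for CRNN) is the interesting one. Grouping it by label length, source dataset, or character set may reveal whether the gap concentrates in one kind of sample.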
Is the backbone of your CRNN a ResNet? I ran into the same problem: PP-OCRv3 performs worse than ResNet + RNN.