Recognition accuracy won't increase beyond 50%
🔎 Search before asking
- [x] I have searched the PaddleOCR Docs and found no similar bug report.
- [x] I have searched the PaddleOCR Issues and found no similar bug report.
- [x] I have searched the PaddleOCR Discussions and found no similar bug report.
🐛 Bug (Problem Description)
I am trying to train the ch_PP-OCRv4_rec.yml recognition model on a custom dataset of very long images: up to 2480 pixels wide and only ~50 pixels high. Some images also contain a lot of text that I need to recognize, up to 135 characters per string. When training with the default configuration and max_text_length set to 140, accuracy reaches about 50% and never improves any further.
After some research, I found that when max_text_length is increased, the input image width should also be increased so that the resized image does not become too blurry (a sketch of what I understand this change to mean is included after the config below). I also turned off the additional augmentations RecAug and RecConAug because they were warping the images and making them unreadable. Performance still does not improve. My config is now as follows:
```yaml
Global:
  debug: false
  use_gpu: true
  epoch_num: 250
  log_smooth_window: 20
  print_batch_step: 10
  save_model_dir: ./output/waug
  save_epoch_step: 10
  eval_batch_step: [0, 2000]
  cal_metric_during_train: true
  pretrained_model: weights/ch/ch_PP-OCRv4_rec_train/student
  checkpoints:
  save_inference_dir:
  use_visualdl: false
  infer_img: doc/imgs_words/ch/word_1.jpg
  character_dict_path: ppocr/utils/ppocr_keys_v1.txt
  max_text_length: &max_text_length 140
  infer_mode: false
  use_space_char: true
  distributed: true
  save_res_path: ./output/rec/predicts_ppocrv3.txt

Optimizer:
  name: Adam
  beta1: 0.9
  beta2: 0.999
  lr:
    name: Cosine
    learning_rate: 0.001
    warmup_epoch: 5
  regularizer:
    name: L2
    factor: 3.0e-05

Architecture:
  model_type: rec
  algorithm: SVTR_LCNet
  Transform:
  Backbone:
    name: PPLCNetV3
    scale: 0.95
  Head:
    name: MultiHead
    head_list:
      - CTCHead:
          Neck:
            name: svtr
            dims: 120
            depth: 2
            hidden_dims: 120
            kernel_size: [1, 3]
            use_guide: True
          Head:
            fc_decay: 0.00001
      - NRTRHead:
          nrtr_dim: 384
          max_text_length: *max_text_length

Loss:
  name: MultiLoss
  loss_config_list:
    - CTCLoss:
    - NRTRLoss:

PostProcess:
  name: CTCLabelDecode

Metric:
  name: RecMetric
  main_indicator: acc

Train:
  dataset:
    name: MultiScaleDataSet
    ds_width: false
    data_dir: base_sents
    ext_op_transform_idx: 1
    label_file_list:
      - base_sents/base_sents.txt
    transforms:
      - DecodeImage:
          img_mode: BGR
          channel_first: false
      - MultiLabelEncode:
          gtc_encode: NRTRLabelEncode
      - KeepKeys:
          keep_keys:
            - image
            - label_ctc
            - label_gtc
            - length
            - valid_ratio
  sampler:
    name: MultiScaleSampler
    scales: [[640, 32], [640, 48], [640, 64]]
    first_bs: &bs 24
    fix_bs: false
    divided_factor: [8, 16] # w, h
    is_training: True
  loader:
    shuffle: true
    batch_size_per_card: *bs
    drop_last: true
    num_workers: 8

Eval:
  dataset:
    name: SimpleDataSet
    data_dir: base_sents
    label_file_list:
      - base_sents/base_sents.txt
    transforms:
      - DecodeImage:
          img_mode: BGR
          channel_first: false
      - MultiLabelEncode:
          gtc_encode: NRTRLabelEncode
      - RecResizeImg:
          image_shape: [3, 48, 320]
      - KeepKeys:
          keep_keys:
            - image
            - label_ctc
            - label_gtc
            - length
            - valid_ratio
  loader:
    shuffle: false
    drop_last: false
    batch_size_per_card: 2
    num_workers: 4
```
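To make the width point above concrete: as far as I understand, matching the config to ~2480×50 px line crops would mean widening the fields below. These are config fragments only, and the 1280 values are illustrative guesses I have not validated:

```yaml
# Illustrative fragments only (not validated settings): widen the
# multi-scale training widths and the eval resize width for long lines.
Train:
  sampler:
    name: MultiScaleSampler
    scales: [[1280, 32], [1280, 48], [1280, 64]]  # instead of the 640-wide scales above

Eval:
  dataset:
    transforms:
      - RecResizeImg:
          image_shape: [3, 48, 1280]  # instead of [3, 48, 320]
```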
Training log at the current step (the accuracy does not improve much regardless of how many epochs I train):
```text
[2024/10/02 12:12:08] ppocr INFO: epoch: [31/250], global_step: 830, lr: 0.000975, acc: 0.416666, norm_edit_dis: 0.437640, CTCLoss: 0.020622, NRTRLoss: 1.209766, loss: 1.234407, avg_reader_cost: 0.00011 s, avg_batch_cost: 0.47500 s, avg_samples: 18.0, ips: 37.89484 samples/s, eta: 0:50:12, max_mem_reserved: 7172 MB, max_mem_allocated: 6961 MB
```
About the dataset: I have ~450 original images, each of which I augmented 100x to get ~45,000 samples. The results do not change whether I train on the augmented dataset or the original one. Even overfitting on this training data would be satisfactory, but the model does not get there either. Please help.
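For context, base_sents/base_sents.txt follows the standard PaddleOCR recognition label format: one sample per line, with the image path (relative to data_dir) and the transcription separated by a tab. The entries below are made-up placeholders, not real samples:

```text
images/sample_0001.png	FULL TRANSCRIPTION OF UP TO ~135 CHARACTERS
images/sample_0002.png	ANOTHER LINE OF TEXT
```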
🏃‍♂️ Environment
- Ubuntu 22.04
- Python 3.10.14
- paddleocr 2.8.1
- paddlepaddle-gpu 2.6.2
🌰 Minimal Reproducible Example
```bash
python3 tools/train.py -c configs/rec/PP-OCRv4/ch_PP-OCRv4_rec.yml
```
I cannot share the dataset.
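For completeness, the same run with my main deviations from the default config passed as command-line overrides via the `-o` option (the paths are specific to my setup):

```bash
# Same training run, overriding the key settings instead of editing the YAML.
python3 tools/train.py -c configs/rec/PP-OCRv4/ch_PP-OCRv4_rec.yml \
    -o Global.max_text_length=140 \
       Global.pretrained_model=weights/ch/ch_PP-OCRv4_rec_train/student \
       Global.save_model_dir=./output/waug
```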