PaddleOCR
Low validation accuracy for PPOCRv4 when the image resolution is changed
Problem Description
For about a month I have been stuck on a problem that is specific to PPOCRv4: if I change the image resolution from the default [3,48,320] to [3,32,150], evaluation accuracy collapses. Something seems off here, because the PPOCRv4 config uses `MultiScaleDataSet` for training and `SimpleDataSet` for evaluation, whereas the PPOCRv3 config used `SimpleDataSet` for both.

After changing the resolution to [3,32,150] in the v4 config, training accuracy reaches about 95%, but evaluation gives very poor results. This is not a case of overfitting: I trained on train+eval data, and while the training logs report about 98% accuracy, evaluation on the eval set, which is part of that training data, stays in the 40% range. I also tried setting both the train and eval dataloaders to `MultiScaleDataSet`, and then both to `SimpleDataSet`, but neither helped. I cannot understand how the model can reach 98% or more on some data during training and then score around 40% on a small part of that same data.

The same happens with the training data itself: the training logs show 98% accuracy, but when I evaluate that same training data with `best_accuracy.pdparams`, accuracy is about 58%, the same as the best evaluation accuracy. What could cause this discrepancy?
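The resolution change can be made concrete with a small calculation. The sketch below is an illustration, not PaddleOCR code: `resize_for_rec` is a hypothetical helper that models an aspect-ratio-preserving resize to a fixed height, clamped to a maximum width and right-padded. That this approximates PaddleOCR's recognition resize is an assumption. It reports the resized width, the resulting `valid_ratio`, and how much a long word crop is squeezed horizontally at the default [3,48,320], at [3,32,320], and at [3,32,150].

```python
import math


def resize_for_rec(src_h: int, src_w: int, target_h: int, target_w: int):
    """Aspect-ratio-preserving resize to a fixed height, clamped to target_w.

    Assumption: modelled on a resize-then-right-pad scheme for text
    recognition; not copied from PaddleOCR.
    Returns (resized_w, valid_ratio, squeeze), where squeeze < 1.0 means the
    text had to be compressed horizontally to fit into target_w.
    """
    ratio = src_w / float(src_h)
    # Width the crop would need at target_h, capped at the target width.
    resized_w = min(target_w, int(math.ceil(target_h * ratio)))
    # Fraction of the padded canvas that holds real image content.
    valid_ratio = min(1.0, resized_w / float(target_w))
    # Aspect ratio after resizing relative to the original aspect ratio.
    squeeze = (resized_w / float(target_h)) / ratio
    return resized_w, valid_ratio, squeeze


if __name__ == "__main__":
    # A hypothetical 40x400 (HxW) word crop, aspect ratio 10:1.
    for target_h, target_w in [(48, 320), (32, 320), (32, 150)]:
        w, vr, sq = resize_for_rec(40, 400, target_h, target_w)
        print(f"{target_h}x{target_w}: width={w}, valid_ratio={vr:.2f}, squeeze={sq:.2f}")
```

For a long crop like this, the 320-wide shapes leave far more horizontal room than [3,32,150], so the same word can look quite different to the model at the two resolutions.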
Runtime Environment
- PaddleOCR: the problem occurs with PPOCRv4
Reproduction Code
My config is:

```yaml
Global:
  debug: true
  use_gpu: true
  epoch_num: 500
  log_smooth_window: 20
  print_batch_step: 10
  save_model_dir: ./output/finally_v4
  save_epoch_step: 100
  eval_batch_step:
  - 0
  - 1000
  cal_metric_during_train: true
  pretrained_model: null
  checkpoints: null
  save_inference_dir: null
  use_visualdl: false
  infer_img: doc/imgs_words/ch/word_1.jpg
  character_dict_path: datasets/custom_dict.txt
  max_text_length: 25
  infer_mode: false
  use_space_char: false
  distributed: true
  save_res_path: ./output/rec/omlette_ppocrv4.txt
  use_wandb: false
Optimizer:
  name: Adam
  beta1: 0.9
  beta2: 0.999
  lr:
    name: Cosine
    learning_rate: 0.0005
    warmup_epoch: 5
  regularizer:
    name: L2
    factor: 3.0e-05
Architecture:
  model_type: rec
  algorithm: SVTR_LCNet
  Transform: null
  Backbone:
    name: PPLCNetV3
    scale: 0.95
  Head:
    name: MultiHead
    head_list:
    - CTCHead:
        Neck:
          name: svtr
          dims: 120
          depth: 2
          hidden_dims: 120
          kernel_size:
          - 1
          - 3
          use_guide: true
        Head:
          fc_decay: 1.0e-05
    - NRTRHead:
        nrtr_dim: 384
        max_text_length: 25
Loss:
  name: MultiLoss
  loss_config_list:
  - CTCLoss: null
  - NRTRLoss: null
PostProcess:
  name: CTCLabelDecode
Metric:
  name: RecMetric
  main_indicator: acc
  ignore_space: false
Train:
  dataset:
    name: MultiScaleDataSet
    ds_width: false
    data_dir: ./
    ext_op_transform_idx: 1
    label_file_list:
    - datasets/combined_train_test.txt
    transforms:
    - DecodeImage:
        img_mode: BGR
        channel_first: false
    - RecConAug:
        prob: 0.5
        ext_data_num: 2
        image_shape:
        - 32
        - 150
        - 3
        max_text_length: 25
    - RecAug: null
    - MultiLabelEncode:
        gtc_encode: NRTRLabelEncode
    - KeepKeys:
        keep_keys:
        - image
        - label_ctc
        - label_gtc
        - length
        - valid_ratio
  sampler:
    name: MultiScaleSampler
    scales:
    - - 320
      - 32
    - - 320
      - 48
    - - 320
      - 64
    first_bs: 96
    fix_bs: false
    divided_factor:
    - 8
    - 16
    is_training: true
  loader:
    shuffle: true
    batch_size_per_card: 128
    drop_last: true
    num_workers: 8
Eval:
  dataset:
    name: MultiScaleDataSet
    ds_width: false
    data_dir: ./
    label_file_list:
    - ./datasets/manik_test.txt
    transforms:
    - DecodeImage:
        img_mode: BGR
        channel_first: false
    - MultiLabelEncode:
        gtc_encode: NRTRLabelEncode
    - RecResizeImg:
        image_shape:
        - 3
        - 32
        - 150
    - KeepKeys:
        keep_keys:
        - image
        - label_ctc
        - label_gtc
        - length
        - valid_ratio
  sampler:
    name: MultiScaleSampler
    scales:
    - [150, 32] # Use a single scale for evaluation
    first_bs: 128
    fix_bs: true
    divided_factor: [8, 16]
    is_training: false
  loader:
    shuffle: false
    drop_last: false
    batch_size_per_card: 128
    num_workers: 4
profiler_options: null
```
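The config above can be sanity-checked for input-size consistency between training and evaluation. The sketch below uses a hypothetical helper, `check_input_shapes` (not part of PaddleOCR), that takes the parsed config as a plain dict, for example as produced by PyYAML's `yaml.safe_load`. It assumes, based on how this config is written rather than on the library source, that `MultiScaleSampler` `scales` entries are [width, height] and that `RecResizeImg` `image_shape` is [C, H, W]. On the values copied from this config it flags the 32x150 evaluation size, since every training sampler scale is 320 pixels wide. Whether that mismatch explains the accuracy gap is left open.

```python
def check_input_shapes(config: dict) -> list:
    """Flag evaluation input sizes that never occur during training.

    Assumptions (from how the config is written, not verified against
    PaddleOCR source): MultiScaleSampler `scales` entries are [width, height],
    and RecResizeImg `image_shape` is [C, H, W].
    """
    # Training sizes as (height, width) pairs.
    train_sizes = {(h, w) for w, h in config["Train"]["sampler"]["scales"]}

    # Evaluation sizes from an explicit RecResizeImg op, if present.
    eval_sizes = set()
    for op in config["Eval"]["dataset"].get("transforms", []):
        if "RecResizeImg" in op:
            _, h, w = op["RecResizeImg"]["image_shape"]
            eval_sizes.add((h, w))
    # ...and from an evaluation sampler, if one is configured.
    eval_sampler = config["Eval"].get("sampler")
    if eval_sampler:
        eval_sizes |= {(h, w) for w, h in eval_sampler["scales"]}

    return [
        f"eval size {h}x{w} (HxW) is not among the training sizes {sorted(train_sizes)}"
        for h, w in sorted(eval_sizes - train_sizes)
    ]


# Only the keys the check reads, with values copied from the config above.
EXAMPLE_CONFIG = {
    "Train": {"sampler": {"scales": [[320, 32], [320, 48], [320, 64]]}},
    "Eval": {
        "dataset": {"transforms": [{"RecResizeImg": {"image_shape": [3, 32, 150]}}]},
        "sampler": {"scales": [[150, 32]]},
    },
}

if __name__ == "__main__":
    for warning in check_input_shapes(EXAMPLE_CONFIG):
        print("WARNING:", warning)
```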
My training data is T1+V1 and my evaluation data is V1.
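Since V1 is meant to be a subset of the training list, it may be worth confirming that every evaluation entry really appears, with the same label, in the training label file. Below is a minimal sketch with a hypothetical helper, `missing_from_train`. It assumes the usual PaddleOCR recognition label format of one `image_path<TAB>label` entry per line, and it normalizes paths with `os.path.normpath` so that `./datasets/x.jpg` and `datasets/x.jpg` compare equal.

```python
import os


def _parse_label_lines(lines):
    """Parse "image_path<TAB>label" lines into a set of (path, label) pairs."""
    entries = set()
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        path, _, label = line.partition("\t")
        entries.add((os.path.normpath(path), label))
    return entries


def missing_from_train(train_lines, eval_lines):
    """Return sorted (path, label) eval entries absent from the training list.

    An entry whose path exists in training but with a different label also
    counts as missing, since the model was trained on the other label.
    """
    return sorted(_parse_label_lines(eval_lines) - _parse_label_lines(train_lines))
```

To use it, read the lines of `datasets/combined_train_test.txt` and `./datasets/manik_test.txt` and pass them in. An empty result means every evaluation entry is present in the training list with an identical label.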
Complete Error Message
```
ppocr INFO: epoch: [199/500], global_step: 71970, lr: 0.000337, acc: 0.950521, norm_edit_dis: 0.993425, CTCLoss: 0.303188, NRTRLoss: 0.711274, loss: 1.013391, avg_reader_cost: 0.00021 s, avg_batch_cost: 0.78093 s, avg_samples: 84.8, ips: 108.58877 samples/s, eta: 1 day, 4:10:31, max_mem_reserved: 12328 MB, max_mem_allocated: 12052 MB
[2024/07/03 05:54:06] ppocr INFO: epoch: [199/500], global_step: 71980, lr: 0.000337, acc: 0.953125, norm_edit_dis: 0.993425, CTCLoss: 0.268923, NRTRLoss: 0.712596, loss: 0.986175, avg_reader_cost: 0.00024 s, avg_batch_cost: 0.79516 s, avg_samples: 64.0, ips: 80.48709 samples/s, eta: 1 day, 4:10:19, max_mem_reserved: 12328 MB, max_mem_allocated: 12052 MB
[2024/07/03 05:54:14] ppocr INFO: epoch: [199/500], global_step: 71990, lr: 0.000337, acc: 0.958333, norm_edit_dis: 0.992793, CTCLoss: 0.238311, NRTRLoss: 0.712596, loss: 0.950907, avg_reader_cost: 0.00021 s, avg_batch_cost: 0.79268 s, avg_samples: 75.2, ips: 94.86814 samples/s, eta: 1 day, 4:10:08, max_mem_reserved: 12328 MB, max_mem_allocated: 12052 MB
[2024/07/03 05:54:22] ppocr INFO: epoch: [199/500], global_step: 72000, lr: 0.000337, acc: 0.942708, norm_edit_dis: 0.992840, CTCLoss: 0.244233, NRTRLoss: 0.708435, loss: 0.952667, avg_reader_cost: 0.00021 s, avg_batch_cost: 0.77737 s, avg_samples: 78.4, ips: 100.85287 samples/s, eta: 1 day, 4:09:57, max_mem_reserved: 12328 MB, max_mem_allocated: 12052 MB
eval model:: 100%|██████████| 34/34 [00:04<00:00, 8.21it/s]
[2024/07/03 05:54:26] ppocr INFO: cur metric, acc: 0.48791821447974393, norm_edit_dis: 0.9153030167021731, fps: 2919.5675084361587
[2024/07/03 05:54:26] ppocr INFO: best metric, acc: 0.5864312254032732, is_float16: False, norm_edit_dis: 0.9353985370995163, fps: 2784.2707554228186, best_epoch: 188
[2024/07/03 05:54:34] ppocr INFO: epoch: [199/500], global_step: 72010, lr: 0.000337, acc: 0.955729, norm_edit_dis: 0.994558, CTCLoss: 0.214817, NRTRLoss: 0.706834, loss: 0.922900, avg_reader_cost: 0.00024 s, avg_batch_cost: 0.78858 s, avg_samples: 72.0, ips: 91.30309 samples/s, eta: 1 day, 4:09:45, max_mem_reserved: 12328 MB, max_mem_allocated: 12052 MB
[2024/07/03 05:54:42] ppocr INFO: epoch: [199/500], global_step: 72020, lr: 0.000337, acc: 0.953125, norm_edit_dis: 0.993750, CTCLoss: 0.224658, NRTRLoss: 0.709030, loss: 0.934898, avg_reader_cost: 0.00021 s, avg_batch_cost: 0.78265 s, avg_samples: 67.2, ips: 85.86192 samples/s, eta: 1 day, 4:09:34, max_mem_reserved: 12328 MB, max_mem_allocated: 12052 MB
```
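To put the training and evaluation numbers side by side over a longer run, a small parser can pull them out of a log like the one above. This is a hypothetical helper, `summarize`. Its regular expressions assume the `ppocr INFO` line format shown in this log and may need adjusting for other PaddleOCR versions.

```python
import re

# Patterns assume the "ppocr INFO" line format shown in the log above.
TRAIN_RE = re.compile(r"global_step: (\d+),.*?\bacc: ([0-9.]+)")
CUR_EVAL_RE = re.compile(r"cur metric, acc: ([0-9.]+)")
BEST_EVAL_RE = re.compile(r"best metric, acc: ([0-9.]+)")


def summarize(log_text: str) -> dict:
    """Extract per-step training accuracy and evaluation accuracy from a log."""
    # (global_step, acc) pairs from the periodic training lines.
    train = [(int(step), float(acc)) for step, acc in TRAIN_RE.findall(log_text)]
    # Accuracy from each evaluation round ("cur metric" lines).
    cur_eval = [float(acc) for acc in CUR_EVAL_RE.findall(log_text)]
    # Most recently reported best evaluation accuracy, if any.
    best = [float(acc) for acc in BEST_EVAL_RE.findall(log_text)]
    return {
        "train_acc": train,
        "eval_acc": cur_eval,
        "best_eval_acc": best[-1] if best else None,
    }
```

Reading the full training log into a string and passing it to `summarize` gives the per-step training accuracy next to each evaluation result, which makes a persistent gap like the roughly 95% versus 49% above easy to track across epochs.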
Possible Solutions
Please help me resolve this issue.