PaddleOCR Low Validation Accuracy for PPOCRv4 if Image resolution is changed

Open ManikSinghSarmaal opened this issue 8 months ago • 1 comments

Problem Description

I have been stuck on this problem for about a month. It appears (specifically with PPOCRv4) when I change the image resolution from the default [3,48,320] to [3,32,150]. One difference I noticed is that the PPOCRv4 config uses MultiScaleDataSet for training and SimpleDataSet for evaluation, whereas the PPOCRv3 config uses SimpleDataSet for both train and eval.

With the resolution set to [3,32,150] in the v4 config, training accuracy is high (around 95%), but evaluation gives very poor results. This does not look like overfitting: I trained on train+eval data, and the training logs showed around 98% accuracy, yet evaluation on the eval dataset was only in the 40% range. I also tried setting both the train and eval datasets to MultiScaleDataSet, and both to SimpleDataSet, but neither helped.

I cannot understand how training on some data can report 98% accuracy or more while evaluation on a small part of that same data gives accuracy in the 40% range. The same happens with the training data itself: the training logs show 98% accuracy, but when I evaluate that same training data with best_accuracy.pdparams, accuracy is only about 58%, the same figure reported for evaluation. What could be the reason for this discrepancy?
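To make the resolution change concrete, the sketch below computes the width a text crop ends up with under an aspect-ratio-preserving resize that is capped at the target width. This is an assumption about how a `RecResizeImg`-style transform behaves, not PaddleOCR's actual code; the function name `padded_resize_width` and the sample crop size are hypothetical.

```python
import math


def padded_resize_width(src_h, src_w, target_h, target_w):
    """Width after scaling to target_h with the aspect ratio kept, capped at target_w.

    Returns (resized_w, valid_ratio). valid_ratio is the share of the target
    width covered by real image content; the remainder would be padding.
    """
    ratio = src_w / float(src_h)
    resized_w = min(target_w, int(math.ceil(target_h * ratio)))
    return resized_w, min(1.0, resized_w / float(target_w))


# Hypothetical crop: 40 px high and 400 px wide (aspect ratio 10:1).
default_w, default_valid = padded_resize_width(40, 400, 48, 320)
small_w, small_valid = padded_resize_width(40, 400, 32, 150)

# Width the crop would need to keep its aspect ratio at each target height.
needed_default = 48 * 10  # 480 px, squeezed into 320 px
needed_small = 32 * 10    # 320 px, squeezed into 150 px

print(default_w, default_valid, default_w / needed_default)
print(small_w, small_valid, small_w / needed_small)
```

Under these assumptions a wide crop is compressed horizontally more strongly at [3,32,150] than at [3,48,320], so long text lines keep fewer pixels per character. Whether that contributes to the gap described here would need to be checked against the actual data.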

Runtime Environment

PaddleOCR: the problem occurs with PPOCRv4

Reproduction Code

My config is:

```yaml
Global:
  debug: true
  use_gpu: true
  epoch_num: 500
  log_smooth_window: 20
  print_batch_step: 10
  save_model_dir: ./output/finally_v4
  save_epoch_step: 100
  eval_batch_step:
  - 0
  - 1000
  cal_metric_during_train: true
  pretrained_model: null
  checkpoints: null
  save_inference_dir: null
  use_visualdl: false
  infer_img: doc/imgs_words/ch/word_1.jpg
  character_dict_path: datasets/custom_dict.txt
  max_text_length: 25
  infer_mode: false
  use_space_char: false
  distributed: true
  save_res_path: ./output/rec/omlette_ppocrv4.txt
  use_wandb: false
Optimizer:
  name: Adam
  beta1: 0.9
  beta2: 0.999
  lr:
    name: Cosine
    learning_rate: 0.0005
    warmup_epoch: 5
  regularizer:
    name: L2
    factor: 3.0e-05
Architecture:
  model_type: rec
  algorithm: SVTR_LCNet
  Transform: null
  Backbone:
    name: PPLCNetV3
    scale: 0.95
  Head:
    name: MultiHead
    head_list:
    - CTCHead:
        Neck:
          name: svtr
          dims: 120
          depth: 2
          hidden_dims: 120
          kernel_size:
          - 1
          - 3
          use_guide: true
        Head:
          fc_decay: 1.0e-05
    - NRTRHead:
        nrtr_dim: 384
        max_text_length: 25
Loss:
  name: MultiLoss
  loss_config_list:
  - CTCLoss: null
  - NRTRLoss: null
PostProcess:
  name: CTCLabelDecode
Metric:
  name: RecMetric
  main_indicator: acc
  ignore_space: false
Train:
  dataset:
    name: MultiScaleDataSet
    ds_width: false
    data_dir: ./
    ext_op_transform_idx: 1
    label_file_list:
    - datasets/combined_train_test.txt
    transforms:
    - DecodeImage:
        img_mode: BGR
        channel_first: false
    - RecConAug:
        prob: 0.5
        ext_data_num: 2
        image_shape:
        - 32
        - 150
        - 3
        max_text_length: 25
    - RecAug: null
    - MultiLabelEncode:
        gtc_encode: NRTRLabelEncode
    - KeepKeys:
        keep_keys:
        - image
        - label_ctc
        - label_gtc
        - length
        - valid_ratio
  sampler:
    name: MultiScaleSampler
    scales:
    - - 320
      - 32
    - - 320
      - 48
    - - 320
      - 64
    first_bs: 96
    fix_bs: false
    divided_factor:
    - 8
    - 16
    is_training: true
  loader:
    shuffle: true
    batch_size_per_card: 128
    drop_last: true
    num_workers: 8
Eval:
  dataset:
    name: MultiScaleDataSet
    ds_width: false
    data_dir: ./
    label_file_list:
    - ./datasets/manik_test.txt
    transforms:
    - DecodeImage:
        img_mode: BGR
        channel_first: false
    - MultiLabelEncode:
        gtc_encode: NRTRLabelEncode
    - RecResizeImg:
        image_shape:
        - 3
        - 32
        - 150
    - KeepKeys:
        keep_keys:
        - image
        - label_ctc
        - label_gtc
        - length
        - valid_ratio
  sampler:
    name: MultiScaleSampler
    scales:
    - [150, 32] # Use a single scale for evaluation
    first_bs: 128
    fix_bs: true
    divided_factor: [8, 16]
    is_training: false
  loader:
    shuffle: false
    drop_last: false
    batch_size_per_card: 128
    num_workers: 4
profiler_options: null
```

My training data is T1+V1 and my evaluation data is V1.
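One check that may be worth running on a config like this is whether every stage refers to the same image size. The sketch below copies the relevant values from the config by hand and compares them. The helper names are hypothetical, and the axis orders are assumptions read off the values: sampler `scales` as [width, height], `RecConAug.image_shape` as [height, width, channels], and `RecResizeImg.image_shape` as [channels, height, width].

```python
def summarize_shapes(train_scales, rec_con_aug_shape, eval_shape, eval_scales):
    """Collect the (height, width) pairs that each stage of the config refers to."""
    return {
        "train_sampler": sorted({(h, w) for w, h in train_scales}),
        "train_rec_con_aug": [(rec_con_aug_shape[0], rec_con_aug_shape[1])],
        "eval_resize": [(eval_shape[1], eval_shape[2])],
        "eval_sampler": sorted({(h, w) for w, h in eval_scales}),
    }


def shapes_disagree(summary):
    """True if the training sampler and evaluation use different (height, width) sets."""
    train = set(summary["train_sampler"])
    evaluation = set(summary["eval_resize"]) | set(summary["eval_sampler"])
    return train != evaluation


# Values copied from the config in this issue.
summary = summarize_shapes(
    train_scales=[[320, 32], [320, 48], [320, 64]],
    rec_con_aug_shape=[32, 150, 3],
    eval_shape=[3, 32, 150],
    eval_scales=[[150, 32]],
)

for stage, shapes in summary.items():
    print(stage, shapes)
print("train/eval shapes disagree:", shapes_disagree(summary))
```

With the values above, the training sampler still lists width 320 at heights 32, 48 and 64, while `RecConAug` and the evaluation pipeline use 32 x 150. This only surfaces the difference; it does not establish that it causes the accuracy gap.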

Complete Error Message

```
ppocr INFO: epoch: [199/500], global_step: 71970, lr: 0.000337, acc: 0.950521, norm_edit_dis: 0.993425, CTCLoss: 0.303188, NRTRLoss: 0.711274, loss: 1.013391, avg_reader_cost: 0.00021 s, avg_batch_cost: 0.78093 s, avg_samples: 84.8, ips: 108.58877 samples/s, eta: 1 day, 4:10:31, max_mem_reserved: 12328 MB, max_mem_allocated: 12052 MB
[2024/07/03 05:54:06] ppocr INFO: epoch: [199/500], global_step: 71980, lr: 0.000337, acc: 0.953125, norm_edit_dis: 0.993425, CTCLoss: 0.268923, NRTRLoss: 0.712596, loss: 0.986175, avg_reader_cost: 0.00024 s, avg_batch_cost: 0.79516 s, avg_samples: 64.0, ips: 80.48709 samples/s, eta: 1 day, 4:10:19, max_mem_reserved: 12328 MB, max_mem_allocated: 12052 MB
[2024/07/03 05:54:14] ppocr INFO: epoch: [199/500], global_step: 71990, lr: 0.000337, acc: 0.958333, norm_edit_dis: 0.992793, CTCLoss: 0.238311, NRTRLoss: 0.712596, loss: 0.950907, avg_reader_cost: 0.00021 s, avg_batch_cost: 0.79268 s, avg_samples: 75.2, ips: 94.86814 samples/s, eta: 1 day, 4:10:08, max_mem_reserved: 12328 MB, max_mem_allocated: 12052 MB
[2024/07/03 05:54:22] ppocr INFO: epoch: [199/500], global_step: 72000, lr: 0.000337, acc: 0.942708, norm_edit_dis: 0.992840, CTCLoss: 0.244233, NRTRLoss: 0.708435, loss: 0.952667, avg_reader_cost: 0.00021 s, avg_batch_cost: 0.77737 s, avg_samples: 78.4, ips: 100.85287 samples/s, eta: 1 day, 4:09:57, max_mem_reserved: 12328 MB, max_mem_allocated: 12052 MB
eval model:: 100%|██████████| 34/34 [00:04<00:00, 8.21it/s]
[2024/07/03 05:54:26] ppocr INFO: cur metric, acc: 0.48791821447974393, norm_edit_dis: 0.9153030167021731, fps: 2919.5675084361587
[2024/07/03 05:54:26] ppocr INFO: best metric, acc: 0.5864312254032732, is_float16: False, norm_edit_dis: 0.9353985370995163, fps: 2784.2707554228186, best_epoch: 188
[2024/07/03 05:54:34] ppocr INFO: epoch: [199/500], global_step: 72010, lr: 0.000337, acc: 0.955729, norm_edit_dis: 0.994558, CTCLoss: 0.214817, NRTRLoss: 0.706834, loss: 0.922900, avg_reader_cost: 0.00024 s, avg_batch_cost: 0.78858 s, avg_samples: 72.0, ips: 91.30309 samples/s, eta: 1 day, 4:09:45, max_mem_reserved: 12328 MB, max_mem_allocated: 12052 MB
[2024/07/03 05:54:42] ppocr INFO: epoch: [199/500], global_step: 72020, lr: 0.000337, acc: 0.953125, norm_edit_dis: 0.993750, CTCLoss: 0.224658, NRTRLoss: 0.709030, loss: 0.934898, avg_reader_cost: 0.00021 s, avg_batch_cost: 0.78265 s, avg_samples: 67.2, ips: 85.86192 samples/s, eta: 1 day, 4:09:34, max_mem_reserved: 12328 MB, max_mem_allocated: 12052 MB
```
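In the log above, the evaluation step reports acc of about 0.488 together with norm_edit_dis of about 0.915. The sketch below shows how such a combination can arise, assuming that acc counts exact string matches and norm_edit_dis is one minus the mean normalized edit distance. That is an assumption about the metric; the real `RecMetric` may differ in details such as space or case handling. The function names and sample strings are hypothetical.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            # deletion, insertion, substitution (or match)
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost))
        prev = cur
    return prev[-1]


def rec_metrics(preds, labels):
    """Return (exact-match accuracy, 1 - mean normalized edit distance)."""
    n = len(labels)
    exact = sum(p == t for p, t in zip(preds, labels))
    norm = sum(
        levenshtein(p, t) / max(len(p), len(t), 1) for p, t in zip(preds, labels)
    )
    return exact / n, 1.0 - norm / n


# Hypothetical predictions: two are perfect, two are wrong by one character.
labels = ["invoice2024", "paddleocr", "resolution", "validation"]
preds = ["invoice2024", "paddleocr", "resolutlon", "validatian"]

acc, norm_edit_dis = rec_metrics(preds, labels)
print(f"acc={acc:.3f} norm_edit_dis={norm_edit_dis:.3f}")
```

If the metric works this way, a low acc next to a high norm_edit_dis would suggest that many evaluation predictions are off by only a character or two rather than completely wrong, which can help narrow down where the train and eval pipelines diverge.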

Possible solutions

Please help me resolve this issue.

Appendix

Jul 03 '24 06:07 ManikSinghSarmaal
