Consistently getting loss = nanxxx and accuracy 0.000000, even after training for 1500 epochs on an Arabic dataset

Open IbrarBabar009 opened this issue 1 year ago • 28 comments

I am trying to train the PaddleOCR recognition model on an Arabic dataset, and I am getting a loss of nanxxx and an accuracy of 0.000000 throughout training.

I am training using this command

python -m paddle.distributed.launch --gpus '0' tools/train.py -c configs/rec/PP-OCRv3/multi_language/arabic_PP-OCRv3_rec.yml

Number of training samples: 95998
Number of validation samples: 10428

Here is the sample epoch output

[2023/10/02 10:05:52] ppocr INFO: epoch: [1352/1500], global_step: 2440, lr: 0.000162, acc: 0.000000, norm_edit_dis: 0.000000,CTCLoss: nanxxx, SARLoss: nanxxx, loss: nanxxx, avg_reader_cost: 0.00013 s, avg_batch_cost: 0.30052 s, avg_samples: 128.0,ips: 425.93432 samples/s, eta: 1 day, 8:19:23
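
To find the first step at which the loss stops being finite, the training log can be scanned for loss fields like the ones above. This is a minimal sketch, not part of PaddleOCR: the function name `non_finite_losses` is hypothetical, and treating any token that starts with `nan` or `inf` (or that does not parse as a number) as a bad value is an assumption based only on the `nanxxx` output shown here.

```python
import math
import re

# Matches fields such as "CTCLoss: nanxxx", "SARLoss: 1.25" or "loss: 0.5".
LOSS_FIELD = re.compile(r"(\w*[Ll]oss): ([^,\s]+)")


def is_non_finite(raw):
    """Return True if a logged value looks like NaN/inf or cannot be parsed."""
    lowered = raw.lower()
    if lowered.startswith("nan") or lowered.startswith("inf"):
        return True
    try:
        return not math.isfinite(float(raw))
    except ValueError:
        # Assumption: anything that is not a plain number is treated as bad.
        return True


def non_finite_losses(line):
    """Return the names of all loss fields in a log line that are not finite."""
    return [name for name, raw in LOSS_FIELD.findall(line) if is_non_finite(raw)]


sample = (
    "[2023/10/02 10:05:52] ppocr INFO: epoch: [1352/1500], global_step: 2440, "
    "lr: 0.000162, acc: 0.000000, norm_edit_dis: 0.000000,CTCLoss: nanxxx, "
    "SARLoss: nanxxx, loss: nanxxx, avg_reader_cost: 0.00013 s"
)
bad_fields = non_finite_losses(sample)
```

Running this over every line of the log and stopping at the first non-empty result would show whether the loss is already non-finite in the first steps or only becomes so later in training.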

This is my arabic_PP-OCRv3_rec.yml:

Global:
  debug: false
  use_gpu: true
  epoch_num: 500
  log_smooth_window: 20
  print_batch_step: 10
  save_model_dir: ./output/v3_arabic_mobile
  save_epoch_step: 3
  eval_batch_step: [0, 2000]
  cal_metric_during_train: true
  pretrained_model: /additional_drive/ibrar/PaddleOCR/pretrain_models/arabic/arabic_PP-OCRv3_rec_train/best_accuracy.pdparams
  checkpoints:
  save_inference_dir:
  use_visualdl: false
  infer_img: ./doc/imgs_words/arabic/ar_2.jpg
  character_dict_path: ppocr/utils/dict/arabic_dict.txt
  max_text_length: &max_text_length 25
  infer_mode: false
  use_space_char: true
  distributed: true
  save_res_path: ./output/rec/predicts_ppocrv3_arabic.txt


Optimizer:
  name: Adam
  beta1: 0.9
  beta2: 0.999
  lr:
    name: Cosine
    learning_rate: 0.00025
    warmup_epoch: 5
  regularizer:
    name: L2
    factor: 3.0e-05


Architecture:
  model_type: rec
  algorithm: SVTR
  Transform:
  Backbone:
    name: MobileNetV1Enhance
    scale: 0.5
    last_conv_stride: [1, 2]
    last_pool_type: avg
  Head:
    name: MultiHead
    head_list:
      - CTCHead:
          Neck:
            name: svtr
            dims: 64
            depth: 2
            hidden_dims: 120
            use_guide: True
          Head:
            fc_decay: 0.00001
      - SARHead:
          enc_dim: 512
          max_text_length: *max_text_length

Loss:
  name: MultiLoss
  loss_config_list:
    - CTCLoss:
    - SARLoss:

PostProcess:  
  name: CTCLabelDecode

Metric:
  name: RecMetric
  main_indicator: acc
  ignore_space: False

Train:
  dataset:
    name: SimpleDataSet
    data_dir: /additional_drive/zain/dataset/raw_data/arabic_docs_combined_caparsoft_Sep22/train/ #images_arr_updated/
    ext_op_transform_idx: 1
    label_file_list:
    - /additional_drive/zain/dataset/raw_data/arabic_docs_combined_caparsoft_Sep22/train/paddle_rec_arr_updated.txt
    transforms:
    - DecodeImage:
        img_mode: BGR
        channel_first: false
    - RecConAug:
        prob: 0.5
        ext_data_num: 2
        image_shape: [48, 320, 3]
    - RecAug:
    - MultiLabelEncode:
    - RecResizeImg:
        image_shape: [3, 48, 320]
    - KeepKeys:
        keep_keys:
        - image
        - label_ctc
        - label_sar
        - length
        - valid_ratio
  loader:
    shuffle: true
    batch_size_per_card: 128
    drop_last: true
    num_workers: 4
Eval:
  dataset:
    name: SimpleDataSet
    data_dir: /additional_drive/zain/dataset/raw_data/arabic_docs_combined_caparsoft_Sep22/val/ 
    label_file_list:
    - /additional_drive/zain/dataset/raw_data/arabic_docs_combined_caparsoft_Sep22/val/paddle_rec_arr_updated.txt
    transforms:
    - DecodeImage:
        img_mode: BGR
        channel_first: false
    - MultiLabelEncode:
    - RecResizeImg:
        image_shape: [3, 48, 320]
    - KeepKeys:
        keep_keys:
        - image
        - label_ctc
        - label_sar
        - length
        - valid_ratio
  loader:
    shuffle: false
    drop_last: false
    batch_size_per_card: 128
    num_workers: 4
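
One thing that can be checked independently of training is whether the label files are consistent with this config. The sketch below counts labels that are longer than `max_text_length`, labels that are empty, and labels containing characters missing from the dictionary. It assumes the label file has one `image_path<TAB>label` entry per line and the dictionary has one character per line; the function names are hypothetical, and whether any of these conditions actually causes the NaN loss here is not established.

```python
from collections import Counter


def load_charset(dict_lines, use_space_char=True):
    """Build the allowed character set from dictionary lines (one character per line)."""
    charset = {line.rstrip("\r\n") for line in dict_lines}
    charset.discard("")
    if use_space_char:
        # Mirrors use_space_char: true in the config above.
        charset.add(" ")
    return charset


def check_labels(label_lines, charset, max_text_length=25):
    """Count suspicious entries; returns (stats, unknown_character_counts)."""
    stats = Counter()
    unknown = Counter()
    for line in label_lines:
        line = line.rstrip("\r\n")
        if not line:
            continue
        parts = line.split("\t", 1)
        if len(parts) != 2:
            # No tab separator, so the label cannot be recovered.
            stats["malformed"] += 1
            continue
        text = parts[1]
        stats["total"] += 1
        if not text:
            stats["empty"] += 1
        if len(text) > max_text_length:
            stats["too_long"] += 1
        missing = [ch for ch in text if ch not in charset]
        if missing:
            stats["has_unknown_chars"] += 1
            unknown.update(missing)
    return stats, unknown


def check_label_file(label_path, dict_path, max_text_length=25, use_space_char=True):
    """Convenience wrapper that reads both files as UTF-8."""
    with open(dict_path, encoding="utf-8") as f:
        charset = load_charset(f, use_space_char)
    with open(label_path, encoding="utf-8") as f:
        return check_labels(f, charset, max_text_length)
```

Calling `check_label_file` with the `label_file_list` entry and `character_dict_path` from the config would report how many of the training samples fall into each category; a large `too_long` or `has_unknown_chars` count would suggest looking at the data before changing the number of epochs.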

I have tried both training the model from scratch on my Arabic dataset and fine-tuning it from the pretrained model, and I encounter the same issue in both cases.

I have attempted to resolve this issue by extensively searching through PaddleOCR's GitHub issues, and I discovered that the only suggested solution is to increase the number of epochs. Consequently, I increased the number of epochs from 200 to 1500, but unfortunately, I have not been able to resolve the issue.

Can anyone here help? What am I missing?

IbrarBabar009, Oct 02 '23 10:10