PaddleOCR
Consistently getting loss = nanxxx and accuracy 0.000000 even after training for 1500 epochs on an Arabic dataset
I am trying to train a PaddleOCR recognition model on an Arabic dataset, but the loss is always nanxxx and the accuracy stays at 0.000000.
I am training with this command:
python -m paddle.distributed.launch --gpus '0' tools/train.py -c configs/rec/PP-OCRv3/multi_language/arabic_PP-OCRv3_rec.yml
Number of training samples: 95998. Number of validation samples: 10428.
Here is a sample line from the training log:
[2023/10/02 10:05:52] ppocr INFO: epoch: [1352/1500], global_step: 2440, lr: 0.000162, acc: 0.000000, norm_edit_dis: 0.000000,CTCLoss: nanxxx, SARLoss: nanxxx, loss: nanxxx, avg_reader_cost: 0.00013 s, avg_batch_cost: 0.30052 s, avg_samples: 128.0,ips: 425.93432 samples/s, eta: 1 day, 8:19:23
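One thing that can be read off the log line above is how many training steps each epoch takes. The sketch below is only a rough sanity check. It assumes a single GPU (as in the command above), that global_step has counted every batch since epoch 1 (that is, the run was not resumed with a reset counter), and that the loader uses batch_size_per_card: 128 with drop_last: true. The helper names `parse_log_line` and `expected_steps_per_epoch` are made up for this example and are not part of PaddleOCR.

```python
import re


def parse_log_line(line):
    """Extract (epoch, total_epochs, global_step) from a ppocr training log line."""
    m = re.search(r"epoch: \[(\d+)/(\d+)\], global_step: (\d+)", line)
    if m is None:
        raise ValueError("line does not look like a ppocr training log line")
    epoch, total_epochs, global_step = (int(g) for g in m.groups())
    return epoch, total_epochs, global_step


def expected_steps_per_epoch(num_samples, batch_size, drop_last=True):
    """Number of batches per epoch on one card (hypothetical helper)."""
    if drop_last:
        return num_samples // batch_size
    return -(-num_samples // batch_size)  # ceiling division


# The log line quoted above, copied verbatim.
line = (
    "[2023/10/02 10:05:52] ppocr INFO: epoch: [1352/1500], global_step: 2440, "
    "lr: 0.000162, acc: 0.000000, norm_edit_dis: 0.000000,CTCLoss: nanxxx, "
    "SARLoss: nanxxx, loss: nanxxx, avg_reader_cost: 0.00013 s, "
    "avg_batch_cost: 0.30052 s, avg_samples: 128.0,ips: 425.93432 samples/s, "
    "eta: 1 day, 8:19:23"
)

epoch, total_epochs, global_step = parse_log_line(line)
observed = global_step / epoch  # assumes global_step was never reset
expected = expected_steps_per_epoch(95998, 128, drop_last=True)

print(f"observed ~{observed:.2f} steps/epoch, expected {expected}")
# prints: observed ~1.80 steps/epoch, expected 749
```

If those assumptions hold, the log implies fewer than two steps per epoch, while 95998 samples at batch size 128 would give 749. A gap that large could mean the loader is seeing far fewer samples than the label file is supposed to contain, which seems worth ruling out (for example by comparing against the iteration count reported at the start of training) before looking at the model itself.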
This is my arabic_PP-OCRv3_rec.yml:
Global:
  debug: false
  use_gpu: true
  epoch_num: 500
  log_smooth_window: 20
  print_batch_step: 10
  save_model_dir: ./output/v3_arabic_mobile
  save_epoch_step: 3
  eval_batch_step: [0, 2000]
  cal_metric_during_train: true
  pretrained_model: /additional_drive/ibrar/PaddleOCR/pretrain_models/arabic/arabic_PP-OCRv3_rec_train/best_accuracy.pdparams
  checkpoints:
  save_inference_dir:
  use_visualdl: false
  infer_img: ./doc/imgs_words/arabic/ar_2.jpg
  character_dict_path: ppocr/utils/dict/arabic_dict.txt
  max_text_length: &max_text_length 25
  infer_mode: false
  use_space_char: true
  distributed: true
  save_res_path: ./output/rec/predicts_ppocrv3_arabic.txt
Optimizer:
  name: Adam
  beta1: 0.9
  beta2: 0.999
  lr:
    name: Cosine
    learning_rate: 0.00025
    warmup_epoch: 5
  regularizer:
    name: L2
    factor: 3.0e-05
Architecture:
  model_type: rec
  algorithm: SVTR
  Transform:
  Backbone:
    name: MobileNetV1Enhance
    scale: 0.5
    last_conv_stride: [1, 2]
    last_pool_type: avg
  Head:
    name: MultiHead
    head_list:
      - CTCHead:
          Neck:
            name: svtr
            dims: 64
            depth: 2
            hidden_dims: 120
            use_guide: True
          Head:
            fc_decay: 0.00001
      - SARHead:
          enc_dim: 512
          max_text_length: *max_text_length
Loss:
  name: MultiLoss
  loss_config_list:
    - CTCLoss:
    - SARLoss:
PostProcess:
  name: CTCLabelDecode
Metric:
  name: RecMetric
  main_indicator: acc
  ignore_space: False
Train:
  dataset:
    name: SimpleDataSet
    data_dir: /additional_drive/zain/dataset/raw_data/arabic_docs_combined_caparsoft_Sep22/train/ #images_arr_updated/
    ext_op_transform_idx: 1
    label_file_list:
      - /additional_drive/zain/dataset/raw_data/arabic_docs_combined_caparsoft_Sep22/train/paddle_rec_arr_updated.txt
    transforms:
      - DecodeImage:
          img_mode: BGR
          channel_first: false
      - RecConAug:
          prob: 0.5
          ext_data_num: 2
          image_shape: [48, 320, 3]
      - RecAug:
      - MultiLabelEncode:
      - RecResizeImg:
          image_shape: [3, 48, 320]
      - KeepKeys:
          keep_keys:
            - image
            - label_ctc
            - label_sar
            - length
            - valid_ratio
  loader:
    shuffle: true
    batch_size_per_card: 128
    drop_last: true
    num_workers: 4
Eval:
  dataset:
    name: SimpleDataSet
    data_dir: /additional_drive/zain/dataset/raw_data/arabic_docs_combined_caparsoft_Sep22/val/
    label_file_list:
      - /additional_drive/zain/dataset/raw_data/arabic_docs_combined_caparsoft_Sep22/val/paddle_rec_arr_updated.txt
    transforms:
      - DecodeImage:
          img_mode: BGR
          channel_first: false
      - MultiLabelEncode:
      - RecResizeImg:
          image_shape: [3, 48, 320]
      - KeepKeys:
          keep_keys:
            - image
            - label_ctc
            - label_sar
            - length
            - valid_ratio
  loader:
    shuffle: false
    drop_last: false
    batch_size_per_card: 128
    num_workers: 4
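Because the config ties the labels to character_dict_path, max_text_length: 25, and use_space_char: true, one check that may help is auditing the label file against those three settings. The sketch below assumes the usual SimpleDataSet label format of one `image_path<TAB>text` pair per line and a dictionary file with one character per line. The function names `load_charset` and `audit_labels` are hypothetical, and the toy data uses Latin placeholders rather than real Arabic labels. How PaddleOCR's label encoder treats each case (dropping unknown characters, skipping empty or over-long labels) is my understanding, not something confirmed here.

```python
from collections import Counter


def load_charset(dict_lines, use_space_char=True):
    """Build the set of allowed characters from dictionary lines (one char per line)."""
    chars = {line.rstrip("\r\n") for line in dict_lines}
    chars.discard("")
    if use_space_char:
        chars.add(" ")
    return chars


def audit_labels(label_lines, charset, max_text_length=25):
    """Count label lines that are malformed, empty, too long, or use unknown characters."""
    report = Counter()
    unknown = Counter()
    for raw in label_lines:
        line = raw.rstrip("\r\n")
        if not line:
            continue  # ignore blank lines
        report["total"] += 1
        parts = line.split("\t", 1)
        if len(parts) != 2:
            report["malformed"] += 1  # no tab between path and text
            continue
        text = parts[1]
        missing = [c for c in text if c not in charset]
        if missing:
            report["has_unknown_chars"] += 1
            unknown.update(missing)
        if len(text) == 0:
            report["empty"] += 1
        elif len(text) > max_text_length:
            report["too_long"] += 1
        elif not missing:
            report["ok"] += 1
    return report, unknown


# Toy data for illustration only (not taken from the real dataset).
charset = load_charset(["a\n", "b\n", "c\n"], use_space_char=True)
label_lines = [
    "img/0.jpg\tab c\n",              # fine
    "img/1.jpg\tabz\n",               # 'z' is not in the dictionary
    "img/2.jpg abc\n",                # space instead of tab
    "img/3.jpg\t" + "a" * 26 + "\n",  # longer than max_text_length
    "img/4.jpg\t\n",                  # empty label
]
report, unknown = audit_labels(label_lines, charset, max_text_length=25)
print(dict(report))
print(dict(unknown))
```

To run this on the real files, read ppocr/utils/dict/arabic_dict.txt and the label file from label_file_list with `open(path, encoding="utf-8")` and pass the resulting lines in. A large "malformed", "too_long", or "has_unknown_chars" count would point at the data rather than the model.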
I see the same issue whether I fine-tune from the pretrained model or train from scratch on my Arabic dataset alone.
I searched PaddleOCR's GitHub issues extensively, and the only suggestion I found was to increase the number of epochs. I increased the number of epochs from 200 to 1500, but the issue persists.
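Since more epochs did not help, it may be more informative to find the first step at which the loss stopped being a finite number. Once a loss is NaN, the updated weights typically become NaN too, so further training usually cannot recover. The sketch below scans log lines in the format shown above and returns the first global_step with a non-finite loss. The names `LOSS_RE`, `is_finite_loss`, and `first_bad_step` are made up for this example, and the first sample line is invented purely to illustrate a healthy step.

```python
import math
import re

# Matches the overall "loss:" field, not "CTCLoss:" or "SARLoss:" (case-sensitive).
LOSS_RE = re.compile(r"global_step: (\d+),.*?\bloss: ([^,\s]+)")


def is_finite_loss(token):
    """Return True only if the logged loss parses as a finite float."""
    try:
        return math.isfinite(float(token))
    except ValueError:
        return False  # e.g. "nanxxx", as printed in the log above


def first_bad_step(log_lines):
    """Return the first global_step whose loss is not finite, or None."""
    for line in log_lines:
        m = LOSS_RE.search(line)
        if m and not is_finite_loss(m.group(2)):
            return int(m.group(1))
    return None


log_lines = [
    # Made-up line for illustration; it is not from the actual run.
    "ppocr INFO: epoch: [1/1500], global_step: 10, lr: 0.000001, loss: 85.123456, eta: 0:00:00",
    # Shortened copy of the real log line quoted above.
    "ppocr INFO: epoch: [1352/1500], global_step: 2440, lr: 0.000162, acc: 0.000000, "
    "norm_edit_dis: 0.000000,CTCLoss: nanxxx, SARLoss: nanxxx, loss: nanxxx, "
    "avg_reader_cost: 0.00013 s",
]

print(first_bad_step(log_lines))  # prints: 2440
```

If the first bad step turns out to be within the first few iterations, that would suggest a data or configuration problem rather than insufficient training.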
Can anyone help? What am I missing?