
it took me forever to launch 1 epoch when using rec_r34_vd_none_bilstm_ctc.yml

Open Yosiiiiiiiiiiiiiiii opened this issue 2 years ago • 3 comments

- It took me forever to launch 1 epoch when using rec_r34_vd_none_bilstm_ctc.yml and en_PP-OCRv3_rec.yml.
- It always gets stuck at the log line "During the training process, after the 0th iteration, an evaluation is run every 200 iterations". No error is shown, but training never starts.
- It trained normally when I used rec_icdar15_train.yml.

System Environment: Google Colab

   -o Global.pretrained_model=rec_r34_vd_none_bilstm_ctc_v2.0_train/best_accuracy \
   Global.character_dict_path=ppocr/utils/ic15_dict.txt \
   Global.eval_batch_step=[0,200]

This is my rec_r34_vd_none_bilstm_ctc.yml:

Global:
  use_gpu: true
  epoch_num: 72
  log_smooth_window: 20
  print_batch_step: 10
  save_model_dir: ./output/rec/r34_vd_none_bilstm_ctc/
  save_epoch_step: 3
  # evaluation is run every 2000 iterations
  eval_batch_step: [0, 2000]
  cal_metric_during_train: True
  pretrained_model:
  checkpoints:
  save_inference_dir:
  use_visualdl: False
  infer_img: doc/imgs_words_en/word_10.png
  # for data or label process
  character_dict_path:
  max_text_length: 50
  infer_mode: False
  use_space_char: False
  save_res_path: ./output/rec/predicts_r34_vd_none_bilstm_ctc.txt

Optimizer:
  name: Adam
  beta1: 0.9
  beta2: 0.999
  lr:
    learning_rate: 0.001
  regularizer:
    name: 'L2'
    factor: 0

Architecture:
  model_type: rec
  algorithm: CRNN
  Transform:
  Backbone:
    name: ResNet
    layers: 34
  Neck:
    name: SequenceEncoder
    encoder_type: rnn
    hidden_size: 256
  Head:
    name: CTCHead
    fc_decay: 0

Loss:
  name: CTCLoss

PostProcess:
  name: CTCLabelDecode

Metric:
  name: RecMetric
  main_indicator: acc

Train:
  dataset:
    name: SimpleDataSet
    data_dir: ./train_data/train/
    label_file_list: ["./train_data/train.txt"]
    transforms:
      - DecodeImage: # load image
          img_mode: BGR
          channel_first: False
      - CTCLabelEncode: # Class handling label
      - RecResizeImg:
          image_shape: [3, 32, 320]
      - KeepKeys:
          keep_keys: ['image', 'label', 'length'] # dataloader will return list in this order
  loader:
    shuffle: True
    batch_size_per_card: 256
    drop_last: True
    num_workers: 8

Eval:
  dataset:
    name: SimpleDataSet
    data_dir: ./train_data/test
    label_file_list: ["./train_data/test.txt"]
    transforms:
      - DecodeImage: # load image
          img_mode: BGR
          channel_first: False
      - CTCLabelEncode: # Class handling label
      - RecResizeImg:
          image_shape: [3, 32, 320]
      - KeepKeys:
          keep_keys: ['image', 'label', 'length'] # dataloader will return list in this order
  loader:
    shuffle: False
    drop_last: False
    batch_size_per_card: 256
    num_workers: 4

And it always gets stuck here: (screenshot: Screen Shot 2565-11-01 at 17 35 59)

Nov 01 '22 10:11 Yosiiiiiiiiiiiiiiii

Same problem with 'ch_PP-OCRv3_det_cml.yml' but works fine with 'det_r50_db++_icdar15.yml'

Nov 08 '22 14:11 andreybondarb

Try setting num_workers: 0.
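
For reference, a minimal sketch of how this suggestion might look in the loader sections of the config posted above. It assumes the setting is applied to both Train.loader and Eval.loader (the comment does not say which one), and it leaves every other value as posted.

```yaml
# Sketch only: loader sections from the config above with num_workers set to 0.
# Applying it to both Train and Eval is an assumption; the dataset sections are unchanged.
Train:
  loader:
    shuffle: True
    batch_size_per_card: 256
    drop_last: True
    num_workers: 0   # was 8; 0 loads data in the main process

Eval:
  loader:
    shuffle: False
    drop_last: False
    batch_size_per_card: 256
    num_workers: 0   # was 4
```

With no worker subprocesses, data loading is slower but runs in the main process, which helps show whether the hang comes from the data loader workers.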

Nov 09 '22 12:11 andyjiang1116

Decreasing batch_size_per_card to 4 and num_workers to 2 solved the issue for me.
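
As an illustration, the same change sketched against the Train.loader section of the recognition config from the original post. The comment refers to a different config (ch_PP-OCRv3_det_cml.yml) and does not say which loader was edited, so treating it as Train.loader is an assumption.

```yaml
# Sketch only: values from the comment applied to the Train loader posted above.
# Which loader was actually changed is not stated; Train.loader is assumed here.
Train:
  loader:
    shuffle: True
    batch_size_per_card: 4   # was 256
    drop_last: True
    num_workers: 2           # was 8
```

A batch size this small slows each epoch considerably, so it is probably best treated as a starting point and raised gradually once training runs.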

Nov 10 '22 07:11 andreybondarb