wrong result in RFL trained model

Open amm266 opened this issue 10 months ago • 3 comments

Please provide the following information to quickly locate the problem

  • System Environment: Windows 11, Intel 12700, RTX 3070

  • Version / Related components: paddleocr 2.7.0.2, paddlepaddle 2.5.2, paddlepaddle-gpu 2.5.2

  • Command Code: python tools/infer_rec.py -c train/namesVnew2.yml -o Global.pretrained_model=output/namesV3new/latest Global.infer_img=dataset-kartmelly-eval/img14.png char_dict_path=train/namesV2.txt

  • Complete Error Message: The problem is that I am getting a wrong result. I trained an RFL model from scratch using an example yml file from the repo. The model converges nicely, but in the end I get a completely wrong result (a number) that is not even in the char dict. I tried setting the char dict manually and tried other versions, with no luck. I think I am making an obvious mistake.

yml:

Global:
  debug: false
  use_gpu: true
  epoch_num: 10
  log_smooth_window: 20
  print_batch_step: 10
  save_model_dir: ./output/namesV3new
  save_epoch_step: 25
  eval_batch_step: 50
  cal_metric_during_train: true
  pretrained_model: ./output/namesV3newcp/best_accuracy
  checkpoints: null
  save_inference_dir: ./output/namesV3_inference
  use_visualdl: false
  infer_img: ./doc/imgs_words/arabic/ar_2.jpg
  character_dict_path: ./train/namesV2.txt
  max_text_length: 50
  infer_mode: false
  use_space_char: true
  distributed: true
  save_res_path: ./output/names/predicts_ppocrv3.txt
Optimizer:
  name: AdamW
  beta1: 0.9
  beta2: 0.999
  weight_decay: 0.0
  clip_norm_global: 5.0
  lr:
    name: Piecewise
    decay_epochs:
    - 3
    - 5
    - 7
    - 9
    values:
    - 9.0e-05
    - 2.7e-05
    - 8.1e-06
    - 2.43e-06
    - 7.29e-07
Architecture:
  model_type: rec
  algorithm: RFL
  in_channels: 1
  Transform:
    name: TPS
    num_fiducial: 20
    loc_lr: 1.0
    model_name: large
  Backbone:
    name: ResNetRFL
    use_cnt: true
    use_seq: false
  Neck:
    name: RFAdaptor
    use_v2s: false
    use_s2v: false
  Head:
    name: RFLHead
    in_channels: 512
    hidden_size: 256
    batch_max_legnth: 25
    out_channels: 38
    use_cnt: true
    use_seq: false
Loss:
  name: RFLLoss
PostProcess:
  name: RFLLabelDecode
Metric:
  name: CNTMetric
  main_indicator: acc
Train:
  dataset:
    name: SimpleDataSet
    data_dir: C:\Users\amm\PycharmProjects\DataGeneration\
    ext_op_transform_idx: 1
    label_file_list:
    - C:\Users\amm\PycharmProjects\DataGeneration\words-train\dataset.txt
    transforms:
    - DecodeImage:
        img_mode: BGR
        channel_first: false
    - RFLLabelEncode: null
    - RFLRecResizeImg:
        image_shape:
        - 1
        - 32
        - 100
        padding: false
        interpolation: 2
    - KeepKeys:
        keep_keys:
        - image
        - label
        - length
        - cnt_label
  loader:
    shuffle: true
    batch_size_per_card: 300
    drop_last: true
    num_workers: 8
Eval:
  dataset:
    name: SimpleDataSet
    data_dir: C:\Users\amm\PycharmProjects\DataGeneration\
    label_file_list:
    - C:\Users\amm\PycharmProjects\DataGeneration\words-eval\dataset.txt
    transforms:
    - DecodeImage:
        img_mode: BGR
        channel_first: false
    - RFLLabelEncode: null
    - RFLRecResizeImg:
        image_shape:
        - 1
        - 32
        - 100
        padding: false
        interpolation: 2
    - KeepKeys:
        keep_keys:
        - image
        - label
        - length
        - cnt_label
  loader:
    shuffle: true
    batch_size_per_card: 300
    drop_last: true
    num_workers: 8
profiler_options: null

cmd:

 python tools/infer_rec.py -c train/namesVnew2.yml -o Global.pretrained_model=output/namesV3new/latest Global.infer_img=dataset-kartmelly-eval/img1.jpg char_dict_path=train/namesV2.txt 
[2024/04/22 16:04:57] ppocr INFO: Architecture : 
[2024/04/22 16:04:57] ppocr INFO:     Backbone :
[2024/04/22 16:04:57] ppocr INFO:         name : ResNetRFL
[2024/04/22 16:04:57] ppocr INFO:         use_cnt : True
[2024/04/22 16:04:57] ppocr INFO:         use_seq : False
[2024/04/22 16:04:57] ppocr INFO:     Head :
[2024/04/22 16:04:57] ppocr INFO:         batch_max_legnth : 25
[2024/04/22 16:04:57] ppocr INFO:         hidden_size : 256
[2024/04/22 16:04:57] ppocr INFO:         in_channels : 512
[2024/04/22 16:04:57] ppocr INFO:         name : RFLHead
[2024/04/22 16:04:57] ppocr INFO:         out_channels : 38
[2024/04/22 16:04:57] ppocr INFO:         use_cnt : True
[2024/04/22 16:04:57] ppocr INFO:         use_seq : False
[2024/04/22 16:04:57] ppocr INFO:     Neck :
[2024/04/22 16:04:57] ppocr INFO:         name : RFAdaptor
[2024/04/22 16:04:57] ppocr INFO:         use_s2v : False
[2024/04/22 16:04:57] ppocr INFO:         use_v2s : False
[2024/04/22 16:04:57] ppocr INFO:     Transform :
[2024/04/22 16:04:57] ppocr INFO:         loc_lr : 1.0
[2024/04/22 16:04:57] ppocr INFO:         model_name : large
[2024/04/22 16:04:57] ppocr INFO:         name : TPS
[2024/04/22 16:04:57] ppocr INFO:         num_fiducial : 20
[2024/04/22 16:04:57] ppocr INFO:     algorithm : RFL
[2024/04/22 16:04:57] ppocr INFO:     in_channels : 1
[2024/04/22 16:04:57] ppocr INFO:     model_type : rec
[2024/04/22 16:04:57] ppocr INFO: Eval :
[2024/04/22 16:04:57] ppocr INFO:     dataset :
[2024/04/22 16:04:57] ppocr INFO:         data_dir : C:\Users\amm\PycharmProjects\DataGeneration\
[2024/04/22 16:04:57] ppocr INFO:         label_file_list : ['C:\\Users\\amm\\PycharmProjects\\DataGeneration\\words-eval\\dataset.txt']
[2024/04/22 16:04:57] ppocr INFO:         name : SimpleDataSet
[2024/04/22 16:04:57] ppocr INFO:         transforms :
[2024/04/22 16:04:57] ppocr INFO:             DecodeImage :
[2024/04/22 16:04:57] ppocr INFO:                 channel_first : False
[2024/04/22 16:04:57] ppocr INFO:                 img_mode : BGR
[2024/04/22 16:04:57] ppocr INFO:             RFLLabelEncode : None
[2024/04/22 16:04:57] ppocr INFO:             RFLRecResizeImg :
[2024/04/22 16:04:57] ppocr INFO:                 image_shape : [1, 32, 100]
[2024/04/22 16:04:57] ppocr INFO:                 interpolation : 2
[2024/04/22 16:04:57] ppocr INFO:                 padding : False
[2024/04/22 16:04:57] ppocr INFO:             KeepKeys :
[2024/04/22 16:04:57] ppocr INFO:                 keep_keys : ['image', 'label', 'length', 'cnt_label']
[2024/04/22 16:04:57] ppocr INFO:     loader :
[2024/04/22 16:04:57] ppocr INFO:         batch_size_per_card : 300
[2024/04/22 16:04:57] ppocr INFO:         drop_last : True
[2024/04/22 16:04:57] ppocr INFO:         num_workers : 8
[2024/04/22 16:04:57] ppocr INFO:         shuffle : True
[2024/04/22 16:04:57] ppocr INFO: Global :
[2024/04/22 16:04:57] ppocr INFO:     cal_metric_during_train : True
[2024/04/22 16:04:57] ppocr INFO:     character_dict_path : ./train/namesV2.txt
[2024/04/22 16:04:57] ppocr INFO:     checkpoints : None
[2024/04/22 16:04:57] ppocr INFO:     debug : False
[2024/04/22 16:04:57] ppocr INFO:     distributed : False
[2024/04/22 16:04:57] ppocr INFO:     epoch_num : 10
[2024/04/22 16:04:57] ppocr INFO:     eval_batch_step : 50
[2024/04/22 16:04:57] ppocr INFO:     infer_img : dataset-kartmelly-eval/img1.jpg
[2024/04/22 16:04:57] ppocr INFO:     infer_mode : True
[2024/04/22 16:04:57] ppocr INFO:     log_smooth_window : 20
[2024/04/22 16:04:57] ppocr INFO:     max_text_length : 50
[2024/04/22 16:04:57] ppocr INFO:     pretrained_model : output/namesV3new/latest
[2024/04/22 16:04:57] ppocr INFO:     print_batch_step : 10
[2024/04/22 16:04:57] ppocr INFO:     save_epoch_step : 25
[2024/04/22 16:04:57] ppocr INFO:     save_inference_dir : ./output/namesV3_inference
[2024/04/22 16:04:57] ppocr INFO:     save_model_dir : ./output/namesV3new
[2024/04/22 16:04:57] ppocr INFO:     save_res_path : ./output/names/predicts_ppocrv3.txt
[2024/04/22 16:04:57] ppocr INFO:     use_gpu : True
[2024/04/22 16:04:57] ppocr INFO:     use_space_char : True
[2024/04/22 16:04:57] ppocr INFO:     use_visualdl : False
[2024/04/22 16:04:57] ppocr INFO: Loss :
[2024/04/22 16:04:57] ppocr INFO:     name : RFLLoss
[2024/04/22 16:04:57] ppocr INFO: Metric :
[2024/04/22 16:04:57] ppocr INFO:     main_indicator : acc
[2024/04/22 16:04:57] ppocr INFO:     name : CNTMetric
[2024/04/22 16:04:57] ppocr INFO: Optimizer :
[2024/04/22 16:04:57] ppocr INFO:     beta1 : 0.9
[2024/04/22 16:04:57] ppocr INFO:     beta2 : 0.999
[2024/04/22 16:04:57] ppocr INFO:     clip_norm_global : 5.0
[2024/04/22 16:04:57] ppocr INFO:     lr :
[2024/04/22 16:04:57] ppocr INFO:         decay_epochs : [3, 5, 7, 9]
[2024/04/22 16:04:57] ppocr INFO:         name : Piecewise
[2024/04/22 16:04:57] ppocr INFO:         values : [9e-05, 2.7e-05, 8.1e-06, 2.43e-06, 7.29e-07]
[2024/04/22 16:04:57] ppocr INFO:     name : AdamW
[2024/04/22 16:04:57] ppocr INFO:     weight_decay : 0.0
[2024/04/22 16:04:57] ppocr INFO: PostProcess :
[2024/04/22 16:04:57] ppocr INFO:     name : RFLLabelDecode
[2024/04/22 16:04:57] ppocr INFO: Train :
[2024/04/22 16:04:57] ppocr INFO:     dataset :
[2024/04/22 16:04:57] ppocr INFO:         data_dir : C:\Users\amm\PycharmProjects\DataGeneration\
[2024/04/22 16:04:57] ppocr INFO:         ext_op_transform_idx : 1
[2024/04/22 16:04:57] ppocr INFO:         label_file_list : ['C:\\Users\\amm\\PycharmProjects\\DataGeneration\\words-train\\dataset.txt']
[2024/04/22 16:04:57] ppocr INFO:         name : SimpleDataSet
[2024/04/22 16:04:57] ppocr INFO:         transforms :
[2024/04/22 16:04:57] ppocr INFO:             DecodeImage :
[2024/04/22 16:04:57] ppocr INFO:                 channel_first : False
[2024/04/22 16:04:57] ppocr INFO:                 img_mode : BGR
[2024/04/22 16:04:57] ppocr INFO:             RFLLabelEncode : None
[2024/04/22 16:04:57] ppocr INFO:             RFLRecResizeImg :
[2024/04/22 16:04:57] ppocr INFO:                 image_shape : [1, 32, 100]
[2024/04/22 16:04:57] ppocr INFO:                 interpolation : 2
[2024/04/22 16:04:57] ppocr INFO:                 padding : False
[2024/04/22 16:04:57] ppocr INFO:             KeepKeys :
[2024/04/22 16:04:57] ppocr INFO:                 keep_keys : ['image', 'label', 'length', 'cnt_label']
[2024/04/22 16:04:57] ppocr INFO:     loader :
[2024/04/22 16:04:57] ppocr INFO:         batch_size_per_card : 300
[2024/04/22 16:04:57] ppocr INFO:         drop_last : True
[2024/04/22 16:04:57] ppocr INFO:         num_workers : 8
[2024/04/22 16:04:57] ppocr INFO:         shuffle : True
[2024/04/22 16:04:57] ppocr INFO: char_dict_path : train/namesV2.txt
[2024/04/22 16:04:57] ppocr INFO: profiler_options : None
[2024/04/22 16:04:57] ppocr INFO: train with paddle 2.5.2 and device Place(gpu:0)
W0422 16:04:57.521317 10844 gpu_resources.cc:119] Please NOTE: device: 0, GPU Compute Capability: 8.6, Driver API Version: 12.4, Runtime API Version: 11.8
W0422 16:04:57.536936 10844 gpu_resources.cc:149] device: 0, cuDNN Version: 8.6.
[2024/04/22 16:04:57] ppocr INFO: load pretrain successful from output/namesV3new/latest
[2024/04/22 16:04:57] ppocr INFO: infer_img: dataset-kartmelly-eval/img1.jpg
[2024/04/22 16:04:58] ppocr INFO:        result: 16
[2024/04/22 16:04:58] ppocr INFO: success!

dict char: namesV2.txt
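
A quick way to double-check the "not even in the char dict" observation is to compare the predicted string against the dictionary file. Below is a minimal sketch, assuming the dictionary lists one character per line and that `use_space_char: true` adds a space to the character set. `load_char_dict` and `chars_outside_dict` are hypothetical helper names used for illustration; they are not part of PaddleOCR.

```python
from pathlib import Path


def load_char_dict(path, use_space_char=True):
    """Read a dictionary file, assuming one character per line."""
    chars = set()
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        if line:  # skip empty lines
            chars.add(line)
    if use_space_char:
        # Assumption: use_space_char adds a plain space to the character set.
        chars.add(" ")
    return chars


def chars_outside_dict(prediction, chars):
    """Return the characters of `prediction` that are not in `chars`, in order."""
    return [ch for ch in prediction if ch not in chars]
```

For example, with a dictionary that contains only letters, `chars_outside_dict("16", load_char_dict("train/namesV2.txt"))` would return both digits, which would suggest the output is not a sequence decoded from that dictionary.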

amm266 • Apr 22 '24 15:04

What are the contents of the file at ./output/names/predicts_ppocrv3.txt?

UserWangZz • Apr 23 '24 06:04

> What are the contents of the file at ./output/names/predicts_ppocrv3.txt?

The same wrong result.

amm266 • Apr 23 '24 11:04

Try this command. Set Global.checkpoints to your model instead of Global.pretrained_model:

python tools/infer_rec.py -c train/namesVnew2.yml -o Global.checkpoints=output/namesV3new/latest Global.infer_img=dataset-kartmelly-eval/img14.png char_dict_path=train/namesV2.txt

UserWangZz • Apr 24 '24 01:04