
Bounding boxes on the same line break down into smaller ones after fine-tuning

Open kenho211 opened this issue 2 years ago • 4 comments

Hi everyone, I fine-tuned the Chinese+English detection model on a custom dataset (forms and documents), using the following:

config: ch_PP-OCRv3_det_student.yml
pretrain_model: ./pretrain_models/ch_PP-OCRv3_det_distill_train/student

With the pretrained model, text on the same line of a document is detected as a single bounding box, but quite a lot of text is missed. After fine-tuning, recall increases, but text on the same line is split into many smaller bounding boxes.

Has anyone experienced the same issue?

kenho211, Oct 31 '22 22:10

With the pretrained model, text on the same line of a document is detected as a single bounding box because its training data is labeled at the text-line level. After fine-tuning, recall increases but text on the same line is split into many smaller bounding boxes; maybe your training data is labeled at the word level? We suggest unifying the annotation format (text-line level) to take full advantage of the pretrained model.

A tip: before fine-tuning the model, try adjusting the post-processing parameters, which can often improve performance on forms and documents.
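
To make the post-processing tip concrete, here is a minimal sketch of why the binarization threshold matters for this issue. It assumes DB-style post-processing: the predicted probability map is thresholded, connected regions become boxes, and each box is then expanded by a distance of area * unclip_ratio / perimeter. In the PaddleOCR detection config these settings are believed to be `thresh`, `box_thresh`, and `unclip_ratio` under `PostProcess`; treat those names as an assumption and check your own YAML. The probability values and helper functions below are hypothetical, and a 1-D profile along one text line stands in for the real 2-D map.

```python
def segments_above(probs, thresh):
    """Return (start, end) index runs where probs exceed thresh (end exclusive)."""
    runs, start = [], None
    for i, p in enumerate(probs):
        if p > thresh and start is None:
            start = i                      # a new region begins
        elif p <= thresh and start is not None:
            runs.append((start, i))        # the region ends before index i
            start = None
    if start is not None:
        runs.append((start, len(probs)))   # region runs to the end of the line
    return runs


def unclip_distance(width, height, unclip_ratio):
    """Expansion distance for a region, assuming area * ratio / perimeter."""
    return width * height * unclip_ratio / (2 * (width + height))


# Hypothetical probabilities along one text line: three words, with weaker
# responses (0.35 to 0.4) in the gaps between them.
line_probs = [0.1, 0.8, 0.9, 0.85, 0.4, 0.9, 0.8, 0.35, 0.9, 0.85, 0.1]

for thresh in (0.3, 0.5):
    print(f"thresh={thresh}: regions={segments_above(line_probs, thresh)}")

# Expansion applied to a hypothetical 100x20 px region.
print("unclip distance:", unclip_distance(100, 20, 1.5))
```

In this toy profile a lower threshold keeps the whole line as one region, while a higher one splits it at the gaps between words. Whether that helps on real forms depends on how strongly the fine-tuned model responds in those gaps, so it is worth trying a few values before retraining.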

MissPenguin, Nov 01 '22 07:11

Yes, I am using a unified annotation format (labelling all text on the same text line as one bbox). Thank you for the suggestion on modifying the post-processing params.

kenho211, Nov 01 '22 08:11

I am reading through the tips in the documentation (https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.6/doc/doc_ch/finetune.md):

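  1. The pretrained models provided by PP-OCR generalize well.
  2. Adding a small amount of real data (>= 500 images for detection, >= 5000 images for recognition) greatly improves detection and recognition in domain-specific scenarios.
  3. When fine-tuning, adding real general-scene data can further improve model accuracy and generalization.
  4. For detection, increasing the prediction scale of the image can further improve detection of smaller text regions.
  5. When fine-tuning, the hyperparameters need to be adjusted appropriately (learning rate and batch size matter most) to get better results.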
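
As an illustration of the point about prediction scale, the sketch below computes the size an image would be resized to before detection under a limit-side-length rule. It assumes the behaviour I understand the PaddleOCR inference options `det_limit_side_len` and `det_limit_type` to have (cap the longer side for "max", raise the shorter side for "min", then round both sides to a multiple of 32). The function name and the example page size are hypothetical.

```python
def det_resize_shape(height, width, limit_side_len=960, limit_type="max"):
    """Return the (height, width) an image is resized to before detection.

    Assumed rule:
      "max": shrink so the longer side is at most limit_side_len
      "min": enlarge so the shorter side is at least limit_side_len
    Both sides are then rounded to a multiple of 32 (minimum 32).
    """
    if limit_type == "max":
        longer = max(height, width)
        ratio = limit_side_len / longer if longer > limit_side_len else 1.0
    elif limit_type == "min":
        shorter = min(height, width)
        ratio = limit_side_len / shorter if shorter < limit_side_len else 1.0
    else:
        raise ValueError(f"unsupported limit_type: {limit_type!r}")

    new_h = max(int(round(height * ratio / 32) * 32), 32)
    new_w = max(int(round(width * ratio / 32) * 32), 32)
    return new_h, new_w


# Hypothetical A4 form scanned at 300 dpi: 3508 x 2480 pixels (height x width).
for limit in (960, 1920):
    new_h, new_w = det_resize_shape(3508, 2480, limit_side_len=limit)
    scale = new_h / 3508
    print(f"limit={limit}: resized to {new_h}x{new_w}, "
          f"a 30 px text line becomes about {30 * scale:.1f} px")
```

With the smaller limit, small text on a dense form shrinks to only a few pixels in height, which is one plausible reason it gets missed; a larger limit keeps it readable at the cost of slower inference.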

For point 2, does "real data" (真实数据) refer to scene-text data such as the ICDAR 2015 challenge dataset?

kenho211, Nov 01 '22 19:11

  • When fine-tuning, adding real general-scene data can further improve model accuracy and generalization.

"Real data" (真实数据) means data collected from real scenes, as opposed to synthesized data.

MissPenguin, Nov 10 '22 02:11