PaddleOCR Arabic recognition model gives bad results for Arabic numbers

I am going through the tutorial and testing some images that contain only Arabic numbers.

The results are usually bad, even though recognition is quite accurate for text.

Can this be resolved with some parameters, or do I need to fine-tune the recognition model?

Nov 01 '22 14:11 amroghoneim

I am facing the same problem.

Nov 01 '22 15:11 mohamedbakhet

@MissPenguin Please advise.

Nov 02 '22 10:11 amroghoneim

Try fine-tuning the model. (The data we have is really limited.)

If these are bad cases, we recommend sending some samples so they can be analyzed.

Nov 04 '22 07:11 LDOUBLEV

@LDOUBLEV Can you provide a sample of the Arabic dataset you used in training?

Nov 09 '22 09:11 Mahmuod1

@amroghoneim Could you provide the prediction commands and some bad cases? We will try to reproduce the problem.

Nov 30 '22 12:11 tink2123

Sure @tink2123

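Here is some sample code:

import numpy as np
from PIL import Image
from paddleocr import PaddleOCR

img = Image.open("sample.png")  # placeholder path, not from the original post
ocr = PaddleOCR(lang='ar', max_text_length=50, rec_batch_num=1)  # run once: downloads the model and loads it into memory
result = ocr.ocr(np.array(img), cls=False, det=False)  # recognition only: no detection, no angle classifier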

Example images with numbers only:

By the way, I also see degradation when numbers are combined with text, and when recognizing multiple words at once compared to recognizing each word on its own.

Nov 30 '22 19:11 amroghoneim

Thanks! This problem is caused by the Arabic number labels being processed incorrectly during training. We will update the model after retraining. You can also fine-tune the model on an Arabic numbers dataset to solve this problem.

Dec 01 '22 09:12 Topdu

@Topdu Will this be released anytime soon? We would really like to use this great OCR solution in our codebase.

Also, I am finding the fine-tuning process hard to understand from the documentation. May I ask what steps I need to follow exactly? What should the dataset look like (the example dataset URL is not working)? Is there a training script I can use right away? What does the GT label file look like?

Dec 01 '22 10:12 amroghoneim

@Topdu Any comments please?

Dec 04 '22 11:12 amroghoneim

As soon as possible.

For training documentation, please refer to the rec training doc.
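On the question of what the GT label file looks like, here is a minimal sketch. It assumes the layout the recognition training doc describes as far as I recall it: one sample per line, with the image path and its transcription separated by a single tab. The helper names, file names, and sample labels are hypothetical placeholders, not real data or part of PaddleOCR.

```python
from pathlib import Path


def format_label_line(image_path: str, text: str) -> str:
    """Return one label-file line: image path, a tab, then the transcription."""
    # A tab or newline inside either field would break the one-line-per-sample layout.
    if "\t" in image_path or "\t" in text or "\n" in image_path or "\n" in text:
        raise ValueError("paths and labels must not contain tabs or newlines")
    return f"{image_path}\t{text}"


def write_label_file(samples, out_path) -> int:
    """Write (image_path, text) pairs to a UTF-8 label file; return the line count."""
    lines = [format_label_line(path, text) for path, text in samples]
    Path(out_path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return len(lines)


# Hypothetical samples (assumption): placeholder file names and digit-only labels.
samples = [
    ("train/num_0001.jpg", "٠١٢٣٤"),
    ("train/num_0002.jpg", "2022"),
]

if __name__ == "__main__":
    # "rec_gt_train.txt" is an assumed file name; use whatever your config points to.
    count = write_label_file(samples, "rec_gt_train.txt")
    print(f"wrote {count} label lines")
```

Every character used in the labels would presumably also need to appear in the character dictionary the training config points to, so it is worth checking the digits you care about against that dictionary before training.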

Dec 06 '22 12:12 Topdu

I have run into the same problem using the latin_PP-OCRv3_rec model. Is there an updated model available?

Jun 16 '23 07:06 AstonyJ

The SVTR Arabic model's prediction of Arabic numbers is poor. Are there any updates on this, please?

Aug 06 '23 18:08 vk1996

PP-OCRv4 arabic model

Aug 08 '23 06:08 Topdu

Is the config file the same as for v3?

Aug 08 '23 07:08 vk1996

PP-OCRv4 arabic model

This is an inference model. What about a pre-trained model for Arabic?

Aug 21 '23 18:08 Hegelim

Is the dictionary the same?

Aug 29 '23 20:08 Hegelim
