PaddleOCR
Smoothing color images and black and white text images for OCR
I have implemented OCR to recognize numbers in documents so that I can later mask sensitive ones, such as a national identification number or resident registration number. The folder contains several types of color and black-and-white images (passports, business documents).
The problem is that OCR sometimes fails to detect a number and skips it, as in the image "Resident Registration Number". OCR skips the resident registration number even though it is clearly legible.
How can I solve this problem?
Code
import glob
import re
import string
from itertools import chain
# import self as self
from paddleocr import PaddleOCR, draw_ocr, PPStructure
import cv2
import os
import regex
pattern_8 = r'-\d{8}'
pattern_4 = r'-\d{4}'
my_path = "/media/cvpr/CM_1/COREMAX/testing/"
ocr = PaddleOCR(rec=True, use_angle=True, lang='korean', use_gpu=True)
face_cascade = cv2.CascadeClassifier('/media/cvpr/CM_1/pytesseract/haarcascade_frontalface_default.xml')
save_path = '/media/cvpr/CM_1/COREMAX/paddle/'
# Regex
passport_pattern = '^[A-Z0-9<]{9}[0-9]{1}[A-Z]{3}[0-9]{7}[A-Z]{1}[0-9]{7}[A-Z0-9<]{14}[0-9]{2}$'
for img in glob.glob(my_path + '*.*'):
    img_bgr_rgb = cv2.imread(img)
    file_Name = os.path.basename(img)
    #image = img_bgr_rgb[:, :, ::-1]
    # Not Good results
    #thresh, im_bw = cv2.threshold(image, 210, 230, cv2.THRESH_BINARY)
    #cv2.imwrite("bw_image.jpg", im_bw)
    face_data = face_cascade.detectMultiScale(img_bgr_rgb, 1.3, 5)
    for (x, y, w, h) in face_data:
        roi = img_bgr_rgb[y:y + h, x:x + w]
        roi = cv2.GaussianBlur(roi, (25, 25), cv2.BORDER_ISOLATED)
        img_bgr_rgb[y:y + roi.shape[0], x:x + roi.shape[1]] = roi
    result = ocr.ocr(img_bgr_rgb, cls=True)
    for x in result:
        if regex.search(r'\.[0-9.]+', str(x[1][0])):
            #print(x[1][0])
            x1 = int(x[0][0][0])
            y1 = int(x[0][0][1])
            x2 = int(x[0][2][0])
            y2 = int(x[0][2][1])
            cv2.rectangle(img_bgr_rgb, (x1, y1), (x2, y2), (255, 255, 224), cv2.FILLED)
            cv2.imwrite(os.path.join(save_path, file_Name), img_bgr_rgb)
        elif '-' in str(x[1][0]):
            print(x[1][0])
            x1 = int(x[0][0][0])
            y1 = int(x[0][0][1])
            x2 = int(x[0][2][0])
            y2 = int(x[0][2][1])
            cv2.rectangle(img_bgr_rgb, (x1, y1), (x2, y2), (255, 255, 224), cv2.FILLED)
            cv2.imwrite(os.path.join(save_path, file_Name), img_bgr_rgb)
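The selection step in the loop above (decide which recognized lines to mask, then turn the box into a rectangle) could be separated from the OCR call so it can be checked on its own. The following is only a minimal sketch under assumptions: the helper name `boxes_to_mask`, the `HYPHENATED_NUMBER` pattern, and the sample data are hypothetical, and each OCR line is assumed to have the `[quad, (text, score)]` shape that the code above indexes into. It covers only the hyphen branch, and it takes the min/max over all four corners instead of using corners 0 and 2.

```python
import re

# Hypothetical pattern: two or more digit groups joined by hyphens,
# e.g. "5318-864-2206-293". It is stricter than the plain '-' test above.
HYPHENATED_NUMBER = re.compile(r'\d+(?:-\d+)+')

def boxes_to_mask(ocr_lines):
    """Return (x1, y1, x2, y2) rectangles for lines containing a hyphenated number.

    Each item of ocr_lines is assumed to look like [quad, (text, score)],
    where quad is a list of four [x, y] corner points.
    """
    rects = []
    for quad, (text, _score) in ocr_lines:
        if HYPHENATED_NUMBER.search(text):
            xs = [int(p[0]) for p in quad]
            ys = [int(p[1]) for p in quad]
            # Axis-aligned rectangle that encloses all four corners.
            rects.append((min(xs), min(ys), max(xs), max(ys)))
    return rects

# Made-up sample in the assumed result format (not real OCR output).
sample = [
    [[[10, 10], [120, 12], [120, 30], [10, 28]], ('211-87-50168', 0.93)],
    [[[10, 40], [200, 40], [200, 60], [10, 60]], ('Gangnam-gu Seoul Korea', 0.88)],
]
print(boxes_to_mask(sample))  # [(10, 10, 120, 30)]
```

Each returned rectangle can be passed to cv2.rectangle as in the original loop. Requiring digits on both sides of the hyphen keeps address text such as "Gangnam-gu" from being masked, which the plain '-' test would catch.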
Output of OCR
5318-864-2206-293
21i-87-50168
서울득벌시 강납구 논현로 149 길 67-7
57-7 Noihyeou-r0149-Bi
Gangnam-gu Seoull Korea
니1sd O -ilg s- [iSlticl 7a8 0016
Output Image
The numbers are missed because PaddleOCR's text recognition model performs poorly on digits.
Current text detection result:
Current text recognition result for the Resident Registration Number. It is easy to see that the recognition quality for the Resident Registration Number is poor.
Recognition errors can sometimes be reduced by adjusting parameters. In the case you provided, setting use_dilation=True, which expands the text detection boxes, allows the Resident Registration Number to be recognized.
To set use_dilation to True:
from paddleocr import PaddleOCR, draw_ocr
ocr = PaddleOCR(lang='korean', use_gpu=True, use_angle=True, use_dilation=True)
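As a rough intuition for why dilation enlarges a detection box, here is a toy pure-Python illustration of morphological dilation on a tiny binary mask. It is not PaddleOCR's implementation: the 3x3 neighbourhood, the mask, and the helper names `dilate` and `bounding_box` are all made up for this sketch.

```python
def dilate(mask):
    """Dilate a binary mask with a 3x3 square: a cell becomes 1 if it or any neighbour is 1."""
    rows, cols = len(mask), len(mask[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols and mask[rr][cc]:
                        out[r][c] = 1
    return out

def bounding_box(mask):
    """Return (row_min, col_min, row_max, col_max) of the nonzero cells."""
    cells = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    rs = [r for r, _ in cells]
    cs = [c for _, c in cells]
    return (min(rs), min(cs), max(rs), max(cs))

# Made-up "text region": two foreground pixels in the middle row.
mask = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
print(bounding_box(mask))          # (2, 2, 2, 3)
print(bounding_box(dilate(mask)))  # (1, 1, 3, 4)
```

The box grows by one cell on every side, so characters that were clipped at the edge of a tight box are more likely to fall inside the crop handed to the recognizer.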
However, the Korean recognition model was trained on synthetic data. Because no real data took part in training, its recognition quality is limited. We recommend labeling real Korean data and retraining the Korean recognition model.
What is your PaddleOCR version? I tried your code but it didn't work for me; maybe there is a version problem.
paddleocr 2.5.0.3
The code:
from paddleocr import PaddleOCR, draw_ocr
ocr = PaddleOCR(lang='korean', use_gpu=True, use_angle=True, use_dilation=True) # need to run only once to download and load model into memory
img_path = './182526678-562dcba8-af71-4e40-9f69-7f44d9848c9f.png'
result = ocr.ocr(img_path, cls=False)
for line in result:
    print(line)
from PIL import Image
image = Image.open(img_path).convert('RGB')
boxes = [line[0] for line in result]
txts = [line[1][0] for line in result]
scores = [line[1][1] for line in result]
im_show = draw_ocr(image, boxes, txts, scores, font_path='doc/fonts/korean.ttf')
im_show = Image.fromarray(im_show)
im_show.save('result.jpg')
I tested it on 2.5.0 but it did not work for me. Did you test it with my code?
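Since the two versions mentioned in this thread differ slightly (2.5.0.3 and 2.5.0), it may help for both sides to print the exact installed versions before comparing results. A minimal sketch using only the standard library (Python 3.8+); the helper name `installed_version` is hypothetical, and the list of package names is an assumption about what might be installed:

```python
from importlib import metadata

def installed_version(package):
    """Return the installed version string of a package, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None

# Only one of the two paddlepaddle variants is normally installed.
for name in ('paddleocr', 'paddlepaddle', 'paddlepaddle-gpu'):
    print(name, installed_version(name))
```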