PassportEye
Better results with lang 'OCRB'
I have played around a bit with this tool and got much better results with lang 'OCRB' which can be fetched from here: https://github.com/Exteris/tesseract-mrz/tree/master/lang
After adding it to the tesseract data dir, I added -l OCRB to the following line: https://github.com/konstantint/PassportEye/blob/master/passporteye/util/ocr.py#L30
Just wanted to let you know - you probably want to include this by default into this tool or add a hint to the README.
Thanks. I have long wanted to try an OCR-A/B font-specific recognizer but never found the time to train one. It's natural that it should work better.
I am getting mixed results. On average I see a 40% speedup along with a substantial increase in correctness. However, it seems to be more prone to hallucinating 3 | S | Z characters instead of the < filler characters. The current recognizer, on the other hand, seems to have trouble with H vs M, so I would still consider it an improvement, since the filler characters are easier to eliminate using some heuristics.
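One such heuristic could look like the sketch below. It assumes a TD3 (passport) name line, where only A-Z and < are valid and three consecutive fillers can only occur in the trailing padding. The function name and the 3/S/Z character set are illustrative assumptions based on the misreads described above; none of this is part of PassportEye.

```python
import re

# Trailing padding: three fillers followed only by fillers or by the characters
# reported above as being hallucinated in their place (an assumption).
_TRAILING_PADDING = re.compile(r"<<<[<3SZ]*$")


def clean_name_line_padding(line: str) -> str:
    """Replace misread characters inside the trailing padding with '<'."""
    # The replacement has the same length as the match, so the line length is kept.
    return _TRAILING_PADDING.sub(lambda m: "<" * len(m.group()), line)


noisy = "P<UTOERIKSSON<<ANNA<MARIA<<<<<S<<<<Z<<<3<<<<"
cleaned = clean_name_line_padding(noisy)
print(cleaned)
```

The rule is deliberately conservative: it does nothing if a misread falls within the first three filler positions (for example MARIA<S<<<<), and it leaves letters inside the names untouched.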
Also, there are obviously some (different) outliers, as some data dropped from valid=100 to completely invalid. Unfortunately this is production data, so I cannot post it.
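For comparing recognizers without sharing production images, one option is to count how many MRZ check digits still validate. As far as I understand, the validity score is largely driven by these checks. The sketch below implements the ICAO 9303 check digit (weights 7, 3, 1; A-Z count as 10-35, < as 0). The function name is my own and this is not PassportEye's internal code.

```python
def mrz_check_digit(field: str) -> int:
    """Compute the ICAO 9303 check digit for one MRZ field."""
    weights = (7, 3, 1)
    total = 0
    for i, ch in enumerate(field):
        if ch in "0123456789":
            value = int(ch)
        elif "A" <= ch <= "Z":
            value = ord(ch) - ord("A") + 10
        elif ch == "<":
            value = 0
        else:
            raise ValueError(f"invalid MRZ character: {ch!r}")
        total += value * weights[i % 3]
    return total % 10


# Fields from the ICAO 9303 specimen passport.
print(mrz_check_digit("L898902C3"))  # 6
print(mrz_check_digit("740812"))     # 2
print(mrz_check_digit("120415"))     # 9
```

A field whose computed digit differs from the digit printed after it in the MRZ was most likely misread somewhere.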
We are also changing the -l parameter (in our case, we use eng+spa+OCRB). Since this seems to be a common need, maybe it should be an extra parameter so the user can choose what to use there.
Hello. It would be good to have an option to pass extra parameters to ocr.py.
Makes sense, indeed. I'll try not to forget to add it by tomorrow (or feel free to PR - it's an easy option for a hacktoberfest contribution after all, DO still has those t-shirts, right?)
Has somebody already tried using this new traineddata? https://github.com/Shreeshrii/tessdata_ocrb
@uwolfer I have tried the above-mentioned data but am getting poorer results than before. Earlier, read_mrz was providing more accurate results. With this data it often returns values that are wrong.
How do I try a .traineddata file with PassportEye? I am completely lost; I want to read the MRZ region. I am a newbie with Tesseract OCR. Please help, a code snippet would work like a charm. I am using a Mac. This is my current code. I have downloaded the .traineddata file and put it in the tessdata folder along with the default .traineddata files. How do I use it in this code:
import os
from passporteye import read_mrz
pr_path = os.getcwd()
file_path = os.path.join(pr_path, 'my_app', 'data', 'test1.jpg')
mrz = read_mrz(file_path)
print(mrz)
Hi @kmanadkat,
Let's say you have downloaded the OCRB pretrained data. You just need to specify it when reading the MRZ, like this:
mrz = read_mrz('abu_2.jpg', extra_cmdline_params='-l OCRB')  # you can change OCRB to your own pretrained data
Also make sure the .traineddata file is in Tesseract's tessdata folder.
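If you need several languages at once (such as the eng+spa+OCRB combination mentioned earlier in this thread), the -l value can be built from a list. This is only a sketch: both helper names are hypothetical, and it assumes every listed language has a matching .traineddata file in the tessdata folder.

```python
def build_lang_params(languages):
    """Return a Tesseract '-l' argument such as '-l eng+spa+OCRB'."""
    if not languages:
        raise ValueError("at least one language is required")
    return "-l " + "+".join(languages)


def read_mrz_with_languages(image_path, languages):
    """Hypothetical wrapper: call read_mrz with the chosen languages."""
    # Imported lazily so build_lang_params works without PassportEye installed.
    from passporteye import read_mrz
    return read_mrz(image_path, extra_cmdline_params=build_lang_params(languages))


print(build_lang_params(["eng", "spa", "OCRB"]))  # -l eng+spa+OCRB
```

With PassportEye installed, read_mrz_with_languages('abu_2.jpg', ['OCRB']) should be equivalent to the call above.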
@hunaidkhan2000 Thanks for helping out, but I started getting bad results with this. I am now using pytesseract directly and extracting the MRZ info with it; in that case I am getting good accuracy. Something like this:
text = pytesseract.image_to_string(img_path, lang="OCRB")
PassportEye needs a lot of improvement. PS: @kmanadkat is my other git account ;)
@krupeshxebia Yes, I commented the same thing: the results get poorer if I use OCRB. The data needs to be trained, or the tessdata needs to be updated by Google, to give us better results.
I am struggling to do this (adding the OCRB data to the tesseract data dir) - can you provide more explicit instructions on how to extract it to the tesseract folder?
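In case it helps, here is a minimal sketch of the copy step in Python. It assumes the downloaded file is named OCRB.traineddata and that you already know where your tessdata folder is; that location depends on how Tesseract was installed, so the paths in the comment are placeholders. The function name is hypothetical.

```python
import os
import shutil


def install_traineddata(src_file, tessdata_dir):
    """Copy a .traineddata file into the given tessdata folder."""
    if not src_file.endswith(".traineddata"):
        raise ValueError("expected a .traineddata file")
    if not os.path.isdir(tessdata_dir):
        raise FileNotFoundError(f"tessdata folder not found: {tessdata_dir}")
    dest = os.path.join(tessdata_dir, os.path.basename(src_file))
    shutil.copy2(src_file, dest)
    return dest


# Example call with placeholder paths (adjust both to your system):
# install_traineddata("/path/to/Downloads/OCRB.traineddata", "/path/to/tessdata")
```

Tesseract uses the file name without the extension as the language name, so OCRB.traineddata is selected with -l OCRB. Running tesseract --list-langs afterwards should list OCRB if the file landed in the right folder.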