tesseractMRZ
Ready-to-use MRZ / MRTD (machine-readable zone / machine-readable travel documents) dataset and models for Tesseract v4
!!New!!
An SDK to detect and recognize MRZ/MRTD has been released at https://github.com/DoubangoTelecom/ultimateMRZ-SDK
If you're looking for information on how to parse or validate MRZ data, check here and here.
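As a small illustration of what MRZ validation involves, here is a minimal sketch of the check-digit rule from ICAO Doc 9303 (weights 7, 3, 1 repeated; digits keep their value, letters A-Z map to 10-35, and the filler `<` counts as 0). The function name `mrz_check_digit` is hypothetical and is not part of this repository; the sample values come from the specimen document in ICAO Doc 9303.

```python
def mrz_check_digit(field):
    """Compute the ICAO 9303 check digit for one MRZ field (hypothetical helper)."""
    weights = (7, 3, 1)
    total = 0
    for index, char in enumerate(field):
        if char.isdigit():
            value = int(char)
        elif "A" <= char <= "Z":
            value = ord(char) - ord("A") + 10  # A=10 ... Z=35
        elif char == "<":
            value = 0  # filler character
        else:
            raise ValueError(f"invalid MRZ character: {char!r}")
        total += value * weights[index % 3]
    return total % 10


# Fields from the ICAO 9303 specimen: document number and date of birth.
print(mrz_check_digit("L898902C3"))  # 6
print(mrz_check_digit("740812"))     # 2
```

A recognized field can be considered plausible when the digit printed after it in the MRZ equals the computed value.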
The dataset
The dataset contains more than 7 thousand images (.tif) with ground truth (.gt.txt), collected from Google Images and augmented with some synthetic data.
The dataset is ready to use for training with Tesseract v4.
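Before training, it can help to confirm that every image has its ground-truth file. The sketch below assumes that each `name.tif` sits next to a `name.gt.txt` in the same folder, which is the usual layout for Tesseract v4 line training but is not confirmed for this repository. The function name `pair_samples` and the `dataset` path are hypothetical.

```python
from pathlib import Path


def pair_samples(dataset_dir):
    """Return (image, ground_truth) path pairs, skipping images with no .gt.txt."""
    root = Path(dataset_dir)
    pairs = []
    for image in sorted(root.rglob("*.tif")):
        # Assumption: ground truth is named <image stem>.gt.txt in the same folder.
        ground_truth = image.with_name(image.stem + ".gt.txt")
        if ground_truth.is_file():
            pairs.append((image, ground_truth))
    return pairs


if __name__ == "__main__":
    samples = pair_samples("dataset")  # hypothetical path to the dataset folder
    print(f"{len(samples)} image/ground-truth pairs found")
```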
The models
If you don't want to train the model yourself, try the ones in the tessdata_best (float model) or tessdata_fast (int model) folders.
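A minimal sketch of running one of the pre-trained models through the `tesseract` command-line tool is shown below. It assumes that the `tesseract` executable is on the PATH and that the model file is named `mrz.traineddata` (hence `-l mrz`); check the folder for the actual file name. The helper names and the image path are hypothetical, and page segmentation mode 6 (a single uniform block of text) is only a starting point.

```python
import subprocess


def build_tesseract_command(image_path, tessdata_dir, lang="mrz", psm=6):
    """Build the tesseract CLI call; lang="mrz" assumes a file named mrz.traineddata."""
    return [
        "tesseract", str(image_path), "stdout",  # "stdout" prints text instead of writing a file
        "--tessdata-dir", str(tessdata_dir),      # folder holding the .traineddata file
        "-l", lang,
        "--psm", str(psm),
    ]


def recognize(image_path, tessdata_dir="tessdata_best"):
    """Run tesseract and return the recognized text (requires tesseract installed)."""
    command = build_tesseract_command(image_path, tessdata_dir)
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    print(recognize("passport_mrz.tif"))  # hypothetical image path
```

Switching `tessdata_dir` to `tessdata_fast` selects the integer model, which generally trades some accuracy for speed.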
Testing the accuracy
You can check how accurate the MRZ model is at https://www.doubango.org/webapps/mrz/
You may also be interested in our magnetic ink character recognition (MICR E-13B & CMC-7) implementation at https://github.com/DoubangoTelecom/tesseractMICR, with an online demo at https://www.doubango.org/webapps/micr/
Getting help
To get help, please check our discussion group or Twitter account.