off-nutrition-table-extractor
Develop a better image preprocessing algorithm.
Currently, we are using the following filters before sending the images for OCR:
RGB -> Grayscale -> GaussianBlur -> Grayscale -> RGB
The problem we are facing is that the OCR fails to detect some of the bold text. In addition, OCR fails on some images with non-black backgrounds.
You can find the algorithm in the file process.py, in the function preprocess_for_ocr.
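One possible direction for the background problem is to binarize the grayscale image with Otsu's threshold and then normalize polarity so that text always ends up dark on a light background. The sketch below is dependency-free pure Python for illustration and is not taken from process.py; the function names are hypothetical, and it assumes text covers fewer pixels than the background. In OpenCV, the thresholding part corresponds to cv2.threshold with the cv2.THRESH_OTSU flag.

```python
from typing import List

Gray = List[List[int]]  # rows of pixel intensities in the range 0-255


def otsu_threshold(gray: Gray) -> int:
    """Return the threshold t that maximizes between-class variance (Otsu's method)."""
    hist = [0] * 256
    for row in gray:
        for p in row:
            hist[p] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w_dark, sum_dark = 0, 0
    for t in range(256):
        w_dark += hist[t]
        sum_dark += t * hist[t]
        if w_dark == 0:
            continue  # no pixels at or below t yet
        w_light = total - w_dark
        if w_light == 0:
            break  # all pixels are at or below t
        mean_dark = sum_dark / w_dark
        mean_light = (sum_all - sum_dark) / w_light
        var = w_dark * w_light * (mean_dark - mean_light) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def binarize_dark_text_on_light(gray: Gray) -> Gray:
    """Binarize, then invert if needed so the minority class (assumed text) is black."""
    t = otsu_threshold(gray)
    binary = [[255 if p > t else 0 for p in row] for row in gray]
    total = sum(len(row) for row in binary)
    white = sum(1 for row in binary for p in row if p == 255)
    if white * 2 < total:
        # Mostly black output means a dark background: flip to black text on white
        binary = [[255 - p for p in row] for row in binary]
    return binary
```

Whether this helps with the bold-text failures would need to be checked against the actual images; it only targets the background polarity.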