normcap
Better recognition of colored text
Hello,
Would it be difficult to improve the recognition of colored text? I have issues with red and orange text not being recognized. Is this a NormCap problem or a Tesseract problem?
Thanks.
Hi @eclipseo, thanks for bringing this up!
Currently, NormCap only does some minor pre-processing of the image before feeding it into Tesseract, so yes, it may well be that Tesseract can't handle colored text reliably.
But I have some ideas for additional preprocessing steps that might help, e.g. contrast stretching.
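To make the idea concrete, here is a minimal sketch of percentile-based contrast stretching on a grayscale image. It is not NormCap's actual code: the function name `stretch_contrast`, the percentile defaults, and the nested-list image representation are all assumptions chosen to keep the example dependency-free. In a real pipeline a library routine such as Pillow's `ImageOps.autocontrast` would probably do the same job.

```python
def stretch_contrast(pixels, low_pct=2, high_pct=98):
    """Linearly rescale 8-bit gray values so the chosen percentiles map to 0 and 255.

    `pixels` is a list of rows, each a list of ints in 0..255 (hypothetical
    representation used only for this sketch).
    """
    flat = sorted(v for row in pixels for v in row)
    last = len(flat) - 1
    lo = flat[last * low_pct // 100]   # value at the lower percentile
    hi = flat[last * high_pct // 100]  # value at the upper percentile
    if hi == lo:
        # Flat image: nothing to stretch, return an unchanged copy.
        return [list(row) for row in pixels]
    span = hi - lo
    # Integer arithmetic keeps the result deterministic; values outside
    # [lo, hi] are clipped to the valid 8-bit range.
    return [
        [min(255, max(0, (v - lo) * 255 // span)) for v in row]
        for row in pixels
    ]


# Low-contrast sample: all values sit in the narrow band 100..150.
sample = [[100, 110, 120], [130, 140, 150]]
stretched = stretch_contrast(sample, low_pct=0, high_pct=100)
```

After the call the sample spans the full 0..255 range, which is the kind of separation between text and background that might make faint red or orange text easier for Tesseract to pick up.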
Could you please provide a sample image where NormCap currently doesn't recognize the text as expected? That would help me a lot in analyzing how the recognition could be improved.
Thanks!
Hi @dynobo, I was testing this image with Japanese text: https://m.media-amazon.com/images/I/91Ju9tqeJbS.SL1500.jpg Specifically the text block between "Ingredients" and the yellow box. I found that just using the original image produced varying results each time. However, increasing the contrast just a little bit led to a perfect OCR.
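For reference, a slight contrast increase of this kind can be sketched as scaling each pixel's distance from the mean gray level. This is only an illustration under assumptions: the tool and amount actually used for the test above are not stated, and `boost_contrast` with its `factor=1.2` default is a hypothetical helper working on a plain nested list of 8-bit gray values.

```python
def boost_contrast(pixels, factor=1.2):
    """Push 8-bit gray values away from the image mean by `factor`.

    factor > 1 increases contrast, factor == 1 leaves the image unchanged.
    `pixels` is a list of rows of ints in 0..255 (assumed representation).
    """
    flat = [v for row in pixels for v in row]
    mean = sum(flat) / len(flat)
    return [
        # Scale the offset from the mean, then clip to the valid range.
        [min(255, max(0, round(mean + factor * (v - mean)))) for v in row]
        for row in pixels
    ]


gray = [[100, 150], [200, 150]]
boosted = boost_contrast(gray, factor=1.2)
```

Dark pixels get darker and bright pixels get brighter while the mean stays put, so a small factor changes the image only slightly.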