
Better recognition of colored text

Open eclipseo opened this issue 2 years ago • 2 comments

Hello,

Would it be difficult to improve the recognition of colored text? I have an issue with red and orange text not being recognized. Is this a NormCap problem or a Tesseract problem?

Thanks.

eclipseo avatar Aug 01 '22 07:08 eclipseo

Hi @eclipseo, thanks for bringing this up!

Currently, NormCap does only minor preprocessing of the image before feeding it into Tesseract, so yes, Tesseract might not handle colored text well.

But I have some ideas for additional preprocessing steps that might help, e.g. contrast stretching.
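To illustrate what contrast stretching means here, below is a minimal sketch in plain Python. It is not NormCap's actual code: the function name `stretch_contrast` and the representation of the image as a list of rows of grayscale values (0 to 255) are assumptions made only for this example. The idea is to rescale the values linearly so that the darkest pixel becomes black and the brightest becomes white.

```python
def stretch_contrast(pixels, low=0, high=255):
    """Linearly rescale grayscale values to span the range [low, high].

    `pixels` is a list of rows, each row a list of integers (hypothetical
    representation chosen for this sketch, not NormCap's internal format).
    """
    flat = [value for row in pixels for value in row]
    darkest, brightest = min(flat), max(flat)

    # A uniform image has no contrast to stretch; return an unchanged copy.
    if brightest == darkest:
        return [row[:] for row in pixels]

    span = brightest - darkest
    return [
        [round(low + (value - darkest) * (high - low) / span) for value in row]
        for row in pixels
    ]


# Low-contrast input: all values lie between 100 and 200.
print(stretch_contrast([[100, 120], [140, 200]]))  # [[0, 51], [102, 255]]
```

Whether this helps with red or orange text depends on how the image is converted to grayscale first, which this sketch does not cover.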

Could you please provide a sample image, where NormCap currently doesn't recognize the text as expected? That would help me a lot in analyzing how the recognition could be improved.

Thanks!

dynobo avatar Aug 01 '22 15:08 dynobo

Hi @dynobo, I was testing this image with Japanese text: https://m.media-amazon.com/images/I/91Ju9tqeJbS.SL1500.jpg (specifically, the text block between "Ingredients" and the yellow box). I found that using the original image produced varying results each time. However, increasing the contrast just a little led to perfect OCR results.
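For readers who want to try something similar, here is a rough sketch of a small contrast increase in plain Python. The comment above does not say which tool or setting was used, so the function name `adjust_contrast`, the `factor` value, and the list-of-rows grayscale representation are all assumptions for illustration. Each value is pushed away from a mid-gray point by a constant factor and then clamped to the valid range.

```python
def adjust_contrast(pixels, factor=1.2, midpoint=128):
    """Scale each grayscale value's distance from `midpoint` by `factor`.

    A factor above 1 increases contrast, below 1 reduces it. Both default
    values are hypothetical and not taken from the thread.
    """
    def clamp(value):
        # Keep results inside the valid 8-bit range.
        return max(0, min(255, round(value)))

    return [
        [clamp(midpoint + (value - midpoint) * factor) for value in row]
        for row in pixels
    ]


# Values near mid-gray are spread apart; extremes are clamped.
print(adjust_contrast([[108, 128, 148]], factor=1.5))  # [[98, 128, 158]]
```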

mwolos avatar Aug 24 '22 00:08 mwolos