
preserve_interword_spaces in tesseract

Open anilnaik1988 opened this issue 8 years ago • 1 comments

Hi Team, I am currently using pyocr with Tesseract 3.05.01, and I get the Tesseract tool via pyocr.get_available_tools(). Is there any way to set preserve_interword_spaces for Tesseract through pyocr?

anilnaik1988, Nov 23 '17 13:11

Assuming you're using Tesseract through pyocr.tesseract and not pyocr.libtesseract, then yes, you can: make your own builder. See DigitBuilder and the other builders for reference. My suggestion: inherit from TextBuilder and, in the constructor, just after calling the TextBuilder constructor, set self.tesseract_flags and self.tesseract_configs as you need. Then pass your new builder to pyocr.tesseract.image_to_string() (aka pyocr.get_available_tools()[0].image_to_string()), and you should get the expected result.

jflesch, Nov 23 '17 13:11