Difference in output generated by gem and tesseract command line

Open Meenal-goyal opened this issue 11 years ago • 8 comments

I was trying to extract text from an image using the tesseract command line, but since I wanted to do it from a Ruby script, I tried your gem. The problem is that the gem gives different output from the command line, and in some cases the gem's output is noticeably worse. Is there a version difference? Additional info:

$ tesseract -v
tesseract 3.02.02
 leptonica-1.69
  libjpeg 8d : libpng 1.6.12 : zlib 1.2.5

What version is the gem using?

Meenal-goyal avatar Jul 11 '14 06:07 Meenal-goyal

The gem uses the version installed on the system.

meh avatar Jul 11 '14 12:07 meh

Then what's the reason for the different output? Is it possible that the gem uses an older version of tesseract installed on the system instead of the new one? I only have the latest version on my system, but maybe it includes support for older versions as well.

Meenal-goyal avatar Jul 11 '14 17:07 Meenal-goyal

No, that's not how it works. The only possible reason is different default options between the binary and the library.

meh avatar Jul 11 '14 18:07 meh

So, how can I change these options for the binary? I also want to set extra configuration variables such as matcher_good_threshold. What options should I pass in the Ruby script?

Meenal-goyal avatar Jul 13 '14 05:07 Meenal-goyal

Was there ever an answer to this question? I'm having the same problem. This may not be the right place to ask, but how can I see the default configuration used by the binary so I can pass that configuration to the gem?

cwulfman avatar May 11 '15 15:05 cwulfman

I honestly don't know. Someone would have to dig through the binary's source code to figure out which default options differ.

meh avatar May 11 '15 19:05 meh

Ok; thank you.

cwulfman avatar May 11 '15 19:05 cwulfman

@meh, FYI, the default page segmentation mode (psm) for the tesseract command line is '3', while for libtesseract it's '6'.

amitdo avatar Feb 18 '16 13:02 amitdo
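
One way to check whether the psm default explains the difference is to run the command line with the library's default mode and compare the results. The sketch below only builds the argument list for that run. `tesseract_argv` is a hypothetical helper, not part of the gem, and `scan.png` and `out` are placeholder names. It assumes the flag is spelled `-psm`, as in the tesseract 3.x series reported above; newer releases spell it `--psm`.

```ruby
# Hypothetical helper (not part of ruby-tesseract-ocr): builds the argv for
# the tesseract CLI with an explicit page segmentation mode.
CLI_DEFAULT_PSM     = 3 # default for the command line, per the comment above
LIBRARY_DEFAULT_PSM = 6 # default for libtesseract, per the comment above

# legacy_flag: true  -> "-psm"  (assumed spelling for tesseract 3.x)
# legacy_flag: false -> "--psm" (spelling used by newer releases)
def tesseract_argv(image, output_base, psm: LIBRARY_DEFAULT_PSM, legacy_flag: true)
  flag = legacy_flag ? '-psm' : '--psm'
  ['tesseract', image, output_base, flag, psm.to_s]
end

# Print the command that mimics the library default; nothing is executed here.
puts tesseract_argv('scan.png', 'out').join(' ')
# prints "tesseract scan.png out -psm 6"
```

Passing the array to `system(*tesseract_argv('scan.png', 'out'))` would run it and write `out.txt`. If that output matches what the gem produces, the segmentation mode is the likely cause, and setting mode 3 on the gem side should bring it in line with the plain command line.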