ruby-tesseract-ocr

Problem with using other languages on OS X with tesseract installed with brew

Open · p7r opened this issue 12 years ago · 7 comments

When running tesseract.rb -l ara, or when setting up an Engine as follows:

require 'tesseract'

tesseract = Tesseract::Engine.new { |e|
  # Note: this fails for several different values of e.path, and with no e.path at all
  e.path     = "/usr/local/Cellar/tesseract/3.02.02/share/"
  e.language = :ara
}

I'm getting this:

Failed loading language 'ara'
Tesseract couldn't load any languages!
/usr/local/rvm/gems/ruby-1.9.2-p290/gems/tesseract-ocr-0.1.5/lib/tesseract/api.rb:104:in `init': the API did not Init correctly (RuntimeError)
    from /usr/local/rvm/gems/ruby-1.9.2-p290/gems/tesseract-ocr-0.1.5/lib/tesseract/engine.rb:234:in `_init'
    from /usr/local/rvm/gems/ruby-1.9.2-p290/gems/tesseract-ocr-0.1.5/lib/tesseract/engine.rb:54:in `initialize'
    from /usr/local/rvm/gems/ruby-1.9.2-p290/gems/call-me-0.0.2.3/lib/call-me/named.rb:207:in `block in named'
    from ./img2txt.rb:14:in `new'
    from ./img2txt.rb:14:in `<main>'

Tesseract itself is installed correctly: using the compiled binary that comes with the package, I can load the Arabic language files and get OCR output.
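One way to narrow this down is to check where ara.traineddata actually sits relative to the path handed to the engine, since the Homebrew paths tried in this thread differ in whether they include tessdata/. The sketch below is a minimal diagnostic under an assumption: that, as in Tesseract 3.x, the data path names the directory that contains tessdata/ and each language is stored as <code>.traineddata. `traineddata_candidates` and `locate_traineddata` are hypothetical helpers, not part of the tesseract-ocr gem.

```ruby
# Hypothetical diagnostic helpers; they are NOT part of the tesseract-ocr gem.
# Assumption (Tesseract 3.x layout): the data path names the directory that
# contains tessdata/, and each language is stored as <code>.traineddata.

# Possible locations of the language file for a given path and language.
def traineddata_candidates(path, lang)
  base = File.expand_path(path.to_s)
  [
    File.join(base, "tessdata", "#{lang}.traineddata"), # path is the parent of tessdata/
    File.join(base, "#{lang}.traineddata")              # path is tessdata/ itself
  ]
end

# First candidate that exists on disk, or nil if none does.
def locate_traineddata(path, lang)
  traineddata_candidates(path, lang).find { |file| File.file?(file) }
end

# Example: check the Homebrew path used in the report.
path  = "/usr/local/Cellar/tesseract/3.02.02/share/"
found = locate_traineddata(path, :ara)
puts(found ? "found #{found}" : "ara.traineddata not found under #{path}")
```

Under that assumption, if neither candidate exists, the engine has no Arabic data to load no matter which form of the path is passed.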

Any suggestions gratefully received.

p7r, Oct 24 '13

Further to this, I removed all Tesseract libraries from my machine and reinstalled them, along with the tesseract-ocr gem.

It seems to work fine with English, but it can't find the other language files:

$ tesseract.rb 
/usr/local/rvm/gems/ruby-1.9.2-p290/gems/tesseract-ocr-0.1.5/lib/tesseract/engine.rb:248:in `_setup': you have to set an image first (ArgumentError)
    from /usr/local/rvm/gems/ruby-1.9.2-p290/gems/tesseract-ocr-0.1.5/lib/tesseract/engine.rb:149:in `text_for'
    from /usr/local/rvm/gems/ruby-1.9.2-p290/gems/call-me-0.0.2.3/lib/call-me/named.rb:207:in `block in named'
    from /usr/local/rvm/gems/ruby-1.9.2-p290/gems/tesseract-ocr-0.1.5/bin/tesseract.rb:77:in `block in <top (required)>'
    from /usr/local/rvm/gems/ruby-1.9.2-p290/gems/tesseract-ocr-0.1.5/bin/tesseract.rb:57:in `tap'
    from /usr/local/rvm/gems/ruby-1.9.2-p290/gems/tesseract-ocr-0.1.5/bin/tesseract.rb:57:in `<top (required)>'
    from /usr/local/rvm/gems/ruby-1.9.2-p290/bin/tesseract.rb:19:in `load'
    from /usr/local/rvm/gems/ruby-1.9.2-p290/bin/tesseract.rb:19:in `<main>'
    from /usr/local/rvm/gems/ruby-1.9.2-p290/bin/ruby_noexec_wrapper:14:in `eval'
    from /usr/local/rvm/gems/ruby-1.9.2-p290/bin/ruby_noexec_wrapper:14:in `<main>'

$ tesseract.rb --help
Usage: tesseract [options]
        --path PATH                  datapath to set
    -l, --language LANGUAGE          language to use
    -m, --mode MODE                  mode to use
    -p, --psm MODE                   page segmentation mode to use
    -u, --unlv                       output in UNLV format
    -c, --confidence                 output the mean confidence of the recognition
    -C, --config PATH...             config files to load
    -b, --blacklist LIST             blacklist the following chars
    -w, --whitelist LIST             whitelist the following chars
    -s, --scale VALUE                scale the image before analyzing it
    -r, --resize VALUE               resize the image before analyzing it

$ tesseract.rb -l ara image.png
Failed loading language 'ara'
Tesseract couldn't load any languages!
/usr/local/rvm/gems/ruby-1.9.2-p290/gems/tesseract-ocr-0.1.5/lib/tesseract/api.rb:104:in `init': the API did not Init correctly (RuntimeError)
    from /usr/local/rvm/gems/ruby-1.9.2-p290/gems/tesseract-ocr-0.1.5/lib/tesseract/engine.rb:234:in `_init'
    from /usr/local/rvm/gems/ruby-1.9.2-p290/gems/tesseract-ocr-0.1.5/lib/tesseract/engine.rb:54:in `initialize'
    from /usr/local/rvm/gems/ruby-1.9.2-p290/gems/call-me-0.0.2.3/lib/call-me/named.rb:207:in `block in named'
    from /usr/local/rvm/gems/ruby-1.9.2-p290/gems/tesseract-ocr-0.1.5/bin/tesseract.rb:57:in `new'
    from /usr/local/rvm/gems/ruby-1.9.2-p290/gems/tesseract-ocr-0.1.5/bin/tesseract.rb:57:in `<top (required)>'
    from /usr/local/rvm/gems/ruby-1.9.2-p290/bin/tesseract.rb:19:in `load'
    from /usr/local/rvm/gems/ruby-1.9.2-p290/bin/tesseract.rb:19:in `<main>'
    from /usr/local/rvm/gems/ruby-1.9.2-p290/bin/ruby_noexec_wrapper:14:in `eval'
    from /usr/local/rvm/gems/ruby-1.9.2-p290/bin/ruby_noexec_wrapper:14:in `<main>'

$ tesseract.rb -l ara --path /usr/local/Cellar/tesseract/3.02.02/share image.png
Failed loading language 'ara'
Tesseract couldn't load any languages!
/usr/local/rvm/gems/ruby-1.9.2-p290/gems/tesseract-ocr-0.1.5/lib/tesseract/api.rb:104:in `init': the API did not Init correctly (RuntimeError)
    from /usr/local/rvm/gems/ruby-1.9.2-p290/gems/tesseract-ocr-0.1.5/lib/tesseract/engine.rb:234:in `_init'
    from /usr/local/rvm/gems/ruby-1.9.2-p290/gems/tesseract-ocr-0.1.5/lib/tesseract/engine.rb:54:in `initialize'
    from /usr/local/rvm/gems/ruby-1.9.2-p290/gems/call-me-0.0.2.3/lib/call-me/named.rb:207:in `block in named'
    from /usr/local/rvm/gems/ruby-1.9.2-p290/gems/tesseract-ocr-0.1.5/bin/tesseract.rb:57:in `new'
    from /usr/local/rvm/gems/ruby-1.9.2-p290/gems/tesseract-ocr-0.1.5/bin/tesseract.rb:57:in `<top (required)>'
    from /usr/local/rvm/gems/ruby-1.9.2-p290/bin/tesseract.rb:19:in `load'
    from /usr/local/rvm/gems/ruby-1.9.2-p290/bin/tesseract.rb:19:in `<main>'
    from /usr/local/rvm/gems/ruby-1.9.2-p290/bin/ruby_noexec_wrapper:14:in `eval'
    from /usr/local/rvm/gems/ruby-1.9.2-p290/bin/ruby_noexec_wrapper:14:in `<main>'

$ tesseract.rb -l ara --path /usr/local/Cellar/tesseract/3.02.02/share/tessdata image.png
Failed loading language 'ara'
Tesseract couldn't load any languages!
/usr/local/rvm/gems/ruby-1.9.2-p290/gems/tesseract-ocr-0.1.5/lib/tesseract/api.rb:104:in `init': the API did not Init correctly (RuntimeError)
    from /usr/local/rvm/gems/ruby-1.9.2-p290/gems/tesseract-ocr-0.1.5/lib/tesseract/engine.rb:234:in `_init'
    from /usr/local/rvm/gems/ruby-1.9.2-p290/gems/tesseract-ocr-0.1.5/lib/tesseract/engine.rb:54:in `initialize'
    from /usr/local/rvm/gems/ruby-1.9.2-p290/gems/call-me-0.0.2.3/lib/call-me/named.rb:207:in `block in named'
    from /usr/local/rvm/gems/ruby-1.9.2-p290/gems/tesseract-ocr-0.1.5/bin/tesseract.rb:57:in `new'
    from /usr/local/rvm/gems/ruby-1.9.2-p290/gems/tesseract-ocr-0.1.5/bin/tesseract.rb:57:in `<top (required)>'
    from /usr/local/rvm/gems/ruby-1.9.2-p290/bin/tesseract.rb:19:in `load'
    from /usr/local/rvm/gems/ruby-1.9.2-p290/bin/tesseract.rb:19:in `<main>'
    from /usr/local/rvm/gems/ruby-1.9.2-p290/bin/ruby_noexec_wrapper:14:in `eval'
    from /usr/local/rvm/gems/ruby-1.9.2-p290/bin/ruby_noexec_wrapper:14:in `<main>'

$ tesseract.rb -l eng --path /usr/local/Cellar/tesseract/3.02.02/share/tessdata image.png
V L: _ i _ if __ r
., - 7-; f"::"'=:,  ’
‘HQ.’ .9 9 " x_. ‘
' .' ”- « >3)’   »
'5--4 war; -11-!  2.! u-r‘J:“fi-&“‘->s’9":‘;’,,‘,’ .4» ma

The garbage output is expected: the only text in that image is Arabic.
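Since eng loads with the same --path where ara fails, it may be worth listing which language files are actually present in that directory; this would suggest, though not prove, that the Arabic data was never installed there. A minimal sketch, again assuming the Tesseract 3.x layout where each language is a <code>.traineddata file; `installed_languages` is a hypothetical helper, not a gem API.

```ruby
# Hypothetical helper, NOT part of the tesseract-ocr gem: list the language
# codes available in a tessdata directory, assuming the Tesseract 3.x layout
# where each language is stored as <code>.traineddata.
def installed_languages(tessdata_dir)
  Dir.glob(File.join(tessdata_dir, "*.traineddata"))
     .map { |file| File.basename(file, ".traineddata") }
     .sort
end

# Example: the tessdata directory passed with --path in the last run above.
tessdata = "/usr/local/Cellar/tesseract/3.02.02/share/tessdata"
langs = installed_languages(tessdata)
puts "installed: #{langs.join(', ')}"
puts(langs.include?("ara") ? "ara is installed" : "ara is missing")
```

A nonexistent directory simply yields an empty list, so the check is safe to run against each candidate path.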

p7r, Oct 25 '13

I'll have a look very soon (likely toward the end of the weekend).

meh, Oct 25 '13

I'm very sorry I haven't looked into this yet; I've been very busy, but I promise I will as soon as I have time.

meh, Nov 24 '13

I have the same problem when trying the Nerdz example in this repo: the :lol language is not loaded.

juniorjp, Feb 11 '14

I think this is an OS X-specific issue, and I don't have a Mac to fix it on.

meh, Feb 24 '14

In my case, I'm using Ubuntu. If I change the :lol language (in the Nerdz example) to the default :en, everything works fine.

And it's the same error: "the API did not Init correctly (RuntimeError)".

juniorjp, Feb 24 '14

@juniorjp1989 Oh, that's almost good to know, then. I guess it's a problem on non-Arch Linux systems.

meh, Feb 24 '14