RTesseract::ConversionError in Ruby on Rails app

Open jvalentine opened this issue 9 years ago • 17 comments

  • Installed gems fine; tesseract and ImageMagick are already installed on the server.
  • Running the tesseract command manually in a terminal works successfully.
  • Running the application locally in an OS X environment works successfully.

uploaded_io = params[:picture]
upload_path = Rails.root.join('public', 'uploads', uploaded_io.original_filename)
File.open(upload_path, 'wb') do |file|
  file.write(uploaded_io.read)
end
dl = RTesseract.new(upload_path.to_s)
@string = dl.to_s

Only once I've deployed to my development server does it all break, returning the error. The files are being copied across to the public/uploads folder correctly, and they are readable, as tested by running the tesseract command outside of Ruby on the same file.

The result when running the app is an RTesseract::ConversionError raised by the dl.to_s call.

Unsure what I'm missing.

jvalentine avatar Sep 25 '15 14:09 jvalentine

Bump on this, experiencing the same problem.

nickmeehan avatar Oct 07 '15 07:10 nickmeehan

Yep,

I'm trying to use the gem in Rails. When running OCR from the console, Rails reports: RTesseract::ConversionError: No such file or directory @ rb_sysopen - /tmp/1451631781.39245151432.txt from /.rbenv/versions/2.2.4/lib/ruby/gems/2.2.0/gems/rtesseract-1.3.2/lib/rtesseract.rb:192:in `convert'

Can you help me?

Regards,

ustuntas

ustuntas avatar Jan 01 '16 07:01 ustuntas

@ustuntas Running sudo apt-get install tesseract-ocr helped me.

henb avatar Jan 01 '16 10:01 henb

Hey, I already installed tesseract-ocr, but the error is still the same.

This did not help me.

ustuntas avatar Jan 03 '16 18:01 ustuntas

@ustuntas Have you installed all the prerequisites? That means ImageMagick (sudo apt-get install libmagickwand-dev imagemagick on Ubuntu) and one of the RMagick, mini_magick, or quick_magick gems.

Try running tesseract on the console with a TIF image.

dannnylo avatar Jan 03 '16 22:01 dannnylo

Hi dannylo,

I am getting this error as well, and I have tesseract and ImageMagick installed (I can use both from my terminal). This is what I get in my logs:

RTesseract::ConversionError: No such file or directory @ rb_sysopen - /tmp/1451631781.39245151432.txt

This happens when I call the to_s method. Seems like it is creating the txt file but not saving it? I looked in the /tmp folder and confirmed that it is not there.
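One way to see why the .txt file never appears is to run the tesseract binary directly and capture its stderr. The ENOENT on the output file is consistent with tesseract exiting before it writes anything, although that is an inference from the error message rather than something confirmed here. Below is a minimal sketch using only Ruby's standard library; the helper name diagnose_tesseract and the keys of the hash it returns are hypothetical and not part of rtesseract.

```ruby
require "open3"
require "tmpdir"

# Runs the tesseract CLI directly (tesseract <image> <output base>) and
# reports what happened. `command` is an assumption: pass a full path if
# tesseract is not on the PATH of the process running your app.
def diagnose_tesseract(image_path, command: "tesseract")
  Dir.mktmpdir do |dir|
    output_base = File.join(dir, "ocr")
    begin
      _stdout, stderr, status = Open3.capture3(command, image_path.to_s, output_base)
    rescue Errno::ENOENT, Errno::EACCES => e
      # The binary itself could not be started (not found or not executable).
      return { ran: false, success: false, output_written: false, stderr: e.message }
    end
    {
      ran: true,
      success: status.success?,
      # tesseract writes "<output base>.txt" when OCR succeeds
      output_written: File.exist?("#{output_base}.txt"),
      stderr: stderr.strip
    }
  end
end
```

Calling diagnose_tesseract(path) from the same process that raises the ConversionError (for example a Rails console on the server) should surface messages that are otherwise swallowed, such as a missing language file.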

jr09g avatar Mar 24 '16 21:03 jr09g

Hi gang—I was able to fix this problem on my machine. It turns out that my installation of tesseract did not include training files. So when rtesseract was invoking the tesseract code, it was silently failing.

ayerie:POETRY simon$ tesseract numbers.png stdout
Error opening data file /opt/local/share/tessdata/eng.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory.
Failed loading language 'eng'
Tesseract couldn't load any languages!
Could not initialize tesseract.

The solution is to grab a copy of the training data from googlecode, and put it where tesseract is by default looking for it.

ayerie:POETRY simon$ wget https://tesseract-ocr.googlecode.com/files/eng.traineddata.gz
--2016-03-29 10:02:50--  https://tesseract-ocr.googlecode.com/files/eng.traineddata.gz
Resolving tesseract-ocr.googlecode.com (tesseract-ocr.googlecode.com)... 74.125.69.82, 2607:f8b0:4001:c08::52
Connecting to tesseract-ocr.googlecode.com (tesseract-ocr.googlecode.com)|74.125.69.82|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 742852 (725K) [application/x-gzip]
Saving to: ‘eng.traineddata.gz’

eng.traineddata.gz                       100%[==================================================================================>] 725.44K   831KB/s   in 0.9s   

2016-03-29 10:02:51 (831 KB/s) - ‘eng.traineddata.gz’ saved [742852/742852]

ayerie:POETRY simon$ gunzip eng.traineddata.gz 
ayerie:POETRY simon$ sudo mv -v eng.traineddata /opt/local/share/tessdata/
Password:
eng.traineddata -> /opt/local/share/tessdata/eng.traineddata
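The same check can be made from Ruby before calling rtesseract. This is only a sketch: the helper names are hypothetical, /opt/local/share/tessdata is simply the directory from the error message above, and TESSDATA_PREFIX is treated both as the parent of tessdata (as that error message describes) and as the tessdata directory itself, because newer tesseract versions interpret it that way.

```ruby
# Returns the full path of "<lang>.traineddata" inside tessdata_dir,
# or nil if the file is not there.
def traineddata_path(tessdata_dir, lang = "eng")
  path = File.join(tessdata_dir, "#{lang}.traineddata")
  File.file?(path) ? path : nil
end

# Builds a list of directories worth checking. `env` defaults to ENV but
# can be any Hash-like object, which keeps the helper easy to test.
def candidate_tessdata_dirs(env = ENV)
  prefix = env["TESSDATA_PREFIX"]
  dirs = []
  if prefix && !prefix.empty?
    dirs << File.join(prefix, "tessdata") # prefix as parent of tessdata
    dirs << prefix                        # prefix as the tessdata dir itself
  end
  dirs << "/opt/local/share/tessdata"     # assumption: path from the error above
  dirs.uniq
end

# Example: first directory that actually contains the English data, or nil.
found = candidate_tessdata_dirs.map { |dir| traineddata_path(dir, "eng") }.compact.first
puts(found ? "Training data found at #{found}" : "No eng.traineddata found")
```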

Once I did that, rtesseract worked just fine. I hope this helps,

Simon

sdedeo avatar Mar 29 '16 14:03 sdedeo

I'm getting the same issue as @jvalentine, and my base installation of Tesseract works already (e.g. tesseract test.jpg stdout), and the training data is in the correct spot. Any updates on this?

kimberli avatar Jul 22 '16 00:07 kimberli

Hello, do you use RMagick or MiniMagick as the processor? Do you have the ImageMagick dev libs installed?

  • Ubuntu systems: sudo apt-get install libmagickwand-dev
  • RHEL systems: yum install ImageMagick-devel
  • Mac: brew install imagemagick

If all prerequisites are working, please send me the full error details.

dannnylo avatar Jul 22 '16 10:07 dannnylo

Actually, I think when I was installing libmagickwand-dev yesterday it didn't work correctly. Tried again today and it works. Thanks!

kimberli avatar Jul 22 '16 18:07 kimberli

Hello everybody, I was having the same problem using Rails. When I called tesseract from a rake task it worked perfectly, but under Rails on Apache, for some reason Apache could not find the tesseract command in its path.

The solution was really simple for me. I just added the full path for the command:

RTesseract.new(temp_file, command: "/usr/local/bin/tesseract").to_s.strip

That solved my problem. Hope it can help others.
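If hard-coding the path is not an option, the binary can be looked up at runtime. The sketch below uses only the standard library; the helper names and the fallback directories are assumptions (common Homebrew, MacPorts, and system locations), not anything rtesseract provides.

```ruby
# Returns the first executable file called `name` inside `dirs`, or nil.
def locate_executable(name, dirs)
  dirs.each do |dir|
    candidate = File.join(dir, name)
    return candidate if File.file?(candidate) && File.executable?(candidate)
  end
  nil
end

# Looks on PATH first, then in a few fallback directories. A web server
# such as Apache often runs with a shorter PATH than your login shell.
def tesseract_command(path_env = ENV.fetch("PATH", ""),
                      fallback_dirs = ["/usr/local/bin", "/opt/homebrew/bin",
                                       "/opt/local/bin", "/usr/bin"])
  path_dirs = path_env.split(File::PATH_SEPARATOR).reject(&:empty?)
  locate_executable("tesseract", path_dirs + fallback_dirs)
end

# With the option used above, the result could then be passed along as:
#   RTesseract.new(temp_file, command: tesseract_command).to_s.strip
```

tesseract_command returns nil when nothing is found, so it is worth checking the result and raising a clear error rather than passing nil on.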

bacchir avatar Aug 10 '16 10:08 bacchir

Hi all, Had a similar problem to this. My solution was to open the file with MiniMagick before processing it. My file was stored at a URL but this would likely work with a local file too.

def extract_text
  image = MiniMagick::Image.open(self.file_url)
  image = RTesseract.new(image)
  image.to_s
end

cjbutcher avatar Oct 22 '16 13:10 cjbutcher

Bump, I'm having the same issue. I tried all the solutions above to no avail.

test = RTesseract.new(img, :processor => "mini_magick", :lang => "eng", command: "/usr/local/bin/tesseract")
test.to_s

It still gives me RTesseract::ConversionError: No such file or directory @ rb_sysopen - /var/folders/ms/ml9k4bbn1bx8d8ccz8lrtq0m0000gp/T/1477343324.75567581123.txt

This is the .png image I'm testing with: test

kave avatar Oct 24 '16 21:10 kave

What fixed it for me was to also install it through brew: brew install tesseract

mindingear avatar Apr 19 '17 18:04 mindingear

I think this issue may occur when using irb, because irb releases the file.

henghuang avatar Sep 01 '17 14:09 henghuang

I have tried everything but nothing works. Does anyone have a solution for this? I have the same conversion error.

sumanthmadishetty avatar Aug 08 '18 10:08 sumanthmadishetty

On macOS I had the same issue. To resolve it, I added the absolute path of the tessdata directory as an option to RTesseract.new.

Find your tessdata directory:

find / -type d -name "tessdata" # from cli
`find / -type d -name "tessdata" -print -quit` # in ruby code, find and return first result
image = RTesseract.new(path, {
  :processor => 'mini_magick',
  :tessdata_dir => '/usr/local/Cellar/tesseract/3.05.02/share/tessdata'
})
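Scanning the whole filesystem with find / can be slow, and the versioned Cellar path changes on upgrade. A rough alternative, sketched here with Dir.glob, is to search a few likely roots only. The helper name and the list of roots are assumptions; adjust them for your install.

```ruby
# Searches each root (in order) for a directory named "tessdata" and
# returns one match from the first root that has any, or nil.
# The default roots are guesses at common Homebrew, MacPorts, and Linux
# package locations.
def find_tessdata_dir(roots = ["/usr/local/Cellar/tesseract",
                               "/opt/homebrew/share",
                               "/usr/local/share",
                               "/opt/local/share",
                               "/usr/share/tesseract-ocr"])
  roots.each do |root|
    matches = Dir.glob(File.join(root, "**", "tessdata"))
                 .select { |path| File.directory?(path) }
                 .sort
    # With several versioned installs, take the lexicographically last
    # path, which is only an approximation of "newest".
    return matches.last unless matches.empty?
  end
  nil
end

tessdata_dir = find_tessdata_dir
# The result could then replace the hard-coded value above, for example:
#   RTesseract.new(path, :processor => 'mini_magick', :tessdata_dir => tessdata_dir)
```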

dcordz avatar Aug 23 '18 13:08 dcordz

For the Heroku 22 stack, I was able to get it working just by adding the buildpack

https://github.com/pathwaysmedical/heroku-buildpack-tesseract

before calling my ruby buildpack.

I was downloading the file temporarily rather than referencing a URL, so my job code looked like this:

file = record.file.download
file_path = file.path
ocr_text = RTesseract.new(file_path).to_s

louiswdavis avatar Oct 04 '23 13:10 louiswdavis