rtesseract
Handling timeouts
Hey there, I need to implement a timeout for a long-running Tesseract command.
I came up with two options for how to do it:
- Add a `timeout` option to `RTesseract.new`, reimplement `Command#run` using `Open3.popen3` instead of `Open3.capture3`, and catch the timeout there (if set).
- Add an `async` option to `RTesseract.new` and implement `run_async` and `results` methods, also using `Open3.popen3`, which would return the PID so that the timeout (killing the process) can be handled in the client code.
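The first option could look roughly like this minimal, stdlib-only sketch, assuming the command is passed as an argv array. `run_with_timeout`, `terminate` and `CommandTimeout` are hypothetical names, not part of rtesseract. It starts the process with `Open3.popen3`, drains its output in background threads so the child cannot block on a full pipe, and kills it if it has not exited within `timeout` seconds.

```ruby
require "open3"

# Hypothetical error class for this sketch; it is not part of rtesseract.
class CommandTimeout < StandardError; end

# Sends TERM to the child, escalates to KILL if it is still alive after
# `grace` seconds, and always waits for it so no zombie is left behind.
def terminate(wait_thr, grace: 2)
  Process.kill("TERM", wait_thr.pid)
  Process.kill("KILL", wait_thr.pid) if wait_thr.join(grace).nil?
rescue Errno::ESRCH
  # The process already exited on its own.
ensure
  wait_thr.join
end

# Runs `argv` (e.g. ["tesseract", "image.jpeg", "stdout"]) and returns
# [stdout, stderr, Process::Status]. Raises CommandTimeout if the process
# has not exited within `timeout` seconds.
def run_with_timeout(argv, timeout:)
  Open3.popen3(*argv) do |stdin, stdout, stderr, wait_thr|
    stdin.close
    # Read both pipes in background threads; otherwise a child that writes
    # a lot could block on a full pipe buffer while we wait for it.
    out_reader = Thread.new { stdout.read }
    err_reader = Thread.new { stderr.read }

    # Thread#join returns nil if the thread is still running after `timeout`.
    if wait_thr.join(timeout).nil?
      terminate(wait_thr)
      # The child is gone, so its pipe ends are closed and the readers finish
      # (assuming it did not leave grandchildren holding the pipes open).
      out_reader.join
      err_reader.join
      raise CommandTimeout, "#{argv.first} (pid #{wait_thr.pid}) timed out after #{timeout}s"
    end

    [out_reader.value, err_reader.value, wait_thr.value]
  end
end
```

Inside the gem, `Command#run` could call something like this in place of `Open3.capture3`. The second (async) option would instead hand `wait_thr.pid` (or the whole `wait_thr`) back to the caller and leave the decision of when to kill it to client code.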
What do you think? Should I try to open a PR? Thanks!
Just posting a workaround until this moves forward, in case anyone finds it useful.
Simply create a shell script wrapper around the `tesseract` command:
```sh
#!/usr/bin/env sh
timeout 10s tesseract "$@"
```
And then use it when calling RTesseract:
```ruby
RTesseract.new("image.jpeg", command: "./tesseract_with_timeout.sh").to_s
```
Unfortunately, you cannot reliably tell whether it timed out or crashed; the best I could do is guess from the error message:
```ruby
begin
  RTesseract.new("image.jpeg", command: "./tesseract_with_timeout.sh").to_s
rescue RTesseract::Error => e
  raise e unless e.message.include?("Terminated")
  raise "Tesseract probably timed out"
end
```
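If you run the wrapper yourself rather than through RTesseract, the exit status does tell the two cases apart: GNU `timeout` exits with status 124 when the time limit is hit (unless `--preserve-status` is given), and the script's exit status is that of its last command. A minimal sketch, assuming the wrapper above and calling it directly with `Open3.capture3`; `ocr_with_wrapper` and `TesseractTimedOut` are hypothetical names.

```ruby
require "open3"

# Hypothetical error class used only in this sketch.
class TesseractTimedOut < StandardError; end

# GNU coreutils `timeout` exits with 124 when the command times out
# (unless --preserve-status is passed).
TIMEOUT_EXIT_STATUS = 124

# Runs `argv` and returns its stdout. Raises TesseractTimedOut on exit
# status 124 and a RuntimeError for any other failure.
def ocr_with_wrapper(argv)
  stdout, stderr, status = Open3.capture3(*argv)
  return stdout if status.success?

  if status.exitstatus == TIMEOUT_EXIT_STATUS
    raise TesseractTimedOut, "timed out: #{argv.join(' ')}"
  end

  # exitstatus is nil if the process was killed by a signal.
  raise "command failed (exit #{status.exitstatus.inspect}): #{stderr.strip}"
end
```

For example, `ocr_with_wrapper(["./tesseract_with_timeout.sh", "image.jpeg", "stdout"])` should return the recognized text; passing `stdout` as the output base makes tesseract print the text instead of writing a `.txt` file. The script has to be executable for this to work.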
So it would still be much better to handle it directly in the gem, as proposed above.
Hey @krystof-k, why do you need a timeout? The reason I'm asking is that I'm running an RTesseract command in a job, and the job eventually stalls while using more than 100% CPU. Are you experiencing something similar?
Hey @danielfriis, well, maybe yes: I needed to avoid long-running jobs because of limited resources, since they would block the execution of the next jobs in the queue. I haven't dug deep enough yet to tell whether it just took a long time or hung completely.
@krystof-k, sounds like the same issue here. I'll let you know if I learn more.
@krystof-k, when I use your timeout script, I get this error: `tesseract_with_timeout.sh: line 3: timeout: command not found`. Do you know why?
What system are you running it on? macOS has no built-in `timeout` command; you need to install coreutils (`brew install coreutils`).
FYI, I found that the timeout script caused tesseract to exit prematurely when I was looping through about 30 pages.