
Handling timeouts

Open krystof-k opened this issue 1 year ago • 7 comments

Hey there, I need to implement a timeout for a long-running Tesseract command.

I came up with two options for how to do it:

  1. Add a timeout option to RTesseract.new and reimplement Command#run using Open3.popen3 instead of Open3.capture3, catching the timeout there (if set).
  2. Add an async option to RTesseract.new and implement run_async and results methods, also using Open3.popen3; run_async would return the PID, so the timeout (killing the process) could be handled in the client code.
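Option 1 might look roughly like the sketch below. It is a standalone helper, not RTesseract code: run_with_timeout and CommandTimeout are hypothetical names, and the sketch assumes the command is passed in argv form (no shell). It drains stdout and stderr in threads, waits on the process thread with a time limit, and on expiry sends TERM, escalating to KILL after one second.

```ruby
require "open3"

# Hypothetical error class for this sketch; RTesseract does not define it.
class CommandTimeout < StandardError; end

# Hypothetical helper: runs `cmd` (argv form, no shell) like Open3.capture3,
# but kills the child if it is still running after `timeout` seconds.
# Returns [stdout, stderr, Process::Status].
def run_with_timeout(*cmd, timeout:)
  Open3.popen3(*cmd) do |stdin, stdout, stderr, wait_thr|
    stdin.close
    # Drain both pipes in background threads so the child cannot block
    # on a full pipe buffer while we wait for it.
    out_reader = Thread.new { stdout.read }
    err_reader = Thread.new { stderr.read }

    # Thread#join returns nil if the thread is still alive after the limit.
    if wait_thr.join(timeout).nil?
      begin
        Process.kill("TERM", wait_thr.pid)
        # Escalate to KILL if the process ignores TERM for one second.
        Process.kill("KILL", wait_thr.pid) if wait_thr.join(1).nil?
      rescue Errno::ESRCH
        # The process already exited on its own; nothing left to kill.
      end
      wait_thr.join
      out_reader.join
      err_reader.join
      raise CommandTimeout, "#{cmd.first} timed out after #{timeout}s"
    end

    [out_reader.value, err_reader.value, wait_thr.value]
  end
end
```

In the proposal, Command#run would call something like this in place of Open3.capture3 whenever a timeout is set; callers could then rescue CommandTimeout separately from ordinary Tesseract errors.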

What do you think? Should I try to open a PR? Thanks!

krystof-k, Feb 01 '24 13:02

Just posting a workaround until this moves forward, in case anyone finds it useful.

Simply create a shell script wrapper around the tesseract command:

#!/usr/bin/env sh

timeout 10s tesseract "$@"

And then use it when calling RTesseract:

RTesseract.new("image.jpeg", command: "./tesseract_with_timeout.sh").to_s

Unfortunately, you cannot reliably tell whether it timed out or crashed; checking the error message is only a guess:

begin
  RTesseract.new("image.jpeg", command: "./tesseract_with_timeout.sh").to_s
rescue RTesseract::Error => e
  raise e unless e.message.include?("Terminated")
  raise "Tesseract probably timed out"
end
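If you call the wrapper yourself instead of going through RTesseract, the exit status can separate the two cases: GNU coreutils timeout exits with status 124 when the limit is hit (unless --preserve-status is given), and a shell script exits with the status of its last command. The sketch below uses a hypothetical run_and_classify helper; it assumes tesseract itself never exits with 124, and it bypasses RTesseract's own error handling entirely.

```ruby
require "open3"

# Exit status GNU coreutils `timeout` uses when the time limit expires.
TIMEOUT_EXIT_STATUS = 124

# Hypothetical helper: runs the wrapper (or any argv) and classifies the
# outcome as :ok, :timed_out or :failed based on the exit status.
# A process killed by a signal has a nil exitstatus and counts as :failed.
def run_and_classify(*cmd)
  stdout, stderr, status = Open3.capture3(*cmd)
  outcome =
    if status.success?
      :ok
    elsif status.exitstatus == TIMEOUT_EXIT_STATUS
      :timed_out
    else
      :failed
    end
  [outcome, stdout, stderr]
end
```

For example, run_and_classify("./tesseract_with_timeout.sh", "image.jpeg", "stdout") would use tesseract's stdout output base so the recognized text ends up in the returned stdout string.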

So it would still be much better to handle it directly in the gem, as proposed above.

krystof-k, Feb 19 '24 13:02

Hey @krystof-k. Why do you need a timeout? The reason I'm asking is that I'm running an RTesseract command in a job, and the job eventually stalls while using over 100% CPU. I was wondering if you were experiencing something similar?

danielfriis, Jul 18 '24 07:07

Hey @danielfriis, well, maybe yes: I needed to avoid long-running jobs because of limited resources, since they would block execution of the next jobs in the queue. I haven't dug deep enough yet to tell whether it just took a long time or hung completely.

krystof-k, Jul 18 '24 07:07

@krystof-k Sounds like the same issue here. I'll let you know if I learn more.

danielfriis, Jul 18 '24 07:07

@krystof-k When I use your timeout script, I get this error: tesseract_with_timeout.sh: line 3: timeout: command not found. Do you know why?

danielfriis, Jul 18 '24 08:07

What system are you running it on? On macOS there is no built-in timeout command; you need to install coreutils (brew install coreutils).
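If the same wrapper has to work on both Linux and macOS, a small lookup can pick whichever binary is available. This is a sketch with a hypothetical which helper (not part of RTesseract or Ruby's standard library); it relies on Homebrew's coreutils providing the g-prefixed gtimeout.

```ruby
# Hypothetical helper: returns the full path of the first name found as an
# executable file in one of the PATH directories, or nil if none is found.
def which(*names, path: ENV.fetch("PATH", ""))
  dirs = path.split(File::PATH_SEPARATOR)
  names.each do |name|
    dirs.each do |dir|
      candidate = File.join(dir, name)
      return candidate if File.file?(candidate) && File.executable?(candidate)
    end
  end
  nil
end

# GNU `timeout` on Linux; Homebrew's coreutils also provides `gtimeout`.
timeout_bin = which("timeout", "gtimeout")
puts timeout_bin || "no timeout command found (on macOS: brew install coreutils)"
```

The resulting path could then be written into the wrapper script in place of the bare timeout command.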

krystof-k, Jul 18 '24 08:07

FYI: I found that the timeout script caused tesseract to exit prematurely when I was looping through about 30 pages.

danielfriis, Jul 19 '24 21:07