Processor: replace loky with pebble to enforce worker timeouts

Open bertsky opened this issue 2 weeks ago • 3 comments

  • _page_worker: remove ThreadPool mechanism introduced in 3cc47804 (which broke processors that are not threadsafe like TF/Keras)
  • since no mechanism works to stop computation in uniprocessing (not even _thread.interrupt_main() or signal.alarm() would interrupt I/O or C library calls like libtesseract): drop the timeout in that mode
  • since neither stdlib's nor loky's ProcessPoolExecutor enforces timeouts on jobs: replace it with pebble
  • apply the max_seconds timeout only in ProcessPool mode, which in turn is used only when running with a METS Server
  • make test_run_output_timeout xfail
  • add test_run_output_metsserver_timeout

see https://github.com/OCR-D/ocrd_anybaseocr/pull/115#issuecomment-3641655656 for context (plus internal discussion)

bertsky, Dec 12 '25 13:12

I wonder whether we should still keep some mechanism in the page worker, though – for those cases where our timeout mechanism does work even in uniprocessing. Like interrupting I/O wait or CPU-bound Pythonic computation with signal(), or with _thread.interrupt_main(). But then maybe in the ProcessPool case we would have to avoid these two racing against each other...
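For illustration, here is a minimal sketch of the kind of in-process mechanism meant above, using only the standard library's `signal.alarm()`. The names `PageTimeout`, `run_with_alarm` and `busy_python_loop` are hypothetical and not taken from the OCR-D code base. It assumes a Unix platform and that the call runs in the main thread. As noted in the description, it can only interrupt code that returns to the Python interpreter regularly; a long-running C library call would not see the exception until it returns.

```python
import signal
import time


class PageTimeout(Exception):
    """Hypothetical exception type, raised when the alarm fires."""


def _on_alarm(signum, frame):
    # Runs in the main thread the next time the interpreter checks for signals.
    raise PageTimeout("page processing exceeded its time budget")


def run_with_alarm(func, seconds, *args):
    """Run func(*args) in the main thread, aborting after `seconds` (whole seconds, > 0)."""
    previous = signal.signal(signal.SIGALRM, _on_alarm)
    signal.alarm(seconds)
    try:
        return func(*args)
    finally:
        signal.alarm(0)  # cancel any pending alarm
        signal.signal(signal.SIGALRM, previous)  # restore the previous handler


def busy_python_loop(duration):
    """Pure-Python busy loop, standing in for CPU-bound Pythonic computation."""
    deadline = time.monotonic() + duration
    count = 0
    while time.monotonic() < deadline:
        count += 1
    return count


if __name__ == "__main__":
    try:
        run_with_alarm(busy_python_loop, 1, 5.0)
    except PageTimeout as exc:
        print("interrupted:", exc)
```

If something like this were kept alongside the ProcessPool timeout, the two deadlines would presumably need to be staggered or one of them disabled, so that they do not race against each other.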

If we don't do that, we at least still have to update documentation (i.e. TIMEOUT does not apply without METS Server).

And, regardless, perhaps it would be better to have some actual test cases that cover the pathological case (simulating a long-lasting C library call like libtesseract).

EDIT: BTW, it's the same with KeyboardInterrupt: it works only if in a subprocess, but libtesseract calls are not (!) interruptible. (Perhaps we should take that to tesserocr, though...)

bertsky, Dec 16 '25 11:12

Just quickly on this point:

> If we don't do that, we at least still have to update documentation (i.e. TIMEOUT does not apply without METS Server).

What use case beyond experimenting/developing OCR-D is there for non-METS-server deployment? If timeout and parallelization are relevant factors, users should use processing and METS server.

kba, Dec 16 '25 14:12

> What use case beyond experimenting/developing OCR-D is there for non-METS-server deployment? If timeout and parallelization are relevant factors, users should use processing and METS server.

For simplicity and backwards-compatibility, we still want to support isolated runs of processor CLIs. It would be a shame if the v3 envvars did not work there as well.

bertsky, Dec 17 '25 10:12