Processor: replace loky with pebble to enforce worker timeouts
- `_page_worker`: remove `ThreadPool` mechanism introduced in 3cc47804 (which broke processors that are not thread-safe, like TF/Keras)
- since no mechanisms work to stop computation in uniprocessing (as not even `_thread.interrupt_main()` or `signal.alarm()` would interrupt I/O or C library calls like libtesseract): drop
- since neither stdlib's nor loky's `ProcessPoolExecutor` enforces timeouts on jobs: replace by pebble
- apply `max_seconds` timeout iff in ProcessPool mode iff running with METS Server
- make `test_run_output_timeout` xfail
- add `test_run_output_metsserver_timeout`
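To illustrate what "enforcing" a timeout means in the list above, here is a minimal stdlib-only sketch: the job runs in a subprocess, and the parent kills that subprocess once the deadline passes. This is not pebble's implementation and not the code in this PR; `run_with_timeout` and `_child` are hypothetical names, and the sketch assumes a POSIX platform where the `fork` start method is available.

```python
import multiprocessing as mp
import time


def _child(conn, func, args):
    # Runs in the subprocess: send back either the result or the exception.
    try:
        conn.send((True, func(*args)))
    except Exception as exc:
        conn.send((False, exc))
    finally:
        conn.close()


def run_with_timeout(func, args=(), max_seconds=1.0):
    """Run func(*args) in a forked subprocess; kill it after max_seconds."""
    ctx = mp.get_context("fork")  # assumption: POSIX with fork available
    parent_conn, child_conn = ctx.Pipe(duplex=False)
    proc = ctx.Process(target=_child, args=(child_conn, func, args))
    proc.start()
    child_conn.close()  # the parent only reads from the pipe
    try:
        # Wait for a result, but no longer than the deadline.
        if not parent_conn.poll(max_seconds):
            raise TimeoutError(f"job exceeded {max_seconds}s")
        ok, payload = parent_conn.recv()
    finally:
        if proc.is_alive():
            # SIGKILL stops the worker regardless of what it is doing,
            # including blocking calls inside a C library.
            proc.kill()
        proc.join()
        parent_conn.close()
    if not ok:
        raise payload
    return payload


if __name__ == "__main__":
    print(run_with_timeout(pow, (2, 10), max_seconds=5.0))
    try:
        run_with_timeout(time.sleep, (30,), max_seconds=0.5)
    except TimeoutError as exc:
        print("timed out:", exc)
```

The point of the sketch is the contrast with a plain executor: waiting on a future with a timeout only stops the wait, whereas here the worker process itself is terminated.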
see https://github.com/OCR-D/ocrd_anybaseocr/pull/115#issuecomment-3641655656 for context (plus internal discussion)
I wonder whether we should still keep some mechanism in the page worker, though – for those cases where our timeout mechanism does work even in uniprocessing. Like interrupting I/O wait or CPU-bound Pythonic computation with signal(), or with _thread.interrupt_main(). But then maybe in the ProcessPool case we would have to avoid these two racing against each other...
If we don't do that, we at least still have to update documentation (i.e. TIMEOUT does not apply without METS Server).
And, regardless, perhaps it would be better to have some actual test cases that cover the pathological case (simulating a long-lasting C library call like libtesseract).
EDIT: BTW, it's the same with KeyboardInterrupt: it works only if in a subprocess, but libtesseract calls are not (!) interruptible. (Perhaps we should take that to tesserocr, though...)
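For reference, the uniprocessing case raised above (interrupting CPU-bound Pythonic computation with a signal) could look roughly like the following sketch. `time_limit` and `busy_python_loop` are hypothetical names, not part of the codebase; the sketch uses `signal.setitimer` instead of `signal.alarm()` only to allow sub-second limits, and it assumes a POSIX platform and that it is called from the main thread. As noted above, the handler only runs once the interpreter regains control, so a long call inside a C library would not be cut short this way.

```python
import signal
import time
from contextlib import contextmanager


@contextmanager
def time_limit(seconds):
    """Raise TimeoutError in the main thread after `seconds` (POSIX only)."""

    def _on_alarm(signum, frame):
        raise TimeoutError(f"exceeded {seconds}s")

    previous = signal.signal(signal.SIGALRM, _on_alarm)
    signal.setitimer(signal.ITIMER_REAL, seconds)
    try:
        yield
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)  # cancel any pending alarm
        # Restore the earlier handler (None means it was not set from Python).
        signal.signal(
            signal.SIGALRM,
            previous if previous is not None else signal.SIG_DFL,
        )


def busy_python_loop(duration):
    # Pure-Python, CPU-bound work: the interpreter checks for pending
    # signals between bytecode instructions, so the alarm can interrupt it.
    end = time.monotonic() + duration
    n = 0
    while time.monotonic() < end:
        n += 1
    return n


if __name__ == "__main__":
    try:
        with time_limit(0.2):
            busy_python_loop(10)
    except TimeoutError as exc:
        print("interrupted:", exc)
```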
Just quickly on this point:
> If we don't do that, we at least still have to update documentation (i.e. TIMEOUT does not apply without METS Server).
What use case beyond experimenting/developing OCR-D is there for non-METS-server deployment? If timeout and parallelization are relevant factors, users should use processing and METS server.
> What use case beyond experimenting/developing OCR-D is there for non-METS-server deployment? If timeout and parallelization are relevant factors, users should use processing and METS server.
For simplicity and backwards-compatibility, we still want to support isolated runs of processor CLIs. It would be a shame if v3 envvars do not work there, too.