ocrd_anybaseocr
OOM in cropper
On a workspace with >500 pages, running the cropper yields an
OSError: [Errno 12] Cannot allocate memory
This happens after VSZ (virtual memory) exceeds 32 GB. In contrast, RSS (resident memory) is still as low as 200 MB.
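For anyone who wants to watch the VSZ/RSS gap grow from inside the process, here is a minimal sketch. It assumes Linux (it reads /proc/<pid>/status), and the helper names parse_status and memory_usage_kb are made up for illustration; they are not part of ocrd_anybaseocr or pylsd.

```python
def parse_status(text):
    """Extract VmSize (VSZ) and VmRSS (RSS) in kB from /proc/<pid>/status content."""
    fields = {}
    for line in text.splitlines():
        key, _, value = line.partition(":")
        if key in ("VmSize", "VmRSS"):
            # Values look like "  33554432 kB"; keep only the number.
            fields[key] = int(value.split()[0])
    return fields


def memory_usage_kb(pid="self"):
    """Read the current VSZ/RSS of a process (Linux only, hypothetical helper)."""
    with open(f"/proc/{pid}/status") as f:
        return parse_status(f.read())


# Possible usage: call memory_usage_kb() after each page and log the result,
# e.g. {'VmSize': ..., 'VmRSS': ...}, to see which value keeps growing.
```

If VmSize climbs steadily per page while VmRSS stays flat, that would match the behaviour reported here.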
Could this be a leak in the LSD CPython module, @kba?
Totally possible. I did not do any work on pylsd beyond getting it to work as a dependency and publishing to PyPI.
The only workaround at the moment is to process smaller page ranges. But unless you use numerical page IDs, this will be quite difficult with the OCR-D CLI. (The problem being that find_files does not support regex search for pageId …)
Regarding find_files not supporting regex search for pageId: see https://github.com/OCR-D/core/issues/855
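Until that is resolved, the "smaller page ranges" workaround could be scripted roughly as below. This is only a sketch under assumptions: the page IDs are taken to be already known as a list, the processor is assumed to accept a comma-separated list via -g, and the executable name, file group names, and example page IDs are placeholders rather than anything confirmed in this thread.

```python
def page_id_batches(page_ids, batch_size):
    """Yield comma-separated page ID strings, batch_size IDs at a time."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(page_ids), batch_size):
        yield ",".join(page_ids[start:start + batch_size])


def build_commands(page_ids, batch_size, executable="ocrd-anybaseocr-crop",
                   input_grp="OCR-D-IMG", output_grp="OCR-D-CROP"):
    """Build one command line per batch (names and groups are assumptions)."""
    return [
        [executable, "-I", input_grp, "-O", output_grp, "-g", batch]
        for batch in page_id_batches(page_ids, batch_size)
    ]


# Hypothetical page IDs; real workspaces may use any naming scheme.
example_ids = ["PHYS_0001", "PHYS_0002", "PHYS_0003"]
commands = build_commands(example_ids, batch_size=2)
```

Each command could then be run in its own process (for example with subprocess.run), so that whatever memory is leaked gets released when the batch exits.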