ocrd_anybaseocr

OOM in cropper

Open bertsky opened this issue 3 years ago • 3 comments

On a workspace with more than 500 pages, running the cropper fails with:

OSError: [Errno 12] Cannot allocate memory

This happens after VSZ (virtual memory) exceeds 32 GB. In contrast, RSS (resident memory) is still as low as 200 MB.
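To watch the two figures diverge while a job runs, one option on Linux is to read them from `/proc`. The following is only a sketch: the helper names `parse_vm_fields` and `current_memory_kb` are made up for illustration and are not part of ocrd_anybaseocr or OCR-D core. It assumes a Linux kernel that reports `VmSize` (VSZ) and `VmRSS` (RSS) in kB in `/proc/<pid>/status`.

```python
import re


def parse_vm_fields(status_text):
    """Extract VmSize (VSZ) and VmRSS (RSS), in kB, from /proc/<pid>/status text."""
    fields = {}
    for line in status_text.splitlines():
        match = re.match(r"(VmSize|VmRSS):\s+(\d+)\s+kB", line)
        if match:
            fields[match.group(1)] = int(match.group(2))
    return fields


def current_memory_kb(pid="self"):
    """Read the status file of a process (Linux only; hypothetical helper)."""
    with open(f"/proc/{pid}/status") as handle:
        return parse_vm_fields(handle.read())
```

Logging `current_memory_kb()` once per page would show whether VSZ grows steadily with each page while RSS stays flat, which is the pattern described above.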

Could this be a leak in the LSD CPython module, @kba?

bertsky, May 02 '22 09:05

> Could this be a leak in the LSD CPython module, @kba?

Totally possible. I did not do any work on pylsd beyond getting it to work as a dependency and publishing to PyPI.

kba, May 02 '22 09:05

The only workaround at the moment is to process smaller page ranges. But unless you use numerical page IDs, this will be quite difficult with the OCR-D CLI. (The problem being find_files does not support regex search for pageId …)
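If the full list of page IDs is already available, the batching itself could be done outside the CLI. This is a minimal sketch, not something from this repository: `page_id_batches` is a hypothetical helper, and it assumes that the processor accepts a comma-separated list of page IDs (as far as I know via the `-g` / `--page-id` option of OCR-D processors) and that each batch runs in a fresh process so the address space is released in between.

```python
def page_id_batches(page_ids, batch_size):
    """Yield comma-separated page ID strings, at most batch_size IDs each."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(page_ids), batch_size):
        yield ",".join(page_ids[start:start + batch_size])


# Made-up, non-numerical page IDs purely for illustration
example_ids = ["phys_a", "phys_b", "phys_c", "phys_d", "phys_e"]
for batch in page_id_batches(example_ids, 2):
    print(batch)
```

Each printed string would then be handed to one separate cropper invocation. This sidesteps the missing regex support, but it does not address the suspected leak itself.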

bertsky, May 12 '22 09:05

> (The problem being find_files does not support regex search for pageId …)

see https://github.com/OCR-D/core/issues/855

bertsky, May 12 '22 09:05