
Collection of OCR-related Python tools and wrappers from @OCR-D

Results 215 core issues

How about setting an optional upper [limit on memory usage](https://docs.python.org/3.9/library/resource.html#resource.RLIMIT_RSS) for processors via another environment variable, say `OCRD_MAX_RSS`? This could also be done externally (e.g. via Docker in [ocrd_all](https://github.com/OCR-D/ocrd_all/issues/280)), but...
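A minimal sketch of how such a limit could be applied from inside a Python process, using the `resource` module linked above. The function name `apply_max_rss` and the interpretation of `OCRD_MAX_RSS` as a number of MiB are assumptions made for illustration, not existing OCR-D behaviour. Note also that `RLIMIT_RSS` is not enforced by current Linux kernels, so in practice `resource.RLIMIT_AS` might have to be passed instead.

```python
import os
import resource


def apply_max_rss(env_var="OCRD_MAX_RSS", limit=resource.RLIMIT_RSS):
    """Lower the soft resource limit to the value given in env_var.

    Hypothetical helper: the value is assumed to be in MiB.
    Returns the limit applied in bytes, or None if the variable is unset.
    """
    raw = os.environ.get(env_var)
    if not raw:
        return None  # variable unset or empty: leave limits untouched
    max_bytes = int(raw) * 1024 * 1024
    if max_bytes <= 0:
        raise ValueError(f"{env_var} must be a positive number of MiB")
    _soft, hard = resource.getrlimit(limit)
    # The soft limit may never exceed a finite hard limit.
    if hard != resource.RLIM_INFINITY:
        max_bytes = min(max_bytes, hard)
    resource.setrlimit(limit, (max_bytes, hard))
    return max_bytes
```

Calling `apply_max_rss()` once at processor start-up would be enough; only the soft limit is changed, so the hard limit set by the environment (for example by Docker) stays in force.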

From a `workspace validate` I got:

```XML
METS has no unique identifier
Validation aborted with exception:
Traceback (most recent call last):
  File "/data/ocr-d/ocrd_all/venv38/lib/python3.8/site-packages/ocrd_validators/workspace_validator.py", line 149, in _validate
    self._validate_mets_files()
  File "/data/ocr-d/ocrd_all/venv38/lib/python3.8/site-packages/ocrd_validators/workspace_validator.py",...
```

bug

Somehow the recent changes made `ocrd log` calls take several seconds to complete, rendering usage in scripts like ocrd-import prohibitive.

We now have OCR-D GT on [GitHub](https://github.com/OCR-D/gt_structure_text/releases) (the old KIT repo has been down for a while, so this is the only place to get the data). It gets created...

This would allow specifying `FROM ocrd/core-cuda-tf1` for all modules depending on TensorFlow 1, so this (huge!) Docker layer can be **shared**. The same could be worked out for TF2 and...

For reproducibility, it would be nice to have a checksum and/or file size of the models used recorded in the XML.
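A rough sketch of how a checksum and file size could be computed for a model file before being written to the output. The function name `model_fingerprint`, the choice of SHA-256, and the returned dictionary keys are assumptions for illustration; how and where the values would be stored in the XML is left open here.

```python
import hashlib
import os


def model_fingerprint(path, chunk_size=1 << 20):
    """Return the size in bytes and SHA-256 hex digest of a model file.

    Hypothetical helper: reads the file in chunks so large models
    do not have to fit into memory at once.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return {"size": os.path.getsize(path), "sha256": digest.hexdigest()}
```

Both values are cheap to compute once per run and would let a reader of the result check later whether the same model file was used.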

enhancement

The fix was done in a virtual Python environment: `pip install isort`, then `isort .`. It orders the import statements and fixes their formatting.

This starts Conda with `deps-conda` as a replacement for Apt with `deps-ubuntu` to install system dependencies. System dependencies should be encapsulated better than via fixed Linux distributions in OCR-D. [Long...

As requested in #1154, this PR introduces a `contentids` attribute for `OcrdFile`, which delegates to `OcrdMets.get_contentids_for_file`, which looks up the `CONTENTIDS` attribute of the `mets:div[@TYPE="page"]` that a file belongs to....
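The lookup described here can be illustrated on plain METS XML. The following is a sketch using only the standard library, not the `OcrdMets` implementation; the function name `contentids_for_file` is hypothetical, and it assumes the usual METS layout in which a `mets:div[@TYPE="page"]` references its files through `mets:fptr` children with a `FILEID` attribute.

```python
import xml.etree.ElementTree as ET

NS = {"mets": "http://www.loc.gov/METS/"}


def contentids_for_file(mets_xml, file_id):
    """Return CONTENTIDS of the page div that references file_id.

    Hypothetical stand-alone helper; returns None if the file is not
    referenced by any page div or the div has no CONTENTIDS attribute.
    """
    root = ET.fromstring(mets_xml)
    for div in root.iterfind(".//mets:div[@TYPE='page']", NS):
        for fptr in div.findall("mets:fptr", NS):
            if fptr.get("FILEID") == file_id:
                return div.get("CONTENTIDS")
    return None
```

Given a structMap whose page div carries `CONTENTIDS="http://example.org/p1"` and contains `<mets:fptr FILEID="FILE_0001"/>`, the call `contentids_for_file(xml, "FILE_0001")` would return that value.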