core issues

run_processor: apply configurable RSS limit

How about setting an optional upper [limit on memory usage](https://docs.python.org/3.9/library/resource.html#resource.RLIMIT_RSS) for processors via another environment variable, say `OCRD_MAX_RSS`? This could also be done externally (e.g. via Docker in [ocrd_all](https://github.com/OCR-D/ocrd_all/issues/280)), but...

bertsky
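A minimal sketch of how such a limit might be applied inside the Python process before a processor starts. The function names and the choice of megabytes as the unit for `OCRD_MAX_RSS` are assumptions made for illustration and are not taken from the issue. As far as I know, current Linux kernels accept `RLIMIT_RSS` but do not enforce it, so the resource is left configurable here (for example `resource.RLIMIT_AS`).

```python
import os
import resource


def parse_limit_mb(value):
    """Convert a string holding megabytes to bytes; None if unset or invalid."""
    if not value:
        return None
    try:
        megabytes = int(value)
    except ValueError:
        return None
    return megabytes * 1024 * 1024 if megabytes > 0 else None


def apply_memory_limit(env=os.environ, which=resource.RLIMIT_RSS):
    """Lower the soft limit of `which` according to OCRD_MAX_RSS (assumed to be in MB).

    Returns the limit in bytes that was set, or None if nothing was configured.
    """
    limit = parse_limit_mb(env.get("OCRD_MAX_RSS"))
    if limit is None:
        return None
    _soft, hard = resource.getrlimit(which)
    # The soft limit must not exceed a finite hard limit.
    if hard != resource.RLIM_INFINITY:
        limit = min(limit, hard)
    resource.setrlimit(which, (limit, hard))
    return limit
```

Calling `apply_memory_limit()` once at start-up would leave the process unrestricted when the variable is unset, which keeps the limit optional as proposed.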

workspace validator: non-URI path

7

From a `workspace validate` I got:

```XML
METS has no unique identifier
Validation aborted with exception:
Traceback (most recent call last):
  File "/data/ocr-d/ocrd_all/venv38/lib/python3.8/site-packages/ocrd_validators/workspace_validator.py", line 149, in _validate
    self._validate_mets_files()
  File "/data/ocr-d/ocrd_all/venv38/lib/python3.8/site-packages/ocrd_validators/workspace_validator.py",...
```

bertsky

bug

ocrd log became really slow

Somehow the recent changes made `ocrd log` calls take several seconds to complete, rendering usage in scripts like ocrd-import prohibitive.

bertsky

bagger creates invalid URL refs

3

We now have OCR-D GT on [GitHub](https://github.com/OCR-D/gt_structure_text/releases) (the old KIT repo has been down for a while, so this is the only place to get the data). It gets created...

bertsky

add deps-tf1 and docker-cuda-tf1

5

This would allow specifying `FROM ocrd/core-cuda-tf1` for all modules depending on TensorFlow 1, so this (huge!) Docker layer can be **shared**. The same could be worked out for TF2 and...

bertsky

checksum and/or file size of models in .PAGE.xml

7

For reproducibility, it would be nice to have a checksum and/or file size of the models used recorded in the XML.

jbarth-ubhd
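To illustrate the kind of metadata meant here, the following sketch computes a SHA-256 checksum and the file size of a model file. The function name and the returned dictionary layout are hypothetical; how and where such values would be written into the PAGE-XML output is not specified in the issue and is left out.

```python
import hashlib
import os


def describe_model(path, chunk_size=1 << 20):
    """Return size and SHA-256 digest of a model file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so that large models need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return {
        "path": path,
        "size": os.path.getsize(path),
        "sha256": digest.hexdigest(),
    }
```

Either value alone can detect an exchanged model file, but only the checksum distinguishes two different models that happen to have the same size.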

enhancement

include_fileGrp/exclude_fileGrp support in METS server

kba

Fix code related to import statements with isort

3

The fix was done in a virtual Python environment by running `pip install isort` and then `isort .`. It orders the import statements and fixes their formatting.

stweil

deps-conda

3

This starts Conda with `deps-conda` as a replacement for Apt with `deps-ubuntu` to install system dependencies. System dependencies should be encapsulated better than via fixed Linux distributions in OCR-D. [Long...

bertsky

WorkspaceBagger: Use, in order of preference, f.basename, f.contentids and f.ID for filenames

1

As requested in #1154, this PR introduces a `contentids` attribute for `OcrdFile`, which delegates to `OcrdMets.get_contentids_for_file`, which looks up the `CONTENTIDS` attribute of the `mets:div[@TYPE="page"]` that a file belongs to....

kba
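The order of preference named in the title can be sketched as a simple fallback chain. `FileStub` below is a hypothetical stand-in for illustration only, not the real `OcrdFile` class, and `pick_filename` is an assumed helper name; the actual logic in `WorkspaceBagger` may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FileStub:
    """Hypothetical stand-in carrying only the three attributes in question."""
    ID: str
    basename: Optional[str] = None
    contentids: Optional[str] = None


def pick_filename(f):
    """Prefer basename, then contentids, then ID; empty values are skipped."""
    return f.basename or f.contentids or f.ID


print(pick_filename(FileStub(ID="FILE_0001", contentids="page_0001")))
```

Because `or` skips both `None` and empty strings, a file with neither a basename nor a `CONTENTIDS` value still gets a name from its ID.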
