Robert Sachunsky issues

Results: 272 issues of Robert Sachunsky

regression in OCR-D processor

Since the last update I am getting ``` 09:12:50.405 INFO eynollah - INPUT FILE PHYS_0001 (1/3) 09:12:50.972 INFO eynollah - Resizing and enhancing image... 09:12:50.972 INFO eynollah - Detected 300...

ocr-d

Revert "Merge pull request #97 from qurator-spk/420-namespace-package"

This reverts commit fd56b86acf55677dc7a8bfb9e2737c3cc167327a, reversing changes made to ea792d1e4ac4a722770b82dc91e71f84d5beb212. This is the second attempt, same reasoning as in #107 (creating an upstream ref for ocrd_all), but different target: We want...

meaning of input_binary

The only documentation for this kwarg is in the standalone CLI: > in general, eynollah uses RGB as input but if the input document is strongly dark, bright or for...

documentation

Constrain number of columns

Sometimes one does have prior knowledge about the overall page layout of a document. For example, for historical monographs, there will often be _no columns_, but perhaps a few _tables_...

enhancement

workspace find --download prints None for each file

Currently, running `ocrd workspace find --download` will download files, but print `None` for every file it downloaded (instead of the new `local_filename`). The cause is that `ret_entry` does not get...

Processor.resolve_resource ignores without-extension

AFAICS the ocrd-tool.json's `parameter_usage` is not used anywhere (except in a log message), and resolving resources `without-extension` does not work. It looks like instead of `os.path.exists`, in this case one would...

run_processor: apply configurable RSS limit

How about setting an optional upper [limit on memory usage](https://docs.python.org/3.9/library/resource.html#resource.RLIMIT_RSS) for processors via another environment variable, say `OCRD_MAX_RSS`? This could also be done externally (e.g. via Docker in [ocrd_all](https://github.com/OCR-D/ocrd_all/issues/280)), but...
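A minimal sketch of what such a limit could look like, using only the standard `resource` module. The helper name `apply_rss_limit` and the unit (megabytes) are assumptions made for illustration. The issue text names only the variable `OCRD_MAX_RSS`, and this is not how OCR-D core implements anything. Note that on current Linux kernels `RLIMIT_RSS` is accepted but not enforced, so a real implementation might prefer `RLIMIT_AS`.

```python
import os
import resource


def apply_rss_limit(env=os.environ, var="OCRD_MAX_RSS"):
    """Set a soft RLIMIT_RSS from an environment variable.

    Hypothetical helper: the value is assumed to be given in megabytes.
    Returns the soft limit in bytes, or None if the variable is unset or empty.
    """
    value = env.get(var)
    if not value:
        return None  # no limit requested, leave the process untouched
    soft = int(value) * 1024 * 1024
    _, hard = resource.getrlimit(resource.RLIMIT_RSS)
    # An unprivileged process cannot raise the soft limit above the hard limit.
    if hard != resource.RLIM_INFINITY and soft > hard:
        soft = hard
    resource.setrlimit(resource.RLIMIT_RSS, (soft, hard))
    return soft
```

Called once before the processor starts its work, e.g. `apply_rss_limit()`, this leaves the hard limit unchanged, so the setting stays optional and can still be adjusted within the same process.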

workspace validator: non-URI path

From a `workspace validate` I got: ```XML METS has no unique identifier Validation aborted with exception: Traceback (most recent call last): File "/data/ocr-d/ocrd_all/venv38/lib/python3.8/site-packages/ocrd_validators/workspace_validator.py", line 149, in _validate self._validate_mets_files() File "/data/ocr-d/ocrd_all/venv38/lib/python3.8/site-packages/ocrd_validators/workspace_validator.py",...

bug

ocrd log became really slow

Somehow the recent changes made `ocrd log` calls take several seconds to complete, rendering usage in scripts like ocrd-import prohibitive.

bagger creates invalid URL refs

We now have OCR-D GT on [GitHub](https://github.com/OCR-D/gt_structure_text/releases) (the old KIT repo has been down for a while, so this is the only place to get the data). It gets created...