core
Collection of OCR-related Python tools and wrappers from @OCR-D
I know this is already [on the agenda](https://github.com/OCR-D/core/blob/76ea92710a72db278bc38fd7b698ab89b77580d1/ocrd/ocrd/workspace_bagger.py#L66), but it should be more visible IMHO: `ocrd zip bag` could really use an option that allows filtering (negatively) or selecting (positively)...
1. Write an example Nextflow script that uses processes with an exec block that makes REST API calls to the outside world and then parses the received responses. 2. Write an...
There seem to have been upstream changes recently that prevent our usual deployments for Python version 3.6, with `libgeos` (which we need for `shapely` in `ocrd_validators`) being the culprit. See...
The normal convention for new file IDs in OCR-D is (due to `make_file_id` implementation) the pattern `grp + '_' + page`. But the current bulk-add behaviour automatically assigns `file_id` based...
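The naming convention mentioned above can be illustrated with a minimal sketch. The function name `conventional_file_id` is hypothetical and is not the `make_file_id` implementation from core, which may do more than plain concatenation; only the `grp + '_' + page` pattern is taken from the text.

```python
def conventional_file_id(file_grp: str, page_id: str) -> str:
    """Build a file ID following the pattern grp + '_' + page.

    Hypothetical helper for illustration only; it is NOT core's
    make_file_id and performs no collision or validity handling.
    """
    return file_grp + "_" + page_id


# Example with made-up file group and page ID values.
example_id = conventional_file_id("OCR-D-IMG", "PHYS_0001")
```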
The spec states that `--page-id` is _both_ a multi-value option (i.e. comma-separated) and a range option (i.e. allowing an ellipsis). Beyond that, core also implements the `//` prefix for regex values....
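To make the three forms concrete, here is a rough sketch of how such a selector could be expanded against a list of known page IDs. It is not core's implementation: the function name `expand_page_ids` is hypothetical, the range operator is assumed to be written as `..`, and the behaviour for unknown IDs is a choice made for this example.

```python
import re
from typing import List


def expand_page_ids(spec: str, known_ids: List[str]) -> List[str]:
    """Expand a page selector into a list of page IDs (illustrative only).

    Assumptions of this sketch, not taken from core:
    - tokens are separated by commas (so a regex must not contain a comma),
    - a range is written as START..END and is inclusive,
    - a token starting with // is a regex matched against the whole ID,
    - plain IDs that are not in known_ids are silently dropped.
    """
    selected: List[str] = []
    for token in spec.split(","):
        token = token.strip()
        if token.startswith("//"):
            pattern = re.compile(token[2:])
            matches = [p for p in known_ids if pattern.fullmatch(p)]
        elif ".." in token:
            start, end = token.split("..", 1)
            # list.index raises ValueError if a range endpoint is unknown
            lo, hi = known_ids.index(start), known_ids.index(end)
            matches = known_ids[lo:hi + 1]
        else:
            matches = [token] if token in known_ids else []
        for page in matches:
            if page not in selected:  # keep first occurrence, drop repeats
                selected.append(page)
    return selected


pages = ["PHYS_0001", "PHYS_0002", "PHYS_0003", "PHYS_0004", "PHYS_0005"]
picked = expand_page_ids("PHYS_0001..PHYS_0002,//PHYS_000[45]", pages)
```

Splitting on commas before looking at the `//` prefix keeps the sketch short, at the cost of not supporting commas inside a regex.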
All our processors now add provenance information about the run to the METS via `mets:agent`. This is useful for diagnosing and reconstructing the workflow later on. However, the METS actions...
fixes #916
Since #904 we look up … https://github.com/OCR-D/core/blob/71d295ac1fccbeb4164e230bd584e1920b9ab3c8/ocrd/ocrd/processor/base.py#L246-L250 … to determine the moduledir location. But that fails if the processor module itself is distributed via PEP-420 namespace packages (so the top...
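One property of PEP 420 namespace packages that can trip up file-based lookups is that the namespace package itself has no `__init__.py` and therefore no usable `__file__`. Whether this is the exact failure in the linked lines is an assumption; the sketch below only demonstrates the general behaviour, using throwaway package names (`demo_ns_pkg`, `demo_regular_pkg`) and a hypothetical helper `module_file_of`.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def module_file_of(package_name, search_root):
    """Import package_name from search_root and return its __file__ (or None)."""
    sys.path.insert(0, str(search_root))
    try:
        importlib.invalidate_caches()
        module = importlib.import_module(package_name)
        # Namespace packages have __file__ set to None (or missing entirely
        # on older Python versions), so getattr with a default covers both.
        return getattr(module, "__file__", None)
    finally:
        sys.path.remove(str(search_root))
        sys.modules.pop(package_name, None)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # A directory without __init__.py is importable as a namespace package.
    (root / "demo_ns_pkg").mkdir()
    # A directory with __init__.py is a regular package.
    (root / "demo_regular_pkg").mkdir()
    (root / "demo_regular_pkg" / "__init__.py").write_text("")

    ns_file = module_file_of("demo_ns_pkg", root)
    regular_file = module_file_of("demo_regular_pkg", root)
```

Code that derives a directory from a top-level package's `__file__` therefore needs a fallback when that package is a namespace package.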
A typical output of `ocrd workspace remove` looks like this:
```
today at 14:28:02Oct 4 12:28:01 ocrd-manager 12:28:01.985 WARNING ocrd.workspace.remove_file - File not locally available
today at 14:28:02Oct 4 12:28:01...
```
The current implementation of `Workspace.rename_file_group` is smart in that it also updates the affected image file references within PAGE files: https://github.com/OCR-D/core/blob/71d295ac1fccbeb4164e230bd584e1920b9ab3c8/ocrd/ocrd/workspace.py#L324-L342 It would be even better if ALTO files (i.e....