core
Collection of OCR-related Python tools and wrappers from @OCR-D
The most recent generateDS PAGE-XML model now contains validation of type restrictions, which is laudable. But these messages are aggregated in a way which makes diagnosing just where the error...
We often have lots of useful documentation for processors which the ocrd-tool.json does not and cannot cover:
- README file
- [DITA](https://ocr-d.de/en/dita.html) files
- other documentation (publications, research notes, markdown...
Another idea that came up in https://github.com/OCR-D/ocrd_olena/issues/60: I routinely run validation after running each processor to catch problems early. If there was a standard option `--validate` in core (supplemented by...
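The validate-after-every-processor routine described above can be approximated outside of core with a small wrapper. This is only a sketch under assumptions: `run_steps_with_validation` and its parameters are hypothetical names, not part of OCR-D/core, and the proposed `--validate` option itself is not shown because its design is not settled in the text.

```python
import subprocess
from typing import Callable, Sequence


def _default_run(cmd: Sequence[str]) -> int:
    """Run a command and return its exit code."""
    return subprocess.run(list(cmd)).returncode


def run_steps_with_validation(
    steps: Sequence[Sequence[str]],
    validate_cmd: Sequence[str],
    run: Callable[[Sequence[str]], int] = _default_run,
) -> int:
    """Run each step followed by the validation command.

    Stops at the first step or validation that exits non-zero, so a
    problem is reported right after the processor that introduced it.
    Returns the number of steps that ran and validated successfully.
    (Hypothetical helper, not an OCR-D/core API.)
    """
    completed = 0
    for step in steps:
        if run(step) != 0:
            break  # the processor itself failed
        if run(validate_cmd) != 0:
            break  # the processor ran, but left an invalid workspace
        completed += 1
    return completed
```

A call might pass processor command lines as `steps` and something like `["ocrd", "workspace", "validate"]` as `validate_cmd`; the exact command lines depend on the workflow. The `run` parameter is injectable so the control flow can be exercised without invoking any real processor.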
Currently, any information on image resolution provided in the original image (and made available via `OcrdExif` in `Workspace.image_from_page`) is ignored when saving derived images in the workspace (via `Workspace.save_image_file`). Due...
Debugging the failure to build ocrd_fileformat, the problem was missing `ssh` and `git`. I think the additional space requirement is worth the out-of-box usability of the ocrd/core and derived images.
When calling `ocrd process` with `-l DEBUG` as parameter to a single process, this gives e.g.
```
/usr/bin/time -o /home/ocrd/workspace/gue-11660-24-e-1/time docker run --rm -u 1010 -e TESSDATA_PREFIX=/models -v /home/ocrd/workspace/gue-11660-24-e-1:/data -v...
```
When we began OCR-D/core, we targeted only `unittests` with some enhancements over time in `tests/base.py`. To run the tests, we're actually using pytest, though. We currently have ~300 tests which...
Related to #580: OCR-D/core has a nifty `tests/base.py` test helper library with useful functionality beyond core. I sometimes copy that file to other projects, which is redundant and leads...