core
Collection of OCR-related Python tools and wrappers from @OCR-D
Beyond actual (syntactic) schema violations ("validity") and conventional (semantic) problems ("inconsistency"), we might want to check for and repair additional issues: - if `/PcGts/Page/ReadingOrder` or any of its children is...
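Here is a hypothetical example of such a semantic check. The issue's own list is truncated here, so this is not necessarily one of its items. The sketch uses only the standard library to find `regionRef` values in the `ReadingOrder` that point to no region on the page. It assumes the PAGE 2019-07-15 namespace and treats any element whose local name ends in `Region` as a region.

```python
import xml.etree.ElementTree as ET

# Assumption: PAGE 2019-07-15 namespace; older PAGE versions use other URIs.
PAGE_NS = "http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"


def dangling_region_refs(page_xml):
    """Return regionRef values under ReadingOrder that match no region ID on the page."""
    root = ET.fromstring(page_xml)
    page = root.find(f"{{{PAGE_NS}}}Page")
    if page is None:
        return []
    # Any element whose name ends in "Region" (TextRegion, TableRegion, ...)
    # counts as a region, including regions nested inside other regions.
    region_ids = {
        elem.get("id")
        for elem in page.iter()
        if elem.tag.endswith("Region") and elem.get("id") is not None
    }
    order = page.find(f"{{{PAGE_NS}}}ReadingOrder")
    if order is None:
        return []
    # RegionRefIndexed / RegionRef (and optionally the groups themselves)
    # point at regions via the regionRef attribute.
    return [
        elem.get("regionRef")
        for elem in order.iter()
        if elem.get("regionRef") is not None
        and elem.get("regionRef") not in region_ids
    ]
```

A repair step could then drop the dangling references. Whether to repair or only report is a design choice the validator would have to make.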
Currently, the workspace validator will complain about any kind of file (including derived images) that is not contained in the structMap as physical page. IMO this is an error on...
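The condition the validator complains about can be sketched with the standard library. The sketch collects all `mets:fptr/@FILEID` values in the `PHYSICAL` structMap and reports the `mets:file` entries that none of them reference. The `ignore_groups` parameter is hypothetical. It shows one way to exempt fileGrps of derived images and is not an option of the actual validator.

```python
import xml.etree.ElementTree as ET

NS = {"mets": "http://www.loc.gov/METS/"}


def files_without_physical_page(mets_xml, ignore_groups=()):
    """List mets:file IDs that no mets:fptr in the PHYSICAL structMap points to.

    ignore_groups: fileGrp USE values to skip (hypothetical knob for derived images).
    """
    root = ET.fromstring(mets_xml)
    referenced = set()
    for smap in root.iterfind(".//mets:structMap", NS):
        if smap.get("TYPE") == "PHYSICAL":
            referenced.update(
                fptr.get("FILEID") for fptr in smap.iterfind(".//mets:fptr", NS)
            )
    missing = []
    for grp in root.iterfind(".//mets:fileGrp", NS):
        if grp.get("USE") in ignore_groups:
            continue  # e.g. a group of derived images exempted by the caller
        for f in grp.findall("mets:file", NS):
            if f.get("ID") not in referenced:
                missing.append(f.get("ID"))
    return missing
```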
When using the explicit workspace backup facilities, an occasional glitch makes interactive use tedious (it needs manual attention) and makes scripting harder: ``` $ ocrd workspace backup restore 3ab9097 Exception:...
When running processors with `ocrd process`, we already measure the wall and CPU time of a processor run. This adds basic tracking of memory usage with https://github.com/pythonprofilers/memory_profiler. Currently, this just...
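Here is a minimal sketch of measuring wall time, CPU time and peak memory for a single call. It swaps in the standard library's `tracemalloc` for memory_profiler. As a result it only counts memory allocated through Python's allocator. It misses native buffers and the whole-process memory that memory_profiler samples. `run_with_stats` is a hypothetical name.

```python
import time
import tracemalloc


def run_with_stats(func, *args, **kwargs):
    """Run func, returning (result, stats) with wall/CPU seconds and peak traced bytes.

    Assumes the caller is not already using tracemalloc, since this stops it.
    """
    tracemalloc.start()
    wall0 = time.perf_counter()
    cpu0 = time.process_time()
    try:
        result = func(*args, **kwargs)
    finally:
        cpu = time.process_time() - cpu0
        wall = time.perf_counter() - wall0
        _current, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
    return result, {"wall_s": wall, "cpu_s": cpu, "peak_bytes": peak}
```

For example, `run_with_stats(lambda n: [0] * n, 1_000_000)` returns the list together with a stats dict whose `peak_bytes` covers the list's pointer array.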
As long as we only have command-line interfaces, I suppose security is a matter of system administration (user privileges for FS access and ulimit restriction of runtime resources). It's difficult...
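For the ulimit-style restriction of runtime resources, a wrapper can set POSIX resource limits in the child process before the command runs. This is a sketch under assumptions. `run_limited` and its default limits are hypothetical, and the approach only works on POSIX systems.

```python
import resource
import subprocess
import sys


def _cap_soft_limit(which, value):
    # Lower only the soft limit; unprivileged processes cannot raise the hard limit.
    _soft, hard = resource.getrlimit(which)
    if hard != resource.RLIM_INFINITY:
        value = min(value, hard)
    resource.setrlimit(which, (value, hard))


def run_limited(cmd, cpu_seconds=3600, max_bytes=8 * 1024**3):
    """Run cmd with CPU-time and address-space limits (cf. `ulimit -t` / `ulimit -v`)."""
    def set_limits():
        # Executed in the child after fork() and before exec(). POSIX only;
        # preexec_fn is not safe if the parent process runs threads.
        _cap_soft_limit(resource.RLIMIT_CPU, cpu_seconds)
        _cap_soft_limit(resource.RLIMIT_AS, max_bytes)

    return subprocess.run(cmd, preexec_fn=set_limits,
                          capture_output=True, text=True, check=False)


if __name__ == "__main__":
    # Stand-in command instead of a real processor invocation.
    proc = run_limited([sys.executable, "-c", "print('ok')"])
    print(proc.returncode, proc.stdout.strip())
```

Setting limits per child process keeps the calling process unrestricted. The trade-off is that it only covers what `setrlimit` can express: CPU time and memory, not file system access.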
From https://github.com/qurator-spk/eynollah/pull/33#discussion_r618217772 >> Lest I forget: It would be nice to get the image along with the exif information: >> >> ```python >> exif, img_pil = self.workspace.resolve_image_exif(page.imageFilename) >> ``` >>...
> https://gitter.im/OCR-D/Lobby?at=603289f74c79215749fed1bb > @SB2020-eye 17:27 > Using Prima Page Viewer to view an xml file, is there a way to save what I am looking at as an image file?...
In addition to #627, we need to convert input images of the following color spaces to plain RGB at the beginning: - CMYK - HSV - YCbCr - LAB I'm...
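The per-pixel math behind such a conversion can be sketched in pure Python. In practice an image library's built-in conversion would likely be used instead, for example Pillow's `Image.convert("RGB")` for CMYK and YCbCr input. The functions below are hypothetical helpers with several assumptions:

- The CMYK formula is naive, with no ICC profile.
- YCbCr is full-range JFIF.
- HSV channels are scaled 0..255.
- LAB means unscaled CIE L\*a\*b\* under D65, so 8-bit encodings need rescaling first.

```python
import colorsys


def _clamp8(v):
    return max(0, min(255, round(v)))


def cmyk_to_rgb(c, m, y, k):
    # Naive (non-ICC) conversion; all channels 0..255.
    return tuple(_clamp8(255 * (1 - v / 255) * (1 - k / 255)) for v in (c, m, y))


def ycbcr_to_rgb(y, cb, cr):
    # Full-range JFIF (JPEG) YCbCr; channels 0..255.
    r = y + 1.402 * (cr - 128)
    g = y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128)
    b = y + 1.772 * (cb - 128)
    return tuple(_clamp8(v) for v in (r, g, b))


def hsv_to_rgb(h, s, v):
    # Channels 0..255 as in 8-bit HSV images; colorsys works on 0..1.
    return tuple(_clamp8(255 * x) for x in colorsys.hsv_to_rgb(h / 255, s / 255, v / 255))


def _srgb_gamma(c):
    return 12.92 * c if c <= 0.0031308 else 1.055 * c ** (1 / 2.4) - 0.055


def lab_to_rgb(l, a, b):
    # CIE L*a*b* (L in 0..100, a/b signed) -> XYZ (D65 white) -> sRGB.
    d = 6 / 29

    def finv(t):
        return t ** 3 if t > d else 3 * d * d * (t - 4 / 29)

    fy = (l + 16) / 116
    x = 0.95047 * finv(fy + a / 500)
    y = 1.00000 * finv(fy)
    z = 1.08883 * finv(fy - b / 200)
    linear = (
        3.2406 * x - 1.5372 * y - 0.4986 * z,
        -0.9689 * x + 1.8758 * y + 0.0415 * z,
        0.0557 * x - 0.2040 * y + 1.0570 * z,
    )
    return tuple(_clamp8(255 * _srgb_gamma(max(0.0, min(1.0, c)))) for c in linear)


CONVERTERS = {"CMYK": cmyk_to_rgb, "YCbCr": ycbcr_to_rgb,
              "HSV": hsv_to_rgb, "LAB": lab_to_rgb}


def pixels_to_rgb(mode, pixels):
    """Convert a sequence of pixel tuples in the given mode to RGB tuples."""
    if mode == "RGB":
        return list(pixels)
    convert = CONVERTERS[mode]
    return [convert(*p) for p in pixels]
```

Doing the conversion once, right after loading the input, means later steps only ever see plain RGB.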
METS/PAGE/ALTO provided by digitization workflow software or repositories will not always adhere [to the conventions we have in OCR-D](https://ocr-d.de/en/spec). OTOH the workspaces that are the result of OCR-D workflows contain...