Clemens Neudecker issues

Results: 17 issues of Clemens Neudecker

Workshop : Recent Development in OCR for Digital Libraries

[LITIS/IMPACT Workshop : Recent Development in OCR for Digital Libraries](https://vimeo.com/channels/ocrworkshop)

Metadata

Currently, there is no metadata describing the sources the data is derived from, such as newspaper title, date of publication, or OCR quality.

enhancement

Quality

In some cases, not all named entities in the text have been annotated. Another proofreading pass should be made to mitigate the effect of this on the application of the data...

enhancement

usage of ocrd-cis-postcorrect in ocrd_all

Running [ocrd_cis/ocrd-cis-postcorrect](https://github.com/cisocrgroup/ocrd_cis#ocrd-cis-postcorrect) requires additional components that, as far as I can tell, are currently not installed with `ocrd_all`. See https://github.com/cisocrgroup/ocrd_cis/issues/51#issuecomment-667015061

> In order to run our post correction, both our profiler and an according language...

enhancement

question

Eynollah light integration

We should start integration of `eynollah_light` with the `main` branch - this helps track what conflicts etc. need to be resolved.

setup guide changes or simplifications

Various small changes or simplifications to make the setup guide (hopefully) shorter and more easily understandable(?)

Support Python 3.12

With `distutils`, among other things, being removed from Python 3.12 (see [PEP 632](https://peps.python.org/pep-0632/)), most dependencies require updates.

* `ocrd`: not supporting Python 3.12 yet
* `numpy`: the last version supporting Python 3.8...

enhancement
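One way to see which modules are affected by the `distutils` removal is to scan their source for `distutils` imports. The following is only a minimal sketch of that idea; the helper name `imports_distutils` is hypothetical and not part of `ocrd` or any other project mentioned here.

```python
import ast


def imports_distutils(source: str) -> bool:
    """Return True if the given Python source imports distutils (hypothetical helper)."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            # Covers `import distutils` and `import distutils.version`
            if any(alias.name.split(".")[0] == "distutils" for alias in node.names):
                return True
        elif isinstance(node, ast.ImportFrom):
            # Covers `from distutils.version import LooseVersion`;
            # level == 0 excludes relative imports such as `from .distutils import x`
            if node.level == 0 and node.module and node.module.split(".")[0] == "distutils":
                return True
    return False


if __name__ == "__main__":
    print(imports_distutils("from distutils.version import LooseVersion"))
    print(imports_distutils("import os"))
```

This only detects static imports in a single source string; dynamic imports or imports inside compiled extensions would need a different check.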

POPP datasets (HTR)

https://github.com/Shulk97/POPP-datasets/

Feature request: list with error frequencies in report

A useful feature for further analysis of errors or for post-correction, provided by other OCR evaluation tools, is additional statistics, e.g. lists with the frequency of character/word errors...

enhancement
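To illustrate what such an error frequency list could look like, here is a minimal sketch that counts differing substring pairs between ground truth and OCR text. It uses `difflib.SequenceMatcher` from the standard library as a stand-in for whatever alignment an evaluation tool actually performs, and the function name `char_error_frequencies` is an assumption made for this example.

```python
from collections import Counter
from difflib import SequenceMatcher


def char_error_frequencies(gt: str, ocr: str) -> Counter:
    """Count (ground truth, OCR) substring pairs that differ (hypothetical helper)."""
    errors = Counter()
    # autojunk=False avoids difflib's heuristic of ignoring frequent characters
    matcher = SequenceMatcher(None, gt, ocr, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Insertions have an empty ground truth side, deletions an empty OCR side
            errors[(gt[i1:i2], ocr[j1:j2])] += 1
    return errors


if __name__ == "__main__":
    freqs = char_error_frequencies("the quick brown fox", "tne quick brovvn fox")
    for (expected, actual), count in freqs.most_common():
        print(f"{expected!r} -> {actual!r}: {count}")
```

Aggregating these counters over all pages of a document would give the kind of frequency list the request describes; the same approach works on word lists instead of strings.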

Support optional stopword list

A common use case for OCR evaluation (e.g. for search engine indexing, text and data mining, and so forth) is to omit stopwords from the word evaluation to get an understanding of...

enhancement
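A rough sketch of how an optional stopword list could enter a word-level evaluation is shown below. It uses a simple bag-of-words recall rather than the metric of any particular evaluation tool, and the names `tokenize` and `bag_of_words_recall` are hypothetical.

```python
from collections import Counter


def tokenize(text: str, stopwords: frozenset) -> list:
    """Lowercase, split on whitespace, and drop stopwords."""
    return [word for word in text.lower().split() if word not in stopwords]


def bag_of_words_recall(gt: str, ocr: str, stopwords: frozenset = frozenset()) -> float:
    """Fraction of ground truth words (ignoring stopwords) that also occur in the OCR text."""
    gt_words = Counter(tokenize(gt, stopwords))
    ocr_words = Counter(tokenize(ocr, stopwords))
    total = sum(gt_words.values())
    if total == 0:
        return 1.0  # nothing left to evaluate
    # Counter intersection keeps the minimum count per word
    matched = sum((gt_words & ocr_words).values())
    return matched / total


if __name__ == "__main__":
    gt = "the cat sat on the mat"
    ocr = "the cat sat on the rnat"
    print(bag_of_words_recall(gt, ocr))
    print(bag_of_words_recall(gt, ocr, stopwords=frozenset({"the", "on"})))
```

With the stopwords removed, the single misrecognized content word weighs more heavily in the score, which is the effect the request is after.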