Clemens Neudecker

Results 17 issues of Clemens Neudecker

[LITIS/IMPACT Workshop : Recent Development in OCR for Digital Libraries](https://vimeo.com/channels/ocrworkshop)

Currently, there is no metadata describing the sources the data is derived from, such as newspaper title, date of publication, or OCR quality.

enhancement

In some cases, not all named entities in the text have been annotated. Another proofreading pass should be made to mitigate the effect of this on the application of the data...

enhancement

Running [ocrd_cis/ocrd-cis-postcorrect](https://github.com/cisocrgroup/ocrd_cis#ocrd-cis-postcorrect) requires additional components that, as far as I can tell, are currently not installed with `ocrd_all`. See https://github.com/cisocrgroup/ocrd_cis/issues/51#issuecomment-667015061

> In order to run our post correction, both our profiler and an according language...

enhancement
question

We should start integrating `eynollah_light` into the `main` branch, as this helps track which conflicts etc. need to be resolved.

Various small changes or simplifications to make the setup guide (hopefully) shorter and more easily understandable(?)

With `distutils`, among other things, being removed in Python 3.12 (see [PEP 632](https://peps.python.org/pep-0632/)), most dependencies require updates.

* `ocrd`: not supporting Python 3.12 yet
* `numpy`: the last version supporting Python 3.8...

enhancement

https://github.com/Shulk97/POPP-datasets/

A useful feature for further analysis of errors or for post-correction, provided by other OCR evaluation tools, is additional statistics, e.g. lists with the frequency of character/word errors...

enhancement
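As a rough illustration of the error-frequency statistics requested above, the following sketch counts which ground-truth character sequences were replaced by which OCR output. It is only an assumption of how such a feature could look: the function name `char_error_counts` is hypothetical, and the alignment comes from Python's `difflib.SequenceMatcher` rather than the edit-distance alignment an OCR evaluation tool would normally use, so the groupings may differ.

```python
from collections import Counter
from difflib import SequenceMatcher


def char_error_counts(gt: str, ocr: str) -> Counter:
    """Count mismatching (ground truth, OCR) spans between two strings.

    Hypothetical helper: uses difflib's alignment, not a true
    Levenshtein alignment.
    """
    errors = Counter()
    matcher = SequenceMatcher(None, gt, ocr, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # An empty string on one side marks an insertion or deletion.
            errors[(gt[i1:i2], ocr[j1:j2])] += 1
    return errors


# Aggregate over several (ground truth, OCR) line pairs.
pairs = [("modern", "modem"), ("corner", "comer"), ("abc", "ac")]
total = Counter()
for gt_line, ocr_line in pairs:
    total.update(char_error_counts(gt_line, ocr_line))

# Print the most frequent confusions first.
for (gt_span, ocr_span), count in total.most_common():
    print(f"{gt_span!r} -> {ocr_span!r}: {count}")
```

Sorting by frequency makes systematic confusions (such as `rn` read as `m`) stand out, which is the kind of list that could feed a post-correction step.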

A common use case for OCR evaluation (e.g. for search engine indexing, text and data mining, etc.) is to omit stopwords from the word evaluation to get an understanding of...

enhancement
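A minimal sketch of the stopword idea above, under stated assumptions: the stopword set, the tokenizer, and the function names are all illustrative and not taken from any existing evaluation tool. Stopwords are dropped from both the ground truth and the OCR text before a word-level edit distance is computed.

```python
import re

# Illustrative stopword list; a real evaluation would use a
# language-specific list.
STOPWORDS = frozenset({"the", "a", "an", "of", "and", "in"})


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def edit_distance(a: list[str], b: list[str]) -> int:
    """Word-level Levenshtein distance (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (x != y)))  # substitution
        prev = curr
    return prev[-1]


def word_error_rate(gt: str, ocr: str, stopwords=frozenset()) -> float:
    """Word error rate after removing the given stopwords from both texts."""
    gt_words = [w for w in tokenize(gt) if w not in stopwords]
    ocr_words = [w for w in tokenize(ocr) if w not in stopwords]
    if not gt_words:
        return 0.0 if not ocr_words else float("inf")
    return edit_distance(gt_words, ocr_words) / len(gt_words)


gt = "the history of the library"
ocr = "the histery of the library"
print(word_error_rate(gt, ocr))             # all words counted
print(word_error_rate(gt, ocr, STOPWORDS))  # content words only
```

One caveat of this simple filter: a misrecognized stopword (e.g. `tbe` for `the`) no longer matches the list, so it stays in the OCR token sequence and is counted as an insertion.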