
Quality

Open cneud opened this issue 8 years ago • 2 comments

In some cases, not all named entities in the text have been annotated. Another proofreading pass should be made to mitigate the effect of these gaps on the use of the data for training, evaluation, and so forth. This pass could also clean up various other data-quality issues (OCR errors, etc.).
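One way to find candidates for such a pass is to look for tokens that are tagged as an entity in some places but left untagged in others. The snippet below is only a rough sketch: it assumes a token-per-line layout with a whitespace-separated tag in the second column and `O` for untagged tokens, and the function name `find_suspect_tokens` and the sample lines are made up for illustration, not taken from the corpus.

```python
from collections import Counter


def find_suspect_tokens(lines):
    """Return tokens tagged as an entity somewhere and as 'O' elsewhere.

    Assumes each line is "<token> <tag>"; other lines are skipped.
    The result maps token -> (times tagged, times untagged).
    """
    tagged = Counter()
    untagged = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # blank line (sentence break) or unexpected layout
        token, tag = parts
        if tag == "O":
            untagged[token] += 1
        else:
            tagged[token] += 1
    # Keep only tokens that appear both with and without an entity tag.
    return {
        tok: (tagged[tok], untagged[tok])
        for tok in tagged
        if untagged[tok] > 0
    }


# Hypothetical sample data, not real corpus content.
sample = ["Berlin B-LOC", "ist O", "Berlin O", "schön O", "", "Wien B-LOC"]
suspects = find_suspect_tokens(sample)
```

A token showing up here is only a hint for manual review. Many words are legitimately an entity in one context and not in another, so the list should not be applied automatically.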

cneud, Mar 09 '16 14:03

Explanation of issues and workarounds: https://github.com/EuropeanaNewspapers/ner-corpora/wiki/Corpus-cleanup

cneud, Mar 18 '16 14:03

Since 'http://www.theeuropeanlibrary.org/tel4/newspapers/' was taken down in 2016, the clean-up process no longer works.

cneud, Apr 05 '23 20:04