ner-corpora
Quality
In some cases, not all named entities in the text have been annotated. Another proofreading pass should be made to limit the effect of this on using the data for training, evaluation, and so forth. This pass can also clean up various other data quality issues (OCR errors, etc.).
Explanation of issues and workarounds: https://github.com/EuropeanaNewspapers/ner-corpora/wiki/Corpus-cleanup
Since 'http://www.theeuropeanlibrary.org/tel4/newspapers/' was taken down in 2016, the clean-up process no longer works.
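
As a rough aid for a manual proofreading pass, the sketch below flags tokens that are tagged `O` in one place but carry an entity tag elsewhere in the same file. It is not part of this repository. It assumes a whitespace-separated two-column layout (`token TAG`, one token per line, BIO-style tags), and the function names `parse_bio` and `find_unannotated_candidates` as well as the sample text are hypothetical. The output is a list of candidates to review by hand, not a list of confirmed errors.

```python
from collections import defaultdict


def parse_bio(text):
    """Parse 'token TAG' lines into (token, tag) pairs.

    Assumption: two whitespace-separated columns per line.
    Blank or malformed lines are skipped.
    """
    pairs = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            pairs.append((parts[0], parts[1]))
    return pairs


def find_unannotated_candidates(pairs):
    """Return (index, token, labels) for 'O' tokens annotated elsewhere."""
    labels_by_token = defaultdict(set)
    for token, tag in pairs:
        if tag != "O":
            # Strip the B-/I- prefix so only the entity type remains.
            labels_by_token[token].add(tag.split("-", 1)[-1])
    return [
        (i, token, sorted(labels_by_token[token]))
        for i, (token, tag) in enumerate(pairs)
        if tag == "O" and token in labels_by_token
    ]


# Hypothetical sample: "Wien" is annotated once and missed once.
sample = """Wien B-LOC
ist O
schön O
. O
Nach O
Wien O
fahren O
"""

candidates = find_unannotated_candidates(parse_bio(sample))
print(candidates)
```

Exact string matching will miss OCR variants of the same name and will also flag tokens that are legitimately not entities in some contexts, so each candidate still needs a human check.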