Clemens Neudecker

56 comments by Clemens Neudecker

Added .md file for AT content https://github.com/EuropeanaNewspapers/ner-corpora/blob/master/enp_at.bio/enp_at.md

Apologies for the late reply!

> Only the prefix I- is used to tag named-entities. Is this because, in fact, there is no need to use the B- prefix (when...

@CatarinaPC fyi, I just pushed the changes to convert ``enp_FR`` from IO to BIO scheme to the [v0.2 branch](https://github.com/EuropeanaNewspapers/ner-corpora/tree/0.2).
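
For readers unfamiliar with the two schemes, here is a minimal sketch of what an IO to BIO conversion can look like. It is not the script used for the v0.2 branch; the function name `io_to_bio` and the tag format (`O` or `I-<TYPE>`, one tag per token) are assumptions made for illustration. Note that IO tags cannot distinguish two directly adjacent entities of the same type, so a conversion like this treats them as a single entity.

```python
def io_to_bio(tags):
    """Convert IO tags (assumed format: 'O' or 'I-<TYPE>') to BIO tags.

    Hypothetical helper for illustration only.
    """
    bio = []
    prev_type = None  # entity type of the previous token, None if it was 'O'
    for tag in tags:
        if tag == "O":
            bio.append("O")
            prev_type = None
            continue
        # Split 'I-PER' into prefix 'I' and entity type 'PER'.
        _, _, ent_type = tag.partition("-")
        if ent_type != prev_type:
            # First token of a new entity (or a change of type) gets B-.
            bio.append("B-" + ent_type)
        else:
            # Continuation of the same entity keeps I-.
            bio.append("I-" + ent_type)
        prev_type = ent_type
    return bio


if __name__ == "__main__":
    print(io_to_bio(["I-PER", "I-PER", "O", "I-LOC", "I-PER"]))
```

The only rule applied is that a tagged token starts a new entity whenever the previous token was `O` or carried a different entity type.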

@CatarinaPC No worries, thank you for the questions. Ad 1) What I meant is that the French partner used IO because they deemed it sufficient, but in my understanding this...

> I've also found it difficult to find a proper reference for BIO encoding and encoding in general. I was looking at the website: https://donovanong.github.io/ner/tagging-scheme-for-ner.html . They mention a paper...

> That is why I asked you about French NER shared tasks

[Train/dev](https://github.com/impresso/CLEF-HIPE-2020/tree/master/data) sets (including French) for the [CLEF-HIPE-2020](https://impresso.github.io/CLEF-HIPE-2020/) shared task are now being released.

> What do you give...

Explanation of issues and workarounds https://github.com/EuropeanaNewspapers/ner-corpora/wiki/Corpus-cleanup

@jbarth-ubhd Weird, I have not seen any segmentation results like this coming from the tool. Can you attach the PAGE-XML as well, please? (cc @vahidrezanezhad)

Thanks for providing the test data. I can also confirm this via Aletheia. The issue seems to be with the region segmentation - where there are regions detected (see e.g....

Dear @jbarth-ubhd, I found some time to investigate this further and with the current version of `sbb-textline-detector`, while I do get the same problem with the RGB image, when using...