Matyáš Kopp
### U+0096 (SPA) Unicode Character - [ ] remove character This character is allowed in ParlaMint, but it causes problems in linguistic annotations. I suggest removing it from the text:...
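A minimal sketch of the suggested removal, assuming the text is available as a plain Python string before annotation. The helper name `strip_spa` and the sample string are hypothetical and not part of any ParlaMint tooling.

```python
# U+0096 is the C1 control character START OF GUARDED AREA (SPA).
SPA = "\u0096"


def strip_spa(text: str) -> str:
    """Return `text` with every U+0096 character removed."""
    return text.replace(SPA, "")


# Made-up sample string containing one U+0096 character.
sample = "before\u0096after"
cleaned = strip_spa(sample)
print(repr(cleaned))
```

U+0096 often appears when a Windows-1252 en dash (byte 0x96) was decoded as Latin-1. If that is the case here, which the issue does not state, replacing it with an en dash (U+2013) rather than deleting it may be preferable.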
### Named entities - [ ] named entities contain non-proper names I guess you are using a model that labels not only named entities from the PER/LOC/ORG/MISC set but also DATE...
### shifted NEs? - [ ] shifted NEs In this paragraph (ParlaMint-RO_2000-10-24-id4980.u2.seg8.2), the NEs seem to be shifted. https://raw.githubusercontent.com/clarin-eric/ParlaMint/3f2d0a820d31aa7e55b72156089a3450b303e3bc/Data/ParlaMint-RO/ParlaMint-RO_2000-10-24-id4980.ana.xml Reformatted, with token elements (`w` and `pc`) removed: ```XML atitudinea autorităţilor...
### Voci din sală: in utterance - [ ] voice from the hall https://github.com/romanian-parlamint/ParlaMint/blob/a510c149ba04407fe6df77414b3a2aaec6f47022/Data/ParlaMint-RO/ParlaMint-RO_2000-10-24-id4980.xml#L408-L414 ```XML Domnul Vasile Lupu: Să vedem cine îl face. (Rumoare în partea stângă a sălii) Dar,...
### person - affiliation - organization - [ ] parliamentary groups - [ ] only one virtual parliamentary group `Placeholder parliamentary group` - [ ] government I guess you are...
### strange UPosTag `_` when `Mc-s-d` - [ ] UPosTag of digit tokens `Mc-s-d` Every token with `pos="Mc-s-d"` has an incorrect `msd="UPosTag=_"`. Sample: ```XML 1990 ``` You can fix this with...
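To see how many tokens are affected, the condition described above can be checked mechanically. The following is only a sketch: it assumes the tokens are TEI `w` elements carrying the `pos` and `msd` attributes quoted in the issue. The function name `find_bad_digit_tokens` and the sample fragment (including its second token) are made up for illustration. It only detects the problem and does not apply any fix.

```python
import xml.etree.ElementTree as ET

# TEI namespace used by ParlaMint files.
TEI = "{http://www.tei-c.org/ns/1.0}"


def find_bad_digit_tokens(xml_text: str) -> list[str]:
    """Return the text of every `w` with pos="Mc-s-d" and UPosTag=_ in msd."""
    root = ET.fromstring(xml_text)
    bad = []
    for w in root.iter(f"{TEI}w"):
        features = (w.get("msd") or "").split("|")
        if w.get("pos") == "Mc-s-d" and "UPosTag=_" in features:
            bad.append(w.text or "")
    return bad


# Hypothetical fragment: the first token mirrors the attributes quoted in
# the issue, the second is an invented token that should not be reported.
sample = (
    '<s xmlns="http://www.tei-c.org/ns/1.0">'
    '<w pos="Mc-s-d" msd="UPosTag=_">1990</w>'
    '<w pos="Ncms-n" msd="UPosTag=NOUN">an</w>'
    "</s>"
)
print(find_bad_digit_tokens(sample))
```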
I probably don't understand the point of this issue... The PL corpus is correctly encoded: https://github.com/clarin-eric/ParlaMint/blob/3c8ad8aeab6d854cdd5e9113115b944e37d7e6d9/ParlaMint-PL/ParlaMint-PL_2018-09-27-senat-65-2.ana.xml#L396-L403 In this case, `kinesic` is within the `u` (speech element) because it happens during the...
OK, so it is not a problem with the data, but with its representation in NoSketch. I believe it will be solved by #83. @TomazErjavec, am I right?
Oh, I had not studied the screenshot carefully. It is really weird; it should not happen! Every named entity should contain at least one token.
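The "at least one token" rule can be checked automatically. This is a rough sketch, assuming named entities are encoded as TEI `name` elements that wrap `w` and `pc` tokens. The function name `empty_named_entities` and the sample fragment are hypothetical.

```python
import xml.etree.ElementTree as ET

# TEI namespace used by ParlaMint files.
TEI = "{http://www.tei-c.org/ns/1.0}"
TOKEN_TAGS = {f"{TEI}w", f"{TEI}pc"}


def empty_named_entities(xml_text: str) -> list[str]:
    """Return the `type` of every `name` element that contains no token."""
    root = ET.fromstring(xml_text)
    empty = []
    for name in root.iter(f"{TEI}name"):
        # name.iter() also yields `name` itself, which is never a token tag.
        has_token = any(el.tag in TOKEN_TAGS for el in name.iter())
        if not has_token:
            empty.append(name.get("type", ""))
    return empty


# Made-up fragment: one well-formed entity and one empty entity.
sample = (
    '<seg xmlns="http://www.tei-c.org/ns/1.0">'
    '<name type="PER"><w>Vasile</w><w>Lupu</w></name>'
    '<name type="LOC"/>'
    "</seg>"
)
print(empty_named_entities(sample))
```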
I assumed that a "Dzwonek" is part of the speech (not just a note), but if it should be encoded as an incident, then it is a bug (placing the...