Results 82 comments of Matyáš Kopp

### U+0096 (SPA) Unicode Character

- [ ] remove character

This character is allowed in ParlaMint, but it causes problems in linguistic annotations. I suggest removing it from the text:...
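A minimal sketch of the suggested removal, assuming the text is already available as a decoded Python string. The helper name `strip_spa` and the sample string are hypothetical and not part of the ParlaMint tooling.

```python
# Hypothetical helper, not part of the ParlaMint tooling.
SPA = "\u0096"  # U+0096, the control character named in the issue


def strip_spa(text: str) -> tuple[str, int]:
    """Return the text without U+0096 and the number of characters removed."""
    count = text.count(SPA)
    return text.replace(SPA, ""), count


# Made-up sample string, only for illustration.
cleaned, removed = strip_spa("abc\u0096def")
print(repr(cleaned), removed)
```

Returning the count alongside the cleaned text makes it easy to report how many occurrences were dropped per file before committing the change.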

### Named entities

- [ ] named entities contain non-proper names

I guess you are using a model that labels not only named entities from the PER/LOC/ORG/MISC set but also DATE...

### shifted NEs ?

- [ ] shifted NEs

In this paragraph (ParlaMint-RO_2000-10-24-id4980.u2.seg8.2), NEs seem to be shifted. https://raw.githubusercontent.com/clarin-eric/ParlaMint/3f2d0a820d31aa7e55b72156089a3450b303e3bc/Data/ParlaMint-RO/ParlaMint-RO_2000-10-24-id4980.ana.xml

Reformatted, with token elements (`w` and `pc`) removed: ```XML atitudinea autorităţilor...

### Voci din sală: in utterance

- [ ] voice from the hall

https://github.com/romanian-parlamint/ParlaMint/blob/a510c149ba04407fe6df77414b3a2aaec6f47022/Data/ParlaMint-RO/ParlaMint-RO_2000-10-24-id4980.xml#L408-L414 ```XML Domnul Vasile Lupu: Să vedem cine îl face. (Rumoare în partea stângă a sălii) Dar,...

### person - affiliation - organization

- [ ] parliamentary groups
- [ ] only one virtual parliamentary group `Placeholder parliamentary group`
- [ ] government

I guess you are...

### strange UPosTag `_` when `Mc-s-d`

- [ ] UPosTag of digit tokens `Mc-s-d`

Every token with `pos="Mc-s-d"` has the wrong `msd="UPosTag=_"`.

sample:

```XML
1990
```

You can fix this with...
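The affected tokens can be located with a short script. The following is only a sketch for finding them, not the fix referred to above. It assumes tokens are `w` elements in the TEI namespace carrying `pos` and `msd` attributes as described in the issue; the function name and the XML fragment (including the `Ncms-n` tag) are made up for illustration.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"

# Made-up fragment; the attribute layout is assumed from the issue text.
SAMPLE = """<s xmlns="http://www.tei-c.org/ns/1.0">
  <w pos="Mc-s-d" msd="UPosTag=_">1990</w>
  <w pos="Ncms-n" msd="UPosTag=NOUN">an</w>
</s>"""


def find_bad_digit_tokens(root):
    """Return <w> elements with pos="Mc-s-d" whose msd is exactly "UPosTag=_"."""
    return [
        w for w in root.iter(TEI + "w")
        if w.get("pos") == "Mc-s-d" and w.get("msd") == "UPosTag=_"
    ]


root = ET.fromstring(SAMPLE)
for w in find_bad_digit_tokens(root):
    print(w.text, w.get("pos"), w.get("msd"))
```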

I probably don't understand the point of this issue... The PL corpus is correctly encoded: https://github.com/clarin-eric/ParlaMint/blob/3c8ad8aeab6d854cdd5e9113115b944e37d7e6d9/ParlaMint-PL/ParlaMint-PL_2018-09-27-senat-65-2.ana.xml#L396-L403 In this case, `kinesic` is within the `u` (speech element) because it happens during the...

OK, so it is not a problem with the data, but it is a problem with the representation in NoSketch. I believe that it will be solved with #83. @TomazErjavec, am I right?

Oh, I had not studied the screenshot carefully. It is really weird; it should not happen! Every named entity should contain at least one token.
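The invariant can be checked mechanically. A minimal sketch, assuming named entities are encoded as `name` elements in the TEI namespace that wrap `w` and `pc` tokens; the function name and the XML fragment are hypothetical.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
TOKEN_TAGS = {TEI + "w", TEI + "pc"}

# Made-up fragment: one well-formed entity and one with no tokens.
SAMPLE = """<s xmlns="http://www.tei-c.org/ns/1.0">
  <name type="PER"><w>Vasile</w><w>Lupu</w></name>
  <name type="LOC"/>
  <pc>.</pc>
</s>"""


def empty_entities(root):
    """Return <name> elements that contain no <w> or <pc> descendant."""
    return [
        name for name in root.iter(TEI + "name")
        if not any(d.tag in TOKEN_TAGS for d in name.iter())
    ]


root = ET.fromstring(SAMPLE)
for name in empty_entities(root):
    print("empty named entity of type", name.get("type"))
```

Checking descendants rather than direct children keeps the test valid if entities happen to be nested.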

I assumed that a "Dzwonek" is part of the speech (not just a note), but if it should be encoded as an incident, then it is a bug (placing the...