Matyáš Kopp
The BG corpus has a suspiciously small number of ministers. Corpus period: https://github.com/clarin-eric/ParlaMint/blob/cb93f7eb5002b6bd608600a6c800accfdce9c72b/Samples/ParlaMint-BG/ParlaMint-BG.xml#L125 According to the wiki, there are 7 governments (n93 - n99) that should be covered in the corpus: https://bg.wikipedia.org/wiki/%D0%9F%D1%80%D0%B0%D0%B2%D0%B8%D1%82%D0%B5%D0%BB%D1%81%D1%82%D0%B2%D0%B0_%D0%BD%D0%B0_%D0%91%D1%8A%D0%BB%D0%B3%D0%B0%D1%80%D0%B8%D1%8F But...
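A quick way to check a suspicion like this is to count the minister affiliations recorded in the corpus root file. The sketch below is only illustrative: it assumes that ministers are encoded as `<affiliation role="minister" .../>` inside `<person>` elements in the TEI namespace, and the sample data (`PersonA`, `PersonB`, `#GOV`, the dates) is made up for the example.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def ministers(root):
    """Map each person id to the (from, to) spans of its minister affiliations.

    Assumption: ministers are marked with role="minister" on <affiliation>.
    """
    result = {}
    for person in root.iter(f"{{{TEI_NS}}}person"):
        terms = [
            (aff.get("from"), aff.get("to"))
            for aff in person.iter(f"{{{TEI_NS}}}affiliation")
            if aff.get("role") == "minister"
        ]
        if terms:
            result[person.get(XML_ID)] = terms
    return result


# Hypothetical sample data, not taken from the BG corpus.
SAMPLE = """
<listPerson xmlns="http://www.tei-c.org/ns/1.0">
  <person xml:id="PersonA">
    <affiliation role="minister" ref="#GOV" from="2017-05-04" to="2021-05-12"/>
    <affiliation role="member" ref="#party.X"/>
  </person>
  <person xml:id="PersonB">
    <affiliation role="member" ref="#party.Y"/>
  </person>
</listPerson>
"""

found = ministers(ET.fromstring(SAMPLE))
print(len(found), "person(s) with a minister affiliation")
```

For a real file, `ET.parse(path).getroot()` can replace `ET.fromstring(SAMPLE)`; the resulting count can then be compared with the number of ministers expected from the list of governments.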
BE feedback
I have just a few observations: ## Responsibility for linguistic annotations in the TEI version - [x] remove linguistic annotation responsibility from the TEI version https://github.com/JessedeDoes/ParlaMint/blob/1f0a9d3ef52e8a2aad8b3733dc1cc742bce4f0fe/Data/ParlaMint-BE/ParlaMint-BE.xml#L17-L21 ```XML Jesse de Does Taalkundige verrijking...
We have decided to use parliamentary groups instead of political parties. PL data already has `parliamentaryGroup` but uses `politicalParty` (assuming from words _Klub_ and _Koło_). We have one set of...
I have come across strange person entries in the PL data that either do not seem to be persons or have a wrong name: https://github.com/clarin-eric/ParlaMint/blob/ac6977569f3da1cd604fe1db9dfe87ec40a345a5/Data/ParlaMint-PL/ParlaMint-PL.xml#L3150-L3159 https://github.com/clarin-eric/ParlaMint/blob/ac6977569f3da1cd604fe1db9dfe87ec40a345a5/Data/ParlaMint-PL/ParlaMint-PL.xml#L6616-L6623 https://github.com/clarin-eric/ParlaMint/blob/ac6977569f3da1cd604fe1db9dfe87ec40a345a5/Data/ParlaMint-PL/ParlaMint-PL.xml#L10530-L10545 Is it possible to fix them?
I have been exploring why the validation is so slow.

### jing

jing can validate multiple files against the same schema in parallel. These are the speeds for...
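One way to exploit this is to pass many files to a single `jing` call, so that the schema is loaded once per batch, and to run several batches concurrently. The following is a minimal sketch under stated assumptions, not the setup that was actually measured: it assumes `jing` is on the `PATH`, that it is invoked as `jing SCHEMA FILE...`, and that a non-zero exit code signals a validation failure. The batch size, worker count, and helper names are arbitrary choices for illustration.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def chunk(files, size):
    """Split a list of file paths into consecutive batches of at most `size`."""
    return [files[i:i + size] for i in range(0, len(files), size)]


def jing_commands(schema, files, batch_size=50):
    """Build one jing command line per batch of files."""
    return [["jing", schema, *batch] for batch in chunk(files, batch_size)]


def validate(schema, files, batch_size=50, workers=4):
    """Run the batches concurrently; return (all_ok, list of CompletedProcess)."""
    commands = jing_commands(schema, files, batch_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(
            pool.map(
                lambda cmd: subprocess.run(cmd, capture_output=True, text=True),
                commands,
            )
        )
    return all(r.returncode == 0 for r in results), results
```

A call such as `validate("schema.rng", sorted_paths)` (both names hypothetical) would then report whether every batch passed; the `stdout` of each failed result carries jing's messages for that batch.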
- [ ] the annotated version is missing the very first note: https://github.com/clarin-eric/ParlaMint/blob/cb93f7eb5002b6bd608600a6c800accfdce9c72b/Samples/ParlaMint-LV/ParlaMint-LV_2019-01-31-PT13-516.ana.xml#L76-L78 vs TEI https://github.com/clarin-eric/ParlaMint/blob/cb93f7eb5002b6bd608600a6c800accfdce9c72b/Samples/ParlaMint-LV/ParlaMint-LV_2019-01-31-PT13-516.xml#L70-L73 --- - [ ] BTW, the first note should be of type `narrative`,...
The annotated version contains only one `` in a single ``. It seems that all content is there, but everything has been merged into the first `` in the utterance. See...
E.g., in the document `2018/ParlaMint-GB_2018-12-04-commons.xml`, the result in TEITOK on this page: https://lindat.mff.cuni.cz/services/teitok/parlamint-40/index.php?action=ner&cid=ParlaMint-GB/2018/ParlaMint-GB_2018-12-04-commons.xml&pageid=pb-15&pbtype= source: https://hansard.parliament.uk/Commons/2018-12-04/debates/B2784071-CD44-46F6-8246-9656F9138780/Planning(Appeals)#contribution-8DB86CA4-DEEF-4D4B-86D8-C077567F4452  result:  or source: https://hansard.parliament.uk/Commons/2018-12-04/debates/B2784071-CD44-46F6-8246-9656F9138780/Planning(Appeals)#contribution-4DF94B1B-89C5-404E-8739-9C26C23D7689  result: 
```XML A Magyar Országgyűlés Korpusza ParlaMint-HU-en [ParlaMint-en.ana] Hungarian parliamentary corpus ParlaMint-HU-en [ParlaMint-en.ana] A Magyar Országgyűlés ülései, 7., 8. és 9. ciklus (2014 - 2023) Minutes of the National Assembly of...
The source of transcriptions of PT debates does not seem to contain paragraphs, but in the corpus, it is somehow segmented into paragraphs (my guess is if the punctuation `.`/`?`/...