
ParlaMint: Comparable Parliamentary Corpora

Results 86 ParlaMint issues

We have two styles of renaming:
1. `person` renaming: https://github.com/clarin-eric/ParlaMint/blob/5deaeed5ae792f3ba1726072298885b5b64a6d64/Data/ParlaMint-AT/ParlaMint-AT-listPerson.xml#L76-L77
2. `org` renaming: https://github.com/clarin-eric/ParlaMint/blob/a3cb2ecee74cd8925b0aa37831812703638c7b4e/Data/ParlaMint-HU/ParlaMint-HU-listOrg.xml#L632-L635

To be honest, I prefer the first solution because it probably reflects reality better. I...

enhancement
🕮 Documentation

@TomazErjavec @matyaskopp we have an issue with speakers who are not identified (mostly in the old debates). Below is a list; here are just a few translations: Counter of votes; Voice...

There are multiple places where the link to the original source data can be placed (the `@source` attribute or `//bibl/idno`):
https://github.com/clarin-eric/ParlaMint/blob/61308e517e857bf65d806fe3aa091cd82c7f6302/ParlaMint-CZ/ParlaMint-CZ_2018-11-13-ps2017-020-09-000-000.xml#L50-L55
https://github.com/clarin-eric/ParlaMint/blob/61308e517e857bf65d806fe3aa091cd82c7f6302/ParlaMint-CZ/ParlaMint-CZ_2018-11-13-ps2017-020-09-000-000.xml#L116-L119
https://github.com/clarin-eric/ParlaMint/blob/61308e517e857bf65d806fe3aa091cd82c7f6302/ParlaMint-CZ/ParlaMint-CZ_2018-11-13-ps2017-020-09-000-000.xml#L124-L127

Having this for all corpora would be really nice...

enhancement
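A minimal sketch of how both candidate locations could be collected from one component file, assuming the standard TEI namespace and reading `//bibl/idno` as `<idno>` elements that are direct children of any `<bibl>`. The helper name `source_links` is hypothetical and not part of the ParlaMint tooling.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"


def source_links(xml_text: str) -> dict[str, list[str]]:
    """Collect candidate links to the original source data (hypothetical helper).

    Returns every @source attribute value and the text of every <idno>
    that is a direct child of a <bibl> (XPath //bibl/idno).
    """
    root = ET.fromstring(xml_text)
    # Any element may carry @source, so scan the whole tree.
    from_attr = [el.get("source") for el in root.iter() if el.get("source")]
    # findall() with a plain tag name only looks at direct children.
    from_idno = [
        idno.text.strip()
        for bibl in root.iter(f"{TEI}bibl")
        for idno in bibl.findall(f"{TEI}idno")
        if idno.text and idno.text.strip()
    ]
    return {"@source": from_attr, "bibl/idno": from_idno}
```

Running it over every corpus would show which of the two locations each corpus already fills.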

The current schema allows this situation:
```
Oklaski
```
https://github.com/clarin-eric/ParlaMint/blob/92ba447bf720cf48d038ec3044257534332f18a7/Schema/ParlaMint-TEI.ana.rng#L112-L121

The schema should be restricted as follows:
- every named entity should contain one or more (`oneOrMore`) named entities or words.
- And...

enhancement
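Until the schema is tightened, the proposed restriction could be checked with a small script. This is a sketch under assumptions: named entities are `<name>` and words are `<w>` in the TEI namespace, and "contains one or more named entities or words" is read as "has at least one child, every child is `<name>` or `<w>`, and there is no loose text". The function name `invalid_names` is hypothetical.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
# Proposed content model: one or more <name> or <w> children, nothing else.
ALLOWED_CHILDREN = {f"{TEI}name", f"{TEI}w"}


def invalid_names(xml_text: str) -> list[str]:
    """Return the text of every <name> that violates the proposed restriction."""
    root = ET.fromstring(xml_text)
    problems = []
    for name in root.iter(f"{TEI}name"):
        children = list(name)
        only_allowed = bool(children) and all(c.tag in ALLOWED_CHILDREN for c in children)
        # Text directly inside <name>, or between its children, is not allowed.
        loose_text = bool((name.text or "").strip()) or any(
            (c.tail or "").strip() for c in children
        )
        if not only_allowed or loose_text:
            problems.append("".join(name.itertext()).strip())
    return problems
```

If punctuation inside names should stay legal, `<pc>` would simply be added to `ALLOWED_CHILDREN`.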

## meeting element
- [x] extend meeting elements (`#parla.term`, `#parla.sitting`)

I haven't found any information about terms or sittings in the meeting elements. This is how other corpora implement it:...
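A quick way to see which of the two values a file is still missing is sketched below. It assumes, as the values `#parla.term` and `#parla.sitting` suggest, that they appear as whitespace-separated pointers in the `@ana` attribute of `<meeting>` elements; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"


def missing_meeting_types(xml_text: str,
                          required=("#parla.term", "#parla.sitting")) -> list[str]:
    """List the required @ana values that no <meeting> element carries."""
    root = ET.fromstring(xml_text)
    seen = set()
    for meeting in root.iter(f"{TEI}meeting"):
        # @ana holds a whitespace-separated list of pointers.
        seen.update((meeting.get("ana") or "").split())
    return [value for value in required if value not in seen]
```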

In the BE corpus there are 723 paragraphs (segments) that have `@xml:lang="en"`, even though, at least in the cases I've checked, they are not in fact in English: they...

bug
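To review the affected paragraphs, they could first be listed with their IDs. A minimal sketch, assuming TEI `<seg>` elements; only the `@xml:lang` written on the `<seg>` itself is checked, values inherited from ancestors are not resolved. The helper name is hypothetical.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_NS = "{http://www.w3.org/XML/1998/namespace}"  # namespace of xml:id / xml:lang


def segments_tagged(xml_text: str, lang: str = "en") -> list[tuple[str, str]]:
    """Return (xml:id, text) for every <seg> whose own @xml:lang equals `lang`."""
    root = ET.fromstring(xml_text)
    return [
        (seg.get(f"{XML_NS}id", ""), "".join(seg.itertext()).strip())
        for seg in root.iter(f"{TEI}seg")
        if seg.get(f"{XML_NS}lang") == lang
    ]
```

The returned text could then be passed to a language identifier of choice to find the mislabelled segments.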

According to Wikipedia (https://en.wikipedia.org/wiki/Riksdag) and `ParlaMint-SE-listOrg.xml`, there are elections every four years:
https://github.com/clarin-eric/ParlaMint/blob/643f902481a47e942b713febe9613c9f5472ea82/Samples/ParlaMint-SE/ParlaMint-SE-listOrg.xml#L5-L20
but the component files use a different period length:
https://github.com/clarin-eric/ParlaMint/blob/643f902481a47e942b713febe9613c9f5472ea82/Samples/ParlaMint-SE/ParlaMint-SE_2017-12-12-prot-201718--48.xml#L11
https://www.clarin.si/ske/#text-type-analysis?corpname=parlamint30_se&tab=basic&filter=containing&onecolumn=1&wlattr=speech.term&wlminfreq=1&include_nonwords=1&itemsPerPage=50&showresults=1&cols=%5B%22frq%22%5D&wlsort=frq
![image](https://github.com/clarin-eric/ParlaMint/assets/5867995/f5491d82-7e47-4cc3-a1b6-e17b5c81204c)

I believe this should be...

bug
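If component files are to use the four-year terms from the listOrg file, the term could be derived from each sitting date. This sketch assumes that terms are recorded as `<event>` elements with full `YYYY-MM-DD` values in `@from`/`@to` (a missing `@to` meaning the term is still running); the function names and the event IDs in the test are hypothetical, not taken from `ParlaMint-SE-listOrg.xml`.

```python
import xml.etree.ElementTree as ET
from datetime import date
from typing import Optional

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_NS = "{http://www.w3.org/XML/1998/namespace}"


def read_terms(list_org_xml: str) -> list[tuple[str, date, date]]:
    """Read (xml:id, from, to) for every <event> that carries a @from date."""
    root = ET.fromstring(list_org_xml)
    terms = []
    for event in root.iter(f"{TEI}event"):
        start = event.get("from")
        if start is None:
            continue
        end = event.get("to")
        terms.append((
            event.get(f"{XML_NS}id", ""),
            date.fromisoformat(start),
            # No @to: treat the term as open-ended.
            date.fromisoformat(end) if end else date.max,
        ))
    return terms


def term_for(day: date, terms) -> Optional[str]:
    """Return the id of the term whose [from, to] interval contains `day`."""
    for term_id, start, end in terms:
        if start <= day <= end:
            return term_id
    return None
```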

For the Polish dataset, the top keywords include _dzwonek_ (bell) and _oklaski_ (applause). These should not be included in the top keywords, because they are audio...

bug
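Once such comments are marked up, keyword counts can skip them. The sketch below counts words inside utterances (`<u>`) while leaving out the content of elements assumed here to hold transcriber comments (`note`, `incident`, `kinesic`, `vocal`, `gap`); that tag list and the helper names are assumptions, not the corpus' or Sketch Engine's actual configuration.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

TEI = "{http://www.tei-c.org/ns/1.0}"
# Elements treated as transcriber comments (an assumption; adjust as needed).
COMMENT_TAGS = {f"{TEI}{t}" for t in ("note", "incident", "kinesic", "vocal", "gap")}


def spoken_text(el: ET.Element) -> str:
    """Text of `el` with the content of comment elements left out.

    Tails are kept even for skipped children: a tail belongs to the
    surrounding speech, not to the comment itself.
    """
    parts = [el.text or ""]
    for child in el:
        if child.tag not in COMMENT_TAGS:
            parts.append(spoken_text(child))
        parts.append(child.tail or "")
    return "".join(parts)


def word_counts(xml_text: str) -> Counter:
    """Lower-cased word frequencies over all <u> (utterance) elements."""
    root = ET.fromstring(xml_text)
    counts = Counter()
    for u in root.iter(f"{TEI}u"):
        counts.update(re.findall(r"\w+", spoken_text(u).lower()))
    return counts
```

This only helps where comments are already encoded as elements; unmarked comments in running text would still be counted.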

https://github.com/clarin-eric/ParlaMint/actions/runs/4027956603/jobs/6924304418#step:4:297
```
INFO: Char validation for ParlaMint-SE_2016-11-16-prot-201617--29.xml
ERROR: File ParlaMint-SE_2016-11-16-prot-201617--29.xml contains bad chars: U+F0B7 (3x)
INFO: Char validation for ParlaMint-SE_2020-11-04-prot-202021--29.xml
ERROR: File ParlaMint-SE_2020-11-04-prot-202021--29.xml contains bad chars: U+AD (4x)
```
and...

bug
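Both reported characters can be found before validation with a small pre-check. This is a sketch with a hypothetical rule set, not the one used by the ParlaMint validator: it flags private-use characters (Unicode category `Co`, which includes U+F0B7) and the soft hyphen U+00AD, and reports them in the same `U+XXXX (Nx)` style as the log above.

```python
import unicodedata
from collections import Counter

# Hypothetical rule set: private-use characters (category "Co") plus
# the explicitly listed soft hyphen U+00AD.
EXTRA_BAD = {"\u00ad"}


def bad_chars(text: str) -> list[str]:
    """Report suspicious characters as 'U+XXXX (Nx)', sorted by code point."""
    counts = Counter(
        ch for ch in text
        if ch in EXTRA_BAD or unicodedata.category(ch) == "Co"
    )
    return [f"U+{ord(ch):X} ({n}x)" for ch, n in sorted(counts.items())]
```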

I am sorry for not noticing this in my first inspection of your corpus. There is a considerable number of unrecognized transcriber comments ([6.3. Transcriber comments](https://clarin-eric.github.io/ParlaMint/#sec-comments)) inside the text, just...

bug
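Candidates for such unrecognized comments could be listed with a simple heuristic before marking them up. The sketch below assumes segment texts are already extracted into a dict keyed by segment ID and flags short parenthesised or bracketed spans such as "(Applause)"; the pattern and helper name are hypothetical, and every hit still needs manual checking, since legitimate parentheses in speech will match too.

```python
import re

# Hypothetical heuristic: short parenthesised or bracketed spans, which in
# transcripts are often transcriber comments such as "(Applause)".
CANDIDATE = re.compile(r"\([^()]{1,60}\)|\[[^\[\]]{1,60}\]")


def comment_candidates(segments: dict[str, str]) -> dict[str, list[str]]:
    """Map each segment id to the candidate comments found in its text."""
    found = {}
    for seg_id, text in segments.items():
        hits = CANDIDATE.findall(text)
        if hits:
            found[seg_id] = hits
    return found
```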