Matyáš Kopp
- https://github.com/clarin-eric/ParlaMint/blob/ac6977569f3da1cd604fe1db9dfe87ec40a345a5/Data/ParlaMint-PL/ParlaMint-PL.xml#L526-L533
- https://github.com/clarin-eric/ParlaMint/blob/ac6977569f3da1cd604fe1db9dfe87ec40a345a5/Data/ParlaMint-PL/ParlaMint-PL.xml#L7893-L7900
- https://github.com/clarin-eric/ParlaMint/blob/ac6977569f3da1cd604fe1db9dfe87ec40a345a5/Data/ParlaMint-PL/ParlaMint-PL.xml#L8521-L8528
- https://github.com/clarin-eric/ParlaMint/blob/ac6977569f3da1cd604fe1db9dfe87ec40a345a5/Data/ParlaMint-PL/ParlaMint-PL.xml#L4372-L4378
> * missing text: this has to do with text paragraphs which could not automatically be classified in the first step of the conversion from HTML to TEI. In the...
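The "missing text" case refers to paragraphs that the first HTML-to-TEI step could not classify. Below is a minimal sketch of such a step. It assumes classification by the `class` attribute of `<p>` elements. The `CLASS_TO_TEI` mapping and the class names are hypothetical, not taken from any ParlaMint converter. Unmatched paragraphs go into a separate list so they can be reported rather than silently dropped:

```python
from html.parser import HTMLParser

# Hypothetical mapping from an HTML class to a TEI element; real class
# names depend entirely on the source HTML of each parliament.
CLASS_TO_TEI = {"speech": "u", "comment": "note"}


class ParagraphCollector(HTMLParser):
    """Collect (class, text) pairs for every <p> element in the input."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._cls = None
        self._buf = None  # None means "not inside a <p>"

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._cls = dict(attrs).get("class")
            self._buf = []

    def handle_data(self, data):
        if self._buf is not None:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "p" and self._buf is not None:
            self.paragraphs.append((self._cls, "".join(self._buf).strip()))
            self._cls, self._buf = None, None


def classify(html_text):
    """Split paragraphs into classified (tei_element, text) and unclassified (class, text)."""
    parser = ParagraphCollector()
    parser.feed(html_text)
    parser.close()
    classified, unclassified = [], []
    for cls, text in parser.paragraphs:
        target = CLASS_TO_TEI.get(cls)
        if target is None:
            unclassified.append((cls, text))  # candidates for "missing text"
        else:
            classified.append((target, text))
    return classified, unclassified


sample = '<p class="speech">Thank you.</p><p class="comment">(Applause)</p><p class="x">?</p>'
done, missing = classify(sample)
print(done)
print(missing)
```

Keeping the unclassified list explicit makes the size of the "missing text" problem measurable per file.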
> I don't think so. If we are talking about ParlaMint-taxonomy-parla.legislature(.xml), that one contains much more than just the classification of plenary speech transcriptions. I would be in favour of adding...
> But I can map our current speaker types to ParlaMint, assuming that parliament members, ministers, prime ministers and secretaries of state are "regulars", and the rest are "guests" (incidental speakers)?...
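The proposed mapping amounts to an allow-list: the listed roles become "regular" and everything else defaults to "guest". A minimal sketch follows. The role label strings are hypothetical and would need to match the corpus's own speaker types:

```python
# Roles the proposal counts as "regular" speakers; the exact label strings
# are hypothetical placeholders for the corpus's own speaker types.
REGULAR_ROLES = frozenset({
    "member of parliament",
    "minister",
    "prime minister",
    "secretary of state",
})


def parlamint_speaker_type(role: str) -> str:
    """Return 'regular' for the listed roles and 'guest' for everything else."""
    return "regular" if role.strip().lower() in REGULAR_ROLES else "guest"


for role in ["Minister", "Secretary of State", "Ombudsman"]:
    print(role, "->", parlamint_speaker_type(role))
```

Defaulting to "guest" follows the "the rest are guests" rule directly. Any role that was not anticipated is treated as incidental rather than promoted to "regular".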
## invalid url format
- [x] fix urls

https://github.com/JessedeDoes/ParlaMint/blob/32213d529bbbb2b28ced35d2a7bfb74c2ba9edd1/Data/ParlaMint-BE/ParlaMint-BE.xml#L70-L78

```XML
https://www.dekamer.be/kvvcr/showpage.cfm?section=/cricra & language=nl & cfm=dcricra.cfm?type=plen & cricra=cri & count=all
```
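In the Belgian URL quoted for this item, each `&` is surrounded by spaces, and unencoded whitespace is not allowed in a URL. The spaces may be genuinely present in the file, or they may be a rendering artifact of `&amp;`. The sketch below assumes they are genuine. It removes the whitespace and then lets an XML serializer write the `&` separators as `&amp;`. The `idno` element with `type="URI"` is only an illustrative choice, not necessarily what the linked file uses:

```python
import re
import xml.etree.ElementTree as ET


def clean_query_url(url: str) -> str:
    """Strip surrounding whitespace and remove whitespace around '&' separators."""
    return re.sub(r"\s*&\s*", "&", url.strip())


raw = ("https://www.dekamer.be/kvvcr/showpage.cfm?section=/cricra & language=nl & "
       "cfm=dcricra.cfm?type=plen & cricra=cri & count=all")
url = clean_query_url(raw)
assert not any(ch.isspace() for ch in url), "URL still contains whitespace"

# Illustrative element only: ElementTree escapes '&' as '&amp;' when serializing.
idno = ET.Element("idno", type="URI")
idno.text = url
xml_out = ET.tostring(idno, encoding="unicode")
print(xml_out)
```

Letting the serializer handle escaping avoids hand-writing `&amp;` and the double escaping (`&amp;amp;`) that often comes with it.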
## speeches misclassification
- [ ] speeches misclassification

I still don't understand why so many speeches are misclassified. From my point of view (without knowledge of the language), HTML classes,...
The list of errors is here, you should see it: https://github.com/clarin-eric/ParlaMint/actions/runs/3620171785/jobs/6102099118#step:4:29
Your corpus is still not valid: https://github.com/clarin-eric/ParlaMint/actions/runs/3727204532/jobs/6321168778#step:4:43

So I am suggesting the following procedure:
- [ ] @miruskieta will fix the corpus to pass automatic validation. **Just TEI version**...
@miruskieta Can you please reduce the sample size? It is too large to be validated.
- remove the pair of TEI and TEI.ana files (the largest one)
- remove...
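Finding "the largest pair" can be scripted. The sketch below rests on several assumptions: each TEI file `NAME.xml` has its annotated twin `NAME.ana.xml` in the same sample directory; the corpus root file is named after the directory and should be skipped; and "largest" means the combined size of the two files. The file names in the demo are made up:

```python
import tempfile
from pathlib import Path


def largest_tei_pair(sample_dir):
    """Return (tei_path, ana_path, combined_bytes) for the largest pair, or None.

    Assumes NAME.xml / NAME.ana.xml sit side by side and that the corpus
    root file (named after the directory) should not be considered.
    """
    sample_dir = Path(sample_dir)
    best = None
    for ana in sample_dir.glob("*.ana.xml"):
        tei = ana.with_name(ana.name[: -len(".ana.xml")] + ".xml")
        if not tei.exists() or tei.name == sample_dir.name + ".xml":
            continue  # no TEI twin, or this is the corpus root file
        size = tei.stat().st_size + ana.stat().st_size
        if best is None or size > best[2]:
            best = (tei, ana, size)
    return best


# Demo on a throwaway directory with made-up file names and sizes.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "ParlaMint-XX"
    sample.mkdir()
    for name, size in [("ParlaMint-XX", 10),
                       ("ParlaMint-XX_2020-01-01", 100),
                       ("ParlaMint-XX_2020-02-01", 500)]:
        (sample / f"{name}.xml").write_bytes(b"x" * size)
        (sample / f"{name}.ana.xml").write_bytes(b"x" * (size * 3))
    tei, ana, total = largest_tei_pair(sample)
    print(tei.name, ana.name, total)
```

Removing both files of the pair together keeps the TEI and TEI.ana samples consistent with each other. The corpus root file would still need its references to the removed component updated.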
Released READMEs should be fixed in https://github.com/clarin-eric/ParlaMint/blob/b27cbba669df722340a25d00dc3991390b5d91d7/Scripts/parlamint2distro.pl#L444