Matyáš Kopp
I have taken a more detailed look into the content of `` element.

### Simple post - one post at a time.

```XML
Pastor Julián, Ana María 19571111 Cubillos NA...
```
> Is there anything I have to do?

Gathering minister information from Wikipedia can be done with a script (I hope). @charlicruz or @matyaskopp can do it.

---

Another issue...
> I can modify the politicalParty role to parliamentaryGroup for all xml files.

This is already done:
- in sample: https://github.com/clarin-eric/ParlaMint/pull/692/commits/a10afc44515fc57d0d46196157c0d4f8d3939afb
- script: https://github.com/matyaskopp/PARLAMINT-ES-MC/commit/e2d5356c6f11056aefb07a66ae066c0d434b572e
- all data: https://github.com/matyaskopp/PARLAMINT-ES-MC/commit/6f55c30dd9e2eb769029a3d7a8d6290377b2fb1e

> I...
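For readers wondering what such a role rename involves, here is a minimal Python sketch. It is not the linked script. It assumes that the affected organisations are TEI `<org>` elements carrying `role="politicalParty"`, and the function name `rename_org_role` is hypothetical.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
# Serialise TEI elements without an "ns0:" prefix (default namespace).
ET.register_namespace("", TEI_NS)


def rename_org_role(xml_text: str,
                    old: str = "politicalParty",
                    new: str = "parliamentaryGroup") -> str:
    """Return xml_text with every TEI <org role=old> changed to role=new.

    Hypothetical helper: only the role attribute of <org> elements is
    touched; all other elements and attributes are left as they are.
    """
    root = ET.fromstring(xml_text)
    for org in root.iter(f"{{{TEI_NS}}}org"):
        if org.get("role") == old:
            org.set("role", new)
    return ET.tostring(root, encoding="unicode")
```

ElementTree drops comments and the XML declaration when it re-serialises a document, so a production script over the whole corpus would more likely use XSLT or lxml.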
@TomazErjavec I am close to finishing all necessary scripts for producing the ParlaMint-ES corpus. Can you please take a look at the sample https://github.com/clarin-eric/ParlaMint/pull/692? If there is nothing serious before...
> * for handles you could use http://hdl.handle.net/11356/1859 (TEI) and http://hdl.handle.net/11356/1860 (ana) (but finalize script inserts that anyway)

I am aware of that. I will preserve my wrong handle `http://hdl.handle.net/11356/XXXX`,...
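Keeping a placeholder is safe because the real handles can be substituted mechanically at the very end. The following is a minimal sketch of such a substitution, not the finalize script itself. It assumes, hypothetically, that linguistically annotated files are recognised by their `.ana.xml` suffix and that the placeholder appears verbatim in the file.

```python
PLACEHOLDER = "http://hdl.handle.net/11356/XXXX"

# Handles quoted in the thread: 1859 for the TEI corpus, 1860 for ana.
HANDLES = {
    "TEI": "http://hdl.handle.net/11356/1859",
    "ana": "http://hdl.handle.net/11356/1860",
}


def fill_handle(xml_text: str, filename: str) -> str:
    """Replace the placeholder handle according to the file type.

    Hypothetical helper: files ending in ".ana.xml" get the ana handle,
    every other file gets the TEI handle.
    """
    kind = "ana" if filename.endswith(".ana.xml") else "TEI"
    return xml_text.replace(PLACEHOLDER, HANDLES[kind])
```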
> We can speed up jing 5 times, but the order of output will be different - not file by file.

@TomazErjavec do we insist on this order?
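Parallel validation does not necessarily have to reorder the output. If each file's report is captured and emitted in the order the files were submitted, we get the speed-up and still keep file-by-file output. The Python sketch below shows the idea. `validate` stands for a hypothetical per-file wrapper, for example one jing or XSLT call whose output is captured as a string. If the pipeline runs under GNU parallel, its `--keep-order` (`-k`) option provides the same guarantee.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List


def validate_all(files: Iterable[str],
                 validate: Callable[[str], str],
                 workers: int = 5) -> List[str]:
    """Run validate(file) concurrently and return one report per file.

    Executor.map yields results in the order the files were submitted,
    not in the order the workers finish, so printing the returned list
    reproduces the familiar file-by-file output.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(validate, files))
```

Threads are sufficient here because the heavy work would run in external processes. The cost is that a file's report only appears once all earlier files have finished.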
> I actually don't think jing is the bottleneck; rather, it is the XSLT validation that is slow. Also, validate-parlamint.pl processes files one by one, so it would be difficult...
Ok, I have staged my changes. Another place where we could speed things up is the link-checker: transform `teiCorpus/teiHeader` into a smaller temporary XML file that contains just a list of elements with...
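The gain comes from building the index of link targets once instead of re-reading the full corpus header for every component file. Here is a minimal Python sketch of that idea. It keeps the index as an in-memory set rather than a temporary XML file, and it assumes that only local `#id` pointers in the `who`, `ref`, `ana` and `corresp` attributes need checking. Prefixed pointers are ignored, and both function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List, Set

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
# Assumption: the pointer attributes worth checking.
POINTER_ATTRS = ("who", "ref", "ana", "corresp")


def _ids(root: ET.Element) -> Set[str]:
    """All xml:id values defined anywhere under root."""
    return {el.get(XML_ID) for el in root.iter() if el.get(XML_ID) is not None}


def collect_ids(header_xml: str) -> Set[str]:
    """Build the link-target index once, from the corpus header."""
    return _ids(ET.fromstring(header_xml))


def dangling_refs(component_xml: str, header_ids: Set[str]) -> List[str]:
    """Return local '#id' pointers that resolve neither to the header
    index nor to an xml:id defined in the component file itself."""
    root = ET.fromstring(component_xml)
    known = header_ids | _ids(root)
    missing = []
    for el in root.iter():
        for attr in POINTER_ATTRS:
            for token in (el.get(attr) or "").split():
                if token.startswith("#") and token[1:] not in known:
                    missing.append(token)
    return missing
```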
Parallel currently reports this warning on 60 threads:

```
INFO: Char validation for ParlaMint-IL_2004-01-01-16ptv487015.ana.xml
INFO: XML validation for ParlaMint-IL_2004-01-01-16ptv487015.ana.xml
INFO: XML validation for ParlaMint-IL_2004-01-01-16ptv487015.ana.xml
INFO: Content validaton for ParlaMint-IL_2004-01-01-16ptv487015.ana.xml
INFO:...
```
https://github.com/clarin-eric/ParlaMint/blob/ac6977569f3da1cd604fe1db9dfe87ec40a345a5/Data/ParlaMint-PL/ParlaMint-PL.xml#L10546-L10553