Matyáš Kopp comments

Results: 82 comments of Matyáš Kopp

Sample for ParlaMint-RO

@RePierre your sample with annotated files is too large. Can you please remove some pairs of TEI and TEI.ana files (and also `

Unclosed quotation marks in tab-separated text files lead to line merging when parsed with pandas

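Maybe you could use an additional setting, something like this (I haven't tested it):

```python
import csv

import pandas as pd

# `file` is the path to the tab-separated text file.
# QUOTE_NONE makes the parser treat quotation marks as ordinary characters.
current_df = pd.read_csv(file, sep="\t", index_col=False, quoting=csv.QUOTE_NONE, escapechar=None)
```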
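To illustrate why the quoting setting matters, here is a minimal sketch that uses the standard-library `csv` module as a stand-in for pandas (pandas' `quoting` parameter accepts the same `csv.QUOTE_*` constants, but its parser may behave differently in detail). The sample text and the `read_rows` helper are hypothetical and only serve the demonstration.

```python
import csv
import io

# Hypothetical sample: the first data row opens a quotation mark that is never closed.
SAMPLE = 'id\ttext\n1\t"unclosed quote\n2\tnormal\n'


def read_rows(text, quoting):
    """Parse tab-separated text with the given csv quoting mode."""
    return list(csv.reader(io.StringIO(text), delimiter="\t", quoting=quoting))


# Default quoting: the open quote swallows the following line into one field.
merged = read_rows(SAMPLE, csv.QUOTE_MINIMAL)

# QUOTE_NONE: quotation marks are plain characters, so every line stays a row.
separate = read_rows(SAMPLE, csv.QUOTE_NONE)

print(len(merged), len(separate))
```

With quoting disabled, the quotation mark is kept as part of the field value, so any cleanup of stray quotes would have to happen after parsing.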

ParlaMint 4.1 BE: Unexpected processing warning line at start of ParlaMint-BE.txt file

This looks like a Java issue (https://stackoverflow.com/questions/76327/how-can-i-prevent-java-from-creating-hsperfdata-files). The Java settings need to be changed and/or additional validation needs to be added. However, I have no idea how this error ends up in the file, and...

ParlaMint 4.1 BE: Unexpected processing warning line at start of ParlaMint-BE.txt file

Sorry, I missed this. I will implement this in the future, and together with it, some kind of validation of the derived formats can be implemented, e.g. checking that the TSV is valid.
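As a rough idea of what such a derived-format check could look like, the sketch below flags TSV rows whose column count differs from the header. It is only an assumption about what "TSV is valid" might mean; the function name `find_bad_tsv_rows` is hypothetical and not part of the ParlaMint tooling.

```python
import csv
import io


def find_bad_tsv_rows(text: str) -> list[tuple[int, int]]:
    """Return (line_number, column_count) for rows whose width differs from the header."""
    # QUOTE_NONE keeps one record per physical line, even with stray quotation marks.
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(reader, None)
    if header is None:
        return []  # an empty file has nothing to compare against
    expected = len(header)
    return [(reader.line_num, len(row)) for row in reader if len(row) != expected]


if __name__ == "__main__":
    sample = "id\tname\n1\tAlice\n2\n3\tCarol\n"
    print(find_bad_tsv_rows(sample))
```

A check like this would also catch a stray warning line in a TSV file, since such a line would not have the expected number of columns.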

Translate taxonomies to all languages

@TomazErjavec I have inserted new taxonomies and reinserted taxonomies with missing translations (the checklist is up to date)

ParlaMint-SI: additional metadata files for sentiment?

> This could be included in our metadata files (*-meta.tsv). However, since SI will be the only corpus containing this additional information, the other corpora would be missing this information...

speedup: customize the number of jobs

I did something very similar in a separate branch; I am now testing it before merging it into develop... https://github.com/clarin-eric/ParlaMint/pull/894

Data

@TomazErjavec I had to trigger the action again with empty commit, now it seems to work

Data

> > @TomazErjavec I had to trigger the action again with empty commit, now it seems to work
>
> @matyaskopp, indeed, it did finish now, now checks are failing...

Per-country translations in taxonomies

I checked the language tag documentation: https://www.rfc-editor.org/rfc/rfc5646.html#page-5, and it contains a more detailed structure than I expected. I can see a problem with using the _-region_ part of the language...
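For orientation, here is a deliberately simplified sketch of how a tag splits into language, script, and region subtags under RFC 5646. It assumes only the common `language[-script][-region]` shape and ignores extended language subtags, variants, extensions, and private-use subtags; the helper name `split_language_tag` is hypothetical.

```python
import re

# Simplified pattern: 2-3 letter language, optional 4-letter script,
# optional region (2 letters or 3 digits). Not a full RFC 5646 parser.
_TAG = re.compile(
    r"^(?P<language>[A-Za-z]{2,3})"
    r"(?:-(?P<script>[A-Za-z]{4}))?"
    r"(?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?$"
)


def split_language_tag(tag):
    """Return a dict of subtags, or None if the tag does not fit the simplified shape."""
    match = _TAG.match(tag)
    if match is None:
        return None
    return match.groupdict()


if __name__ == "__main__":
    for tag in ("en", "nl-BE", "sr-Latn-RS", "es-419"):
        print(tag, split_language_tag(tag))
```

Because the script and region subtags are both optional and are told apart only by their length and character class, code that expects a fixed `language-region` pair would misread tags such as `sr-Latn-RS`.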