Adrien Barbaresi

Results: 415 comments by Adrien Barbaresi

Now that I look further at the description, it's unclear to what extent the UD corpus has been manually corrected. There could be mistakes there as well, so SIKOR is...

I get many more word pairs from all inflected forms in Kaikki than from UD (although the UD forms should be more frequent). I'll try to integrate the data soon.

It is now added (version `0.8.2`, language code `se`); I used the opportunity to add a few other languages as well :heavy_check_mark: The linguistic material I used to build the...

Hi @nikopartanen, thanks for the evaluation! My impression is that the lemmatizer mostly behaves as expected: it rarely introduces mistakes (i.e. wrong lemmata), and nearly all errors are tokens which do...

Hi @nikopartanen & @osma, have you tried the chain described above and did it improve the results? Also: since support has been added, can I close this issue for now?

OK, thanks for your answer. Indeed, it isn't a huge concern!

Trafilatura adopts a generic approach; there are enough scraping libraries that support such functions.

@Lucabenj anything to share?

Hi @DavidNemeskey, I believe I addressed both concerns, does that work for you?

@knit-bee thanks for the additional check. That is correct, so I'm going to close the issue.