Adrien Barbaresi
This project started as a simple experiment, and I didn't implement version control on the data, so what I did is not reproducible, I'm afraid. The lists mentioned in the...
Hi @joprice, I was working on something else, but I have taken good note of your example. By design, Simplemma preserves word forms rather than overgenerating lemmata. As it...
See also [ruff-pre-commit](https://github.com/astral-sh/ruff-pre-commit).
Yes, please go ahead!
Concerning the documentation, you need to add a line to the [CONTRIBUTING.md file](https://github.com/adbar/htmldate/blob/master/CONTRIBUTING.md) explaining how to install and run pre-commit.
Hi @juanjoDiaz, this change can have far-reaching consequences for the lemmatization process, so I'm going to test it for all languages for which I have reference data to prevent degraded...
Not sure about the conflicts mentioned by GitHub; you'd better look into that. I'll run the tests once the `IndexError` is fixed.
@juanjoDiaz, I cannot run the evaluation because this branch is out of sync. Could you please have a look at it? In the meantime I'll use the legacy functions which...
The PR slightly decreases the performance of non-greedy lemmatization for many languages. The part you removed appears to be a trick to improve the results a bit. I would be...
It generally decreases by 0.1 to 0.2, which is not much; however, I'd prefer to keep the functionality as it is. The "greedy" option is actually more like "greedier", the lemmatizer...
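The per-language check described in these comments (running the lemmatizer before and after a change on reference data and looking for languages where results degrade) could be sketched roughly as follows. This is a hypothetical illustration, not simplemma's actual evaluation script: the function names `accuracy`, `compare` and `degraded`, the `(token, gold_lemma)` data layout, and plain accuracy as the metric are all assumptions.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical type aliases, for illustration only (not part of simplemma).
Lemmatizer = Callable[[str, str], str]       # (token, lang) -> predicted lemma
GoldData = Dict[str, List[Tuple[str, str]]]  # lang -> [(token, gold_lemma), ...]


def accuracy(lemmatize: Lemmatizer, lang: str, pairs: List[Tuple[str, str]]) -> float:
    """Share of tokens whose predicted lemma matches the reference lemma."""
    if not pairs:
        return 0.0
    hits = sum(1 for token, gold in pairs if lemmatize(token, lang) == gold)
    return hits / len(pairs)


def compare(baseline: Lemmatizer, candidate: Lemmatizer, gold: GoldData) -> Dict[str, float]:
    """Per-language accuracy delta: candidate minus baseline (negative = worse)."""
    return {
        lang: accuracy(candidate, lang, pairs) - accuracy(baseline, lang, pairs)
        for lang, pairs in gold.items()
    }


def degraded(deltas: Dict[str, float], tolerance: float = 0.0) -> List[str]:
    """Languages where the candidate falls below the baseline by more than `tolerance`."""
    return sorted(lang for lang, delta in deltas.items() if delta < -tolerance)


if __name__ == "__main__":
    # Toy reference data and lemmatizers, purely illustrative.
    gold = {"en": [("cats", "cat"), ("ran", "run")], "de": [("Häuser", "Haus")]}
    table = {"cats": "cat", "ran": "run", "Häuser": "Haus"}

    def baseline(token: str, lang: str) -> str:
        return table.get(token, token)

    def candidate(token: str, lang: str) -> str:
        # Simulates a change that stops resolving one irregular form.
        return token if token == "ran" else table.get(token, token)

    deltas = compare(baseline, candidate, gold)
    print(deltas)            # {'en': -0.5, 'de': 0.0}
    print(degraded(deltas))  # ['en']
```

The `tolerance` parameter is one possible way to decide whether a small drop is acceptable or should block a change; where to set it is a judgment call, not something this sketch decides.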