Adrien Barbaresi comments

Results: 412 comments of Adrien Barbaresi

Document how dictionaries are created

This project started as a simple experiment and I didn't put the data under version control, so I'm afraid what I did is not reproducible. The lists mentioned in the...

greedy decomposition not working on some german verbs

Hi @joprice, I was working on something else but I have taken note of your example. By design, simplemma preserves word forms rather than overgenerating lemmata. As it...

Configure pre-commit for this repository and update documentation

See also [ruff-pre-commit](https://github.com/astral-sh/ruff-pre-commit).

Configure pre-commit for this repository and update documentation

Yes, please go ahead!

Configure pre-commit for this repository and update documentation

Concerning the documentation, you need to add a line to the [CONTRIBUTING.md file](https://github.com/adbar/htmldate/blob/master/CONTRIBUTING.md) explaining how to install and run pre-commit.

Feat/better apporach to greedy lookups

Hi @juanjoDiaz, this change can have far-reaching consequences for the lemmatization process, so I'm going to test it on all languages for which I have reference data to prevent degraded...

Feat/better apporach to greedy lookups

I'm not sure about the conflicts mentioned by GitHub; you'd better look into that. I'll run the tests once the `IndexError` is fixed.

Feat/better apporach to greedy lookups

@juanjoDiaz I cannot run the evaluation because this branch is out of sync. Could you please have a look at it? In the meantime I'll use the legacy functions which...

Feat/better apporach to greedy lookups

The PR slightly decreases the performance of non-greedy lemmatization for many languages. The part you removed appears to be a trick to improve the results a bit. I would be...

Feat/better apporach to greedy lookups

It generally decreases by 0.1 to 0.2, which is not much, but I'd prefer to keep the functionality as it is. The "greedy" option is actually more like "greedier": the lemmatizer...