
Use custom dictionaries

1over137 opened this issue.

It would be nice if the API provided a way to load a custom dictionary without resorting to patching the data in the module. In some languages the lemmatizer's coverage can be rather poor, and other languages are not supported at all. If this is welcome and we can agree on what the API should look like, I can implement it and make a PR. My idea would be either passing a dict argument to `simplemma.lemmatize`, or keeping a global state that stores which extra dicts to use for each language, with a few functions to manipulate it.
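The second option in this proposal (a global registry of extra dictionaries per language) could look roughly like the sketch below. All names here (`register_dictionary`, `unregister_dictionary`, `lemmatize_with_extras`) are hypothetical and are not part of simplemma; the `fallback` argument stands in for the library's own lemmatizer so the example runs on its own.

```python
from typing import Callable, Dict

# Hypothetical module-level registry: language code -> {word form: lemma}.
# None of these names exist in simplemma; they only illustrate the proposal.
_EXTRA_DICTS: Dict[str, Dict[str, str]] = {}


def register_dictionary(lang: str, entries: Dict[str, str]) -> None:
    """Add (or extend) the extra dictionary used for `lang`."""
    _EXTRA_DICTS.setdefault(lang, {}).update(entries)


def unregister_dictionary(lang: str) -> None:
    """Drop any extra dictionary registered for `lang`."""
    _EXTRA_DICTS.pop(lang, None)


def lemmatize_with_extras(
    token: str, lang: str, fallback: Callable[[str, str], str]
) -> str:
    """Check the extra dictionary first, then defer to `fallback`.

    `fallback` stands in for the library's own lemmatizer.
    """
    extra = _EXTRA_DICTS.get(lang, {})
    if token in extra:
        return extra[token]
    if token.lower() in extra:  # retry with a lowercased token
        return extra[token.lower()]
    return fallback(token, lang)


def identity(token: str, lang: str) -> str:
    """Placeholder fallback that returns the token unchanged."""
    return token


if __name__ == "__main__":
    register_dictionary("en", {"mice": "mouse"})
    print(lemmatize_with_extras("mice", "en", identity))  # mouse
    print(lemmatize_with_extras("dogs", "en", identity))  # dogs
```

The per-call `dict` argument variant is the same lookup with `extra` passed in directly instead of read from the registry; the registry trades explicitness for not having to thread the dictionary through every call.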

1over137, Mar 27 '24 18:03

I'd prefer to work towards releasing version 1 and see from there. That includes documenting how the sources are compiled; I'm working on it.

The API is not completely stable right now, as a few things are still broken after an intensive refactoring. I'd suggest holding off on your PR until things have stabilized a bit. Using the new classes to load external dictionaries seems like a good approach.

adbar, Apr 02 '24 12:04

@1over137 You can start working on a PR if you want; the API for the dictionary lookup strategy is stable. I also added information on additional dictionaries to the training README.
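Whichever API is used, an additional word list first has to be read into a mapping from inflected forms to lemmas. The following is a minimal sketch assuming a UTF-8, tab-separated file with one `lemma<TAB>form` pair per line; that layout is an assumption, not necessarily the format the training README describes, and `load_lemma_list` is a hypothetical helper.

```python
import csv
from pathlib import Path
from typing import Dict


def load_lemma_list(path: Path, lemma_first: bool = True) -> Dict[str, str]:
    """Read a two-column, tab-separated word list into {word form: lemma}.

    The column order is an assumption: pass lemma_first=False if each row
    is "form<TAB>lemma" instead of "lemma<TAB>form".
    """
    mapping: Dict[str, str] = {}
    with path.open(encoding="utf-8", newline="") as handle:
        # QUOTE_NONE so that quote characters inside words are kept literally.
        for row in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
            if len(row) < 2 or row[0].startswith("#"):
                continue  # skip blank, malformed, or comment lines
            lemma, form = (row[0], row[1]) if lemma_first else (row[1], row[0])
            form, lemma = form.strip(), lemma.strip()
            if form and lemma:
                # Keep the first lemma seen for an ambiguous form.
                mapping.setdefault(form, lemma)
    return mapping
```

Keeping the first lemma for an ambiguous form is one simple policy; a real dictionary may need a frequency-based or manual choice instead.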

adbar, May 22 '24 14:05

Hi guys,

Such an API already exists. You just need to implement the `DictionaryFactory` protocol and use it to load your custom dictionaries.
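A minimal sketch of such a factory, assuming the protocol boils down to a single `get_dictionary(lang)` method that returns a mapping from word forms to lemmas (check the exact signature in the installed simplemma version). To keep the example self-contained, `DictionaryFactoryLike` is a local stand-in for the real protocol, and `CustomDictionaryFactory` is a hypothetical name. Custom entries are layered over an optional base factory with `collections.ChainMap`, so the base data is not copied.

```python
from collections import ChainMap
from typing import Dict, Mapping, Optional, Protocol


class DictionaryFactoryLike(Protocol):
    """Local stand-in for simplemma's DictionaryFactory protocol (assumed shape)."""

    def get_dictionary(self, lang: str) -> Mapping[str, str]: ...


class CustomDictionaryFactory:
    """Serves user-supplied dictionaries, optionally layered over a base factory.

    Entries in `custom` take precedence over entries from `base`; languages
    the base factory cannot serve are answered from `custom` alone.
    """

    def __init__(
        self,
        custom: Dict[str, Dict[str, str]],
        base: Optional[DictionaryFactoryLike] = None,
    ) -> None:
        self._custom = custom
        self._base = base
        self._cache: Dict[str, Mapping[str, str]] = {}

    def get_dictionary(self, lang: str) -> Mapping[str, str]:
        if lang not in self._cache:
            layers = [self._custom.get(lang, {})]  # first layer wins on lookup
            if self._base is not None:
                try:
                    layers.append(self._base.get_dictionary(lang))
                except (KeyError, ValueError):
                    # Assumption: an unsupported language surfaces as one of these.
                    pass
            self._cache[lang] = ChainMap(*layers)
        return self._cache[lang]
```

With simplemma itself, an object like this would then be handed to a lookup strategy; recent releases appear to accept it as `DefaultStrategy(dictionary_factory=...)`, wrapped in `Lemmatizer(lemmatization_strategy=...)`, but confirm this against the documentation of the installed version.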

juanjoDiaz, May 24 '24 22:05

@1over137 Did that solve your problem or do we need to work on the documentation?

adbar, Jun 26 '24 09:06

Closing as this was answered. Feel free to reopen if there are more questions.

juanjoDiaz, Aug 08 '24 13:08