Adrien Barbaresi comments

Results: 412 comments by Adrien Barbaresi

Feat/better approach to greedy lookups

I'll now close the PR for the sake of clarity and move on with the rest. We can come back to it later if you find a solution which does...

simplemma.lang_detector import no longer working

I guess we can use an alias and import it during init. This was a question we discussed with @juanjoDiaz but something must have got lost along the way.
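A minimal sketch of the aliasing idea, assuming a package whose module was renamed. The package name `demo_pkg` and the placeholder function body are hypothetical stand-ins built in memory so the snippet runs on its own; in a real package the same effect would come from a couple of lines in `__init__.py`, and this is not necessarily how Simplemma resolves it.

```python
import importlib
import sys
import types

# Hypothetical stand-in package, built in memory so the sketch is runnable.
pkg = types.ModuleType("demo_pkg")
pkg.__path__ = []  # mark it as a package
new_mod = types.ModuleType("demo_pkg.language_detector")


def lang_detector(text):
    """Placeholder for the real detector (hypothetical behaviour)."""
    return [("unk", 1.0)]


new_mod.lang_detector = lang_detector
pkg.language_detector = new_mod
sys.modules["demo_pkg"] = pkg
sys.modules["demo_pkg.language_detector"] = new_mod

# The alias itself: what the package's __init__ could do at import time.
pkg.langdetect = new_mod
sys.modules["demo_pkg.langdetect"] = new_mod

# Both the old and the new import path now resolve to the same module.
old = importlib.import_module("demo_pkg.langdetect")
new = importlib.import_module("demo_pkg.language_detector")
```

Registering the alias in `sys.modules` keeps `from demo_pkg.langdetect import lang_detector` working without duplicating any code.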

simplemma.lang_detector import no longer working

This has already been mentioned in #64.

- The `0.9.1` readme says: `from simplemma.langdetect import in_target_language, lang_detector`
- The current readme says: `from simplemma import in_target_language, lang_detector`

So we could...

simplemma.lang_detector import no longer working

See also https://github.com/adbar/simplemma/commit/58b3ee7430568738f306a5386085eda6628c47d4: `from simplemma.langdetect` → `from simplemma.language_detector`
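Until the import path settles, calling code could try the locations in turn. The sketch below is only an illustration: `import_first` is a hypothetical helper, and the candidate module names are simply the ones quoted in the readme versions and the commit above, with no guarantee that a given release exposes `lang_detector` at each of them.

```python
import importlib


def import_first(names, attribute):
    """Return `attribute` from the first importable module in `names`, else None."""
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            continue
        if hasattr(module, attribute):
            return getattr(module, attribute)
    return None


# Candidate locations taken from the readme versions and commit quoted above.
CANDIDATES = ["simplemma", "simplemma.language_detector", "simplemma.langdetect"]
lang_detector = import_first(CANDIDATES, "lang_detector")  # None if unavailable
```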

Words that match more than one lemma

Tough one, this is an absolute borderline case since multiple matches are usually not present in lists and they may be annotated differently. Concerning the "noun vs. verb" issue this...

Words that match more than one lemma

The approach you suggest would probably give better results but memory is already a concern for the available dictionaries. One way or the other there is always a tradeoff between...

Plans for simplemma 1.0 release?

I now seriously plan a next release before the summer, if everything goes well even earlier than that. The remaining issues will (hopefully) be addressed further along the way, from...

Investigate other data structures to store language data

Additional note: dictionaries can be more compact if keys and values are `bytes` instead of `str`. This would be a first step to decrease memory footprint.
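A rough way to check this claim on CPython is to compare `sys.getsizeof` for the same entries stored as `str` and as UTF-8 `bytes`. The word pairs below are a made-up sample, not Simplemma data, and `payload_size` and `lookup` are hypothetical helpers; exact numbers depend on the interpreter version.

```python
import sys

# Tiny made-up sample; real language data has far more entries.
pairs = {"houses": "house", "mice": "mouse", "Häuser": "Haus", "дома": "дом"}

encoded = {k.encode("utf-8"): v.encode("utf-8") for k, v in pairs.items()}


def payload_size(mapping):
    """Sum the sizes of keys and values only (the dict's own table is excluded)."""
    return sum(sys.getsizeof(k) + sys.getsizeof(v) for k, v in mapping.items())


str_size = payload_size(pairs)
bytes_size = payload_size(encoded)
print(f"str: {str_size} bytes, bytes: {bytes_size} bytes")


def lookup(token):
    """Look up a str token in the bytes-keyed dictionary, returning str or None."""
    found = encoded.get(token.encode("utf-8"))
    return found.decode("utf-8") if found is not None else None
```

The saving comes at the cost of an encode step per lookup and a decode step per hit, so the speed impact would need measuring separately.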

Investigate other data structures to store language data

@Dunedan First I'd like to say I play 0 A.D. from time to time so it is really nice to see you find Simplemma useful in this context. Thanks for...

Investigate other data structures to store language data

Concerning the first point and before you draft a PR: how about using pure-Python tries? Maybe the slowdown is acceptable considering the portability of this solution? Breaking down language data...
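For reference, a pure-Python trie can be as small as a set of nested dictionaries. This is a minimal sketch with hypothetical function names and sample words, not a proposal for the actual storage format; since each node is itself a `dict`, whether it saves memory compared with a flat dictionary would have to be measured.

```python
_END = ""  # sentinel key marking the end of a stored word (no character is empty)


def trie_insert(root, word, lemma):
    """Store `lemma` under `word`, sharing nodes between common prefixes."""
    node = root
    for char in word:
        node = node.setdefault(char, {})
    node[_END] = lemma


def trie_lookup(root, word):
    """Return the lemma stored for `word`, or None if it is absent."""
    node = root
    for char in word:
        node = node.get(char)
        if node is None:
            return None
    return node.get(_END)


root = {}
for form, lemma in [("houses", "house"), ("house", "house"), ("mice", "mouse")]:
    trie_insert(root, form, lemma)
```

Lookups cost one dictionary access per character, which is where the slowdown relative to a single hash lookup would come from.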