simplemma Add README section on advanced usage via classes

As discussed in https://github.com/adbar/simplemma/issues/110#issuecomment-1673306133, this PR adds a section to the top-level README with examples of advanced usage via the Simplemma classes. I used the cache-limiting use case I had in mind for the example, but I tried to explain it as a pattern that can also be applied to other customization requirements. Any comments are welcome; I'm happy to adjust as necessary.
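For readers following the thread without the README diff, the general shape of the cache-limiting pattern can be sketched as below. This is only an illustration under stated assumptions: `BoundedDictionaryFactory`, `ToyLemmatizer`, and the toy data are hypothetical stand-ins written for this sketch, not Simplemma's actual classes or signatures. The idea shown is that the component which loads dictionaries is built with a size limit and then injected into the lemmatizer.

```python
from functools import lru_cache
from typing import Dict

# Toy data standing in for real per-language dictionaries (hypothetical).
_TOY_DATA: Dict[str, Dict[str, str]] = {
    "en": {"cats": "cat"},
    "de": {"Hunde": "Hund"},
}


class BoundedDictionaryFactory:
    """Hypothetical stand-in: loads dictionaries, caching at most `cache_max_size`."""

    def __init__(self, cache_max_size: int = 2) -> None:
        self.load_count = 0
        # Wrap the loader per instance so every factory owns its own bounded cache.
        self.get_dictionary = lru_cache(maxsize=cache_max_size)(self._load)

    def _load(self, lang: str) -> Dict[str, str]:
        self.load_count += 1  # counts real loads, i.e. cache misses
        return dict(_TOY_DATA.get(lang, {}))


class ToyLemmatizer:
    """Hypothetical stand-in: receives its dictionary source via the constructor."""

    def __init__(self, dictionary_factory: BoundedDictionaryFactory) -> None:
        self._factory = dictionary_factory

    def lemmatize(self, token: str, lang: str) -> str:
        # Unknown tokens are returned unchanged.
        return self._factory.get_dictionary(lang).get(token, token)


# Keep only one language dictionary in memory at a time.
factory = BoundedDictionaryFactory(cache_max_size=1)
lemmatizer = ToyLemmatizer(dictionary_factory=factory)

first = lemmatizer.lemmatize("cats", lang="en")    # loads "en"
second = lemmatizer.lemmatize("Hunde", lang="de")  # loads "de", evicts "en"
third = lemmatizer.lemmatize("cats", lang="en")    # "en" must be loaded again
```

The same injection point could carry other customizations (a different data source, a different lookup strategy) without changing the calling code.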

While working on this, I discovered some problems that I reported as separate issues #111 and #112.

Aug 11 '23 07:08 osma

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 81.12%. Comparing base (fa1d964) to head (5c5a34c). Report is 5 commits behind head on main.

Additional details and impacted files

@@             Coverage Diff             @@
##             main     #113       +/-   ##
===========================================
- Coverage   96.62%   81.12%   -15.50%     
===========================================
  Files          33       35        +2     
  Lines         651      779      +128     
===========================================
+ Hits          629      632        +3     
- Misses         22      147      +125


Aug 11 '23 07:08 codecov[bot]

One aspect of the class API I'm wondering about is the difference between Lemmatizer and LanguageDetector. Lemmatizer is language-agnostic: the same instance can be used with many languages or combinations, just by passing a different lang argument to the lemmatize() method. LanguageDetector, in contrast, is given a language (or tuple of languages) when it's constructed, so the same instance cannot be reused if you happen to need a different set of languages.

Neither way is wrong, but it seems like these could perhaps be harmonized, either by making Lemmatizer language-specific or by making LanguageDetector language-agnostic. Maybe @juanjoDiaz can explain the current situation and say whether it makes sense to keep it as it is or to try to unify these.
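To make the two designs concrete, here is a minimal sketch using toy classes. Everything in it is an assumption made for illustration: `BoundDetector`, `AgnosticDetector`, the `detect` method, and the tiny vocabularies are hypothetical and do not reproduce Simplemma's implementation. It only contrasts fixing the languages at construction time with passing them per call.

```python
from typing import Dict, Set, Tuple

# Tiny toy vocabularies (hypothetical data, not real language resources).
_TOY_VOCAB: Dict[str, Set[str]] = {
    "en": {"the", "cat", "dog"},
    "de": {"der", "hund", "katze"},
    "fi": {"kissa", "koira", "ja"},
}


def _proportions(text: str, langs: Tuple[str, ...]) -> Dict[str, float]:
    """Share of whitespace-separated tokens found in each language's vocabulary."""
    tokens = text.lower().split()
    if not tokens:
        return {lang: 0.0 for lang in langs}
    return {
        lang: sum(token in _TOY_VOCAB[lang] for token in tokens) / len(tokens)
        for lang in langs
    }


class BoundDetector:
    """Languages are fixed at construction; another set needs another instance."""

    def __init__(self, lang: Tuple[str, ...]) -> None:
        self._langs = lang

    def detect(self, text: str) -> Dict[str, float]:
        return _proportions(text, self._langs)


class AgnosticDetector:
    """Languages are passed per call; one instance serves any combination."""

    def detect(self, text: str, lang: Tuple[str, ...]) -> Dict[str, float]:
        return _proportions(text, lang)


text = "the cat ja koira"

bound = BoundDetector(lang=("en", "de"))
bound_result = bound.detect(text)

agnostic = AgnosticDetector()
agnostic_result = agnostic.detect(text, lang=("en", "fi"))
```

With the bound design, checking the same text against ("en", "fi") means constructing a second detector; with the agnostic design, only the argument changes.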

Aug 11 '23 08:08 osma

Thanks for the added docs and good point above, you could actually open an issue regarding the harmonization of Lemmatizer and LanguageDetector. It's not the priority now though, so we can add a corresponding sentence in the docs if you feel users might fail to understand the current difference.

Aug 11 '23 11:08 adbar