Karl Bartel
See https://gitlab.com/gilles.serasset/dbnary/-/issues/80
Interesting. I wonder if I should link to some alternatives with a few words about the main differences, so that everyone can pick the best smu-like for project and personal...
What would be the rule? I don't think this should apply for all top level tags. E.g. ``` Welcome! Be nice to each other! Thanks. ``` should add paragraph tags...
That's interesting! I haven't noticed Wiktextract yet. I wonder what the Wiktextract and DBnary guys think of each other's work, since it overlaps a lot. WikDict does have inflection data,...
So far I only got the "Possible typo: you repeated a whitespace" warning and am quite happy with the languagetool integration otherwise. Maybe we can ignore that rule for markdown...
Or use https://pypi.org/project/iso-639/ (names only in English) or https://pypi.org/project/pycountry/ (names available in all languages via gettext localization).
I could write a custom tokenizer using https://github.com/hideaki-t/sqlite-fts-python/ and maybe remove the diacritics with one of the approaches from https://stackoverflow.com/questions/517923/what-is-the-best-way-to-remove-accents-normalize-in-a-python-unicode-string.
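For reference, a minimal sketch of one common way to remove diacritics using only the standard library: decompose the string and drop the combining marks. The function name `strip_diacritics` is made up for illustration, and this is not necessarily the approach I would end up using. Letters without a Unicode decomposition (such as "ø" or "ł") pass through unchanged.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Return text with combining marks removed (hypothetical helper)."""
    # NFD splits e.g. "å" into "a" followed by U+030A (combining ring above).
    decomposed = unicodedata.normalize("NFD", text)
    # Drop every code point that is a combining mark.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Recompose whatever is left into the usual composed form.
    return unicodedata.normalize("NFC", stripped)


print(strip_diacritics("passa på"))
```

A tokenizer could apply the same function to both the indexed text and the search query, so that both sides are compared in folded form.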
The same problem exists for Swedish, where https://www.wikdict.com/de-sv/passa%20p%C3%A5 works but https://www.wikdict.com/de-sv/passa%20pa doesn't.
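One way to make both spellings find the same entry, sketched here without any full-text search at all, is to store a folded copy of each headword and compare the folded query against it. The table and column names (`entry`, `written`, `folded`) and the helpers `fold` and `lookup` are assumptions for this example, not the actual WikDict schema.

```python
import sqlite3
import unicodedata


def fold(text: str) -> str:
    """Lowercase and remove combining marks (hypothetical helper)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch)).casefold()


conn = sqlite3.connect(":memory:")
# Keep the original spelling for display and a folded key for matching.
conn.execute("CREATE TABLE entry (written TEXT, folded TEXT)")
conn.execute("CREATE INDEX entry_folded ON entry (folded)")
for written in ["passa på", "passa"]:
    conn.execute("INSERT INTO entry VALUES (?, ?)", (written, fold(written)))


def lookup(query: str) -> list[str]:
    """Return the original spellings whose folded form equals the folded query."""
    rows = conn.execute(
        "SELECT written FROM entry WHERE folded = ?", (fold(query),)
    )
    return [row[0] for row in rows]


print(lookup("passa pa"))
print(lookup("passa på"))
```

The downside is that words differing only by a diacritic become indistinguishable in search, so the results would need ranking that prefers exact matches.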
If I ever want to move off of sqlite, https://duckdb.org/ seems to have a better choice of tokenizers while keeping many of sqlite's benefits.
Using stemmers from https://github.com/abiliojr/fts5-snowball should also solve the problem. I'm not sure how much stemming should be done on a dictionary, though.