Wolf Garbe
> 1. process the frequency dictionary once and save it for re-use

You have to serialize/deserialize the delete data structure of SymSpell and save it to a file/load it from...
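As a rough illustration of that idea, here is a minimal Python sketch that builds a small delete index once, writes it to disk with `pickle`, and loads it again. It is not taken from any SymSpell port: the function names (`deletes_of`, `build_delete_index`, `save_index`, `load_index`) and the file layout are hypothetical, and a real implementation would likely also apply a prefix length and a more compact format.

```python
import pickle
from collections import defaultdict
from itertools import combinations


def deletes_of(term, max_distance):
    """Return the term plus every string obtained by deleting up to max_distance characters."""
    variants = {term}
    for k in range(1, min(max_distance, len(term)) + 1):
        for positions in combinations(range(len(term)), k):
            skip = set(positions)
            variants.add("".join(ch for i, ch in enumerate(term) if i not in skip))
    return variants


def build_delete_index(frequencies, max_distance=2):
    """Map each delete variant to the set of dictionary terms that produce it (hypothetical helper)."""
    index = defaultdict(set)
    for term in frequencies:
        for variant in deletes_of(term, max_distance):
            index[variant].add(term)
    return dict(index)


def save_index(index, frequencies, path):
    """Serialize the delete index and the frequency dictionary to one file."""
    with open(path, "wb") as fh:
        pickle.dump({"deletes": index, "frequencies": frequencies}, fh,
                    protocol=pickle.HIGHEST_PROTOCOL)


def load_index(path):
    """Load what save_index wrote; only unpickle files you created yourself."""
    with open(path, "rb") as fh:
        data = pickle.load(fh)
    return data["deletes"], data["frequencies"]
```

The expensive step (`build_delete_index`) then runs once, and later program starts only pay for `load_index`. Note that `pickle` should only be used on files from a trusted source.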
As far as I understand, [Node.js](https://en.wikipedia.org/wiki/Node.js) is a JavaScript runtime environment that executes JavaScript code outside the browser. So it could be used to run one of the [SymSpell JavaScript...
Thanks, I will look into this issue.
> In the following upcoming change. Can you elaborate on how the pigeon hole principle is used, what algorithm is used for it?

**Current approach: Prefix**

Currently, we take a...
No update on the pigeonhole partitioning implementation yet. SymSpell uses the [Damerau-Levenshtein distance](https://en.wikipedia.org/wiki/Damerau%E2%80%93Levenshtein_distance) and therefore also supports the transposition of two adjacent characters.
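To make the transposition case concrete, here is a small Python sketch of the restricted Damerau-Levenshtein distance (also called optimal string alignment). It is an illustration only, not SymSpell's own code, and the function name `osa_distance` is made up for this example. In this variant a swap of two adjacent characters costs 1, but a substring may not be edited again after it has been transposed.

```python
def osa_distance(a, b):
    """Restricted Damerau-Levenshtein (optimal string alignment) distance between a and b."""
    prev2 = None                      # row i-2, needed for the transposition case
    prev = list(range(len(b) + 1))    # row i-1; row 0 is the cost of inserting b[:j]
    for i in range(1, len(a) + 1):
        cur = [i] + [0] * len(b)      # cur[0] is the cost of deleting a[:i]
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            cur[j] = min(
                prev[j] + 1,          # deletion
                cur[j - 1] + 1,       # insertion
                prev[j - 1] + cost,   # match or substitution
            )
            # Transposition of two adjacent characters counts as a single edit.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                cur[j] = min(cur[j], prev2[j - 2] + 1)
        prev2, prev = prev, cur
    return prev[len(b)]
```

For example, `osa_distance("teh", "the")` is 1, whereas plain Levenshtein distance would count the swapped letters as two substitutions.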
> The actual implementation gets a bit more complicated 1. When partitioning you have to make sure that the resulting parts are still sufficiently long so that they have a...
SymSpell takes two factors into account when ranking correction candidates:
1. the Damerau-Levenshtein edit distance between the misspelled term and the correction candidates (smaller edit distance wins)
2. the term...
Yes, but I'm not sure when I will find time.
Great! For UTF-8 string normalization in C++, you could have a look at `utf8proc_normalize_utf32` in https://github.com/JuliaStrings/utf8proc
More-like-this query (MLT) / similarity search is interesting. Something like this: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-mlt-query.html

**Parameter:**
- list of documents or list of document_id
- field_filter: from which fields we extract relevant terms,...