Wolf Garbe comments

Results: 61 comments by Wolf Garbe

Common contractions are missing from frequency_dictionary_en_82_765.txt

The frequency_dictionary_en_82_765.txt was created by intersecting the two lists mentioned below. Through reciprocal filtering, only those words which appear in both lists are used. Additional filters were applied and the...
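A minimal sketch of the intersection step described above. It assumes one list carries word counts and the other is a plain word list. The function name and the toy data are hypothetical and not taken from the actual source lists.

```python
def intersect_frequency_lists(counts, known_words):
    """Keep only words that appear in both lists; counts come from `counts`."""
    return {word: count for word, count in counts.items() if word in known_words}


# Hypothetical toy data, for illustration only.
ngram_counts = {"the": 1000, "they": 300, "teh": 5}
spelling_list = {"the", "they", "their"}

result = intersect_frequency_lists(ngram_counts, spelling_list)
```

A word missing from either list ("teh", "their") drops out. A contraction that one source splits into two tokens never reaches the intersection for the same reason.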

Common contractions are missing from frequency_dictionary_en_82_765.txt

Google: "How does the Ngram Viewer handle punctuation? We apply a set of tokenization rules specific to the particular language. In English, contractions become two words (they're becomes the bigram...

Common contractions are missing from frequency_dictionary_en_82_765.txt

Thank you. I'm sorry for the delay, it's still on my to-do list ...

Is there any Flutter / Dart port?

I'm not aware of any SymSpell Dart port. Perhaps one could use the [dart:js library](https://api.dart.dev/stable/2.15.1/dart-js/dart-js-library.html) to access a SymSpell JavaScript port.

Support for weighted edit distance

1. There is a third-party SymSpell implementation with weighted Damerau-Levenshtein edit distance / keyboard-distance: https://github.com/searchhub/preDict
2. Weighted edit distance can also be added as a post-processing step. The preliminary SymSpell...
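One possible reading of the post-processing idea, as a sketch: candidates returned by any unweighted lookup are re-sorted by a weighted distance. The code below uses a weighted Levenshtein distance (no transposition, so not full Damerau-Levenshtein) in which substituting a neighboring key costs less. The adjacency table, the cost values, and all function names are assumptions for illustration. Nothing here is part of the SymSpell or preDict API.

```python
# Hypothetical adjacency table for a few QWERTY keys; a real table would
# cover the whole keyboard layout.
ADJACENT = {
    "a": "qwsz",
    "s": "awedxz",
    "e": "wsdr",
    "o": "iklp",
    "i": "ujko",
}


def substitution_cost(a, b):
    """Assumed costs: 0 for a match, 0.5 for neighboring keys, 1 otherwise."""
    if a == b:
        return 0.0
    if b in ADJACENT.get(a, "") or a in ADJACENT.get(b, ""):
        return 0.5
    return 1.0


def weighted_distance(source, target):
    """Levenshtein distance with keyboard-aware substitution costs."""
    prev = [float(j) for j in range(len(target) + 1)]
    for i, sc in enumerate(source, start=1):
        curr = [float(i)]
        for j, tc in enumerate(target, start=1):
            curr.append(min(
                prev[j] + 1.0,                            # deletion
                curr[j - 1] + 1.0,                        # insertion
                prev[j - 1] + substitution_cost(sc, tc),  # substitution or match
            ))
        prev = curr
    return prev[-1]


def rerank(term, candidates):
    """Re-sort preliminary suggestions by weighted distance, ties alphabetical."""
    return sorted(candidates, key=lambda c: (weighted_distance(term, c), c))
```

For the input "cst", both "cat" and "cot" are one unweighted edit away, but `rerank("cst", ["cot", "cat"])` places "cat" first because "s" and "a" are neighboring keys.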

Performance while adding dictionary

> I thought about adding words in smaller chunks via multiple for loops using CreateDictionaryEntry() function on arrays.
> To make this not stall the application, I could run those...

[Question] About SymSpell model and probabilistic models (Norvig, etc.)

Are you referring to the [ITRANS scheme of Devanagari transliteration](https://en.wikipedia.org/wiki/Devanagari_transliteration#ITRANS_scheme)? **Character-based transliteration:** There seem to be some straightforward solutions for resolving the ambiguity of the 1-to-N translation...

[Question] About SymSpell model and probabilistic models (Norvig, etc.)

To use sentence-wide context to resolve ambiguity, you need n-gram probabilities (co-occurrence probabilities between multiple terms), not the single-word probabilities (word frequencies) used in SymSpell/Norvig. See also [Using...
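The difference between word frequencies and co-occurrence probabilities can be illustrated with a small bigram sketch. All counts below are made up, the add-one smoothing is an assumed choice, and the function names are hypothetical. It is not how SymSpell or Norvig's corrector works, since both rely on single-word frequencies.

```python
import math

# Hypothetical counts, for illustration only.
UNIGRAMS = {"their": 50, "there": 60, "house": 20, "is": 100}
BIGRAMS = {("their", "house"): 10, ("there", "is"): 30, ("there", "house"): 1}


def bigram_log_prob(prev_word, word, vocab_size=len(UNIGRAMS)):
    """Log of P(word | prev_word) with add-one smoothing (an assumed choice)."""
    pair_count = BIGRAMS.get((prev_word, word), 0)
    prev_count = UNIGRAMS.get(prev_word, 0)
    return math.log((pair_count + 1) / (prev_count + vocab_size))


def pick_by_frequency(candidates):
    """Context-free choice: the most frequent single word wins."""
    return max(candidates, key=lambda c: UNIGRAMS.get(c, 0))


def pick_by_context(candidates, next_word):
    """Context-aware choice: the candidate most likely to precede next_word."""
    return max(candidates, key=lambda c: bigram_log_prob(c, next_word))
```

With these counts, `pick_by_frequency(["their", "there"])` always returns "there", while `pick_by_context` returns "their" before "house" and "there" before "is".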

[Question] About SymSpell model and probabilistic models (Norvig, etc.)

Let me know if you find something interesting. Thanks.

Any comparison with Lucene's Levenshtein Automaton?

Something like http://blog.notdot.net/2010/07/Damn-Cool-Algorithms-Levenshtein-Automata or https://issues.apache.org/jira/browse/LUCENE-2507 ? From what I understand from Michael McCandless's post: Prior to 4.0, FuzzyQuery took a brute force approach: it visits every single unique term in...