
Higher Levenshtein Distance than 2

Open fwollatz opened this issue 5 years ago • 2 comments

It is not possible to correct words that have a Levenshtein distance greater than 2 (at least in German).

A parameter to change this would be much appreciated.

fwollatz avatar Aug 19 '20 12:08 fwollatz

Currently I limit the Levenshtein distance to 2 for all languages, including German. The reason is that for longer words (say, 10 characters or more), going past 2 creates a very long list of candidates to check, and the system slows down. It is feasible to allow larger Levenshtein distances, and I would appreciate any help or pull requests that make that possible.
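
To make the growth concrete, here is a minimal sketch of candidate generation by repeated single edits. It is not the library's actual code: `edits1` and `edits_within` are hypothetical names, the alphabet is assumed to be plain a-z (German would add ä, ö, ü, ß and grow faster still), and transpositions are counted as one edit.

```python
import string

# Assumption: plain a-z alphabet, for illustration only.
ALPHABET = string.ascii_lowercase


def edits1(word):
    """Return every string exactly one edit away from `word`.

    An edit is a deletion, adjacent transposition, replacement or insertion.
    """
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [
        left + right[1] + right[0] + right[2:]
        for left, right in splits
        if len(right) > 1
    ]
    replaces = [left + c + right[1:] for left, right in splits if right for c in ALPHABET]
    inserts = [left + c + right for left, right in splits for c in ALPHABET]
    return set(deletes + transposes + replaces + inserts)


def edits_within(word, distance):
    """Return every string reachable from `word` in at most `distance` edits."""
    seen = {word}
    frontier = {word}
    for _ in range(distance):
        # Expand only the newest strings; drop anything already generated.
        frontier = {e for w in frontier for e in edits1(w)} - seen
        seen |= frontier
    return seen
```

A word of length n produces about 54n + 25 raw single-edit candidates with a 26-letter alphabet, and each further step multiplies the set by roughly that factor again. Calling `len(edits_within(word, 2))` on a 10-character word is still manageable, but a distance of 3 should be expected to run into the millions of strings, which matches the slowdown described above.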

barrust avatar Aug 22 '20 12:08 barrust

I think it may be necessary to shift from generating all candidates to an O(n^2) loop over the vocabulary using an efficient edit-distance variant (mbleven, for example). The complexity would then depend (roughly) on the vocabulary size rather than on the edit distance.

Though, from experience, I have to say it does not scale well (but better than generating all candidates, at least for long words).
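
A rough sketch of the vocabulary-scan idea, under some assumptions: this is not mbleven itself but a plain dynamic-programming Levenshtein with a distance cap and early exit, and `bounded_levenshtein` and `candidates` are hypothetical names, not part of pyspellchecker.

```python
def bounded_levenshtein(a, b, max_dist):
    """Return the Levenshtein distance between a and b, or None if it exceeds max_dist."""
    # The length difference is a lower bound on the distance.
    if abs(len(a) - len(b)) > max_dist:
        return None
    if len(a) > len(b):
        a, b = b, a  # keep the DP row as short as possible
    previous = list(range(len(a) + 1))
    for i, cb in enumerate(b, start=1):
        current = [i]
        for j, ca in enumerate(a, start=1):
            cost = 0 if ca == cb else 1
            current.append(
                min(
                    previous[j] + 1,         # delete from b
                    current[j - 1] + 1,      # insert into b
                    previous[j - 1] + cost,  # match or replace
                )
            )
        # Row minima never decrease, so once a row exceeds the cap we can stop.
        if min(current) > max_dist:
            return None
        previous = current
    return previous[-1] if previous[-1] <= max_dist else None


def candidates(word, vocabulary, max_dist):
    """Scan the whole vocabulary and return sorted (distance, entry) pairs within max_dist."""
    found = []
    for entry in vocabulary:
        dist = bounded_levenshtein(word, entry, max_dist)
        if dist is not None:
            found.append((dist, entry))
    return sorted(found)
```

For example, `candidates("speling", ["spelling", "spewing", "apple"], 2)` returns `[(1, 'spelling'), (1, 'spewing')]`. The cost per lookup is one bounded comparison per vocabulary entry regardless of `max_dist`, so raising the cap to 3 or 4 does not blow up the way candidate generation does, but every lookup pays for the full scan.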

maayanorner avatar Aug 29 '20 14:08 maayanorner