How to handle completely wrong sentence word?

Open AnandDev8 opened this issue 6 years ago • 1 comments

Hi, first of all, SymSpell is damn fast and does the job for my spell correction. The issue I am facing is that when a user of my application intentionally types a completely wrong word or sentence, SymSpell still comes up with a "correct" word for it, which I would like to avoid.

Example:
User types: avedoamlkejuike...
SymSpell: a video am like juice keen...

Something like this is totally irrelevant for my use case. So how can I solve this using only SymSpell? Thanks in advance.

AnandDev8, May 03 '19 08:05

This is a common problem. It is not only that the user intentionally types something wrong, but there are always unknown words in the input text that are not in the dictionary.

So how can we distinguish a non-existing or unknown word from a misspelled word?

Solution 1: Restrict the maximum edit distance (the number of splits + the number of spelling corrections) within a sliding text window. If a string needs to be split into too many small words, and almost all of the sub-words need an additional spelling correction, we can assume that this is an unknown/non-existing word.
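A minimal sketch of the windowed cost check from Solution 1. It assumes that a segmentation step has already produced a list of (corrected word, edit distance) pairs for one whitespace-free input string. The function name `looks_unknown`, the window size (counted in sub-words here rather than characters), the threshold, and the edit distances in the example are illustrative assumptions, not part of SymSpell.

```python
from typing import List, Tuple

def looks_unknown(segments: List[Tuple[str, int]],
                  window: int = 4,
                  max_cost: int = 4) -> bool:
    """Return True if any window of sub-words is too expensive to be a genuine correction.

    segments: (corrected_word, edit_distance) pairs in input order, all obtained
              by splitting ONE whitespace-free input string (assumption).
    Cost of a window = splits inside the window + sum of its edit distances.
    """
    if not segments:
        return False
    # Slide over the sub-words; a short input yields a single (shorter) window.
    last_start = max(1, len(segments) - window + 1)
    for start in range(last_start):
        chunk = segments[start:start + window]
        splits = len(chunk) - 1
        edits = sum(distance for _, distance in chunk)
        if splits + edits > max_cost:
            return True
    return False

# Hypothetical edit distances for the example from the question.
gibberish = [("a", 0), ("video", 2), ("am", 0), ("like", 1), ("juice", 2), ("keen", 2)]
# A genuine run-together phrase: only splits, no corrections.
genuine = [("the", 0), ("quick", 0), ("brown", 0)]

gibberish_flag = looks_unknown(gibberish)
genuine_flag = looks_unknown(genuine)
```

When the check fires, the application would keep the user's original text instead of the suggested correction. Both thresholds need tuning on real input.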

Solution 2: Use n-gram probabilities or Markov chains. The co-occurrence of words is not random. Some words are more likely to occur together in a sentence than others: some words frequently follow each other, others never do. If the n-gram probabilities of the split and corrected words are below a certain threshold, we can assume that this is not a genuine correction, but an unknown/non-existing word.
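A minimal sketch of the n-gram check from Solution 2, using a bigram model with add-one smoothing. The function names, the toy counts, the vocabulary size, and the threshold are all illustrative assumptions; a real setup would load bigram counts from a large corpus and tune the threshold on known-good and known-bad inputs.

```python
import math
from typing import Dict, List, Tuple

def avg_bigram_logprob(words: List[str],
                       unigrams: Dict[str, int],
                       bigrams: Dict[Tuple[str, str], int],
                       vocab_size: int) -> float:
    """Average log P(cur | prev) over adjacent word pairs, with add-one smoothing."""
    if len(words) < 2:
        return 0.0  # nothing to score with a bigram model
    total = 0.0
    for prev, cur in zip(words, words[1:]):
        numerator = bigrams.get((prev, cur), 0) + 1
        denominator = unigrams.get(prev, 0) + vocab_size
        total += math.log(numerator / denominator)
    return total / (len(words) - 1)

def is_plausible(words: List[str],
                 unigrams: Dict[str, int],
                 bigrams: Dict[Tuple[str, str], int],
                 vocab_size: int,
                 threshold: float) -> bool:
    """Accept the correction only if its words tend to occur together."""
    return avg_bigram_logprob(words, unigrams, bigrams, vocab_size) >= threshold

# Toy counts (hypothetical, for illustration only).
unigrams = {"the": 100, "quick": 10, "brown": 10, "a": 50, "video": 5}
bigrams = {("the", "quick"): 8, ("quick", "brown"): 6}
VOCAB_SIZE = 10
THRESHOLD = -2.0  # assumed cut-off on the average log-probability

genuine_ok = is_plausible("the quick brown".split(),
                          unigrams, bigrams, VOCAB_SIZE, THRESHOLD)
gibberish_ok = is_plausible("a video am like juice keen".split(),
                            unigrams, bigrams, VOCAB_SIZE, THRESHOLD)
```

Averaging over the number of word pairs keeps the score comparable between short and long inputs, so one threshold can be applied to both.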

Both solutions are currently not part of SymSpell and need to be implemented as an extension or by modifying the SymSpell code.

wolfgarbe, May 03 '19 09:05