
load_words is not prioritized

Open ledikari opened this issue 2 years ago • 1 comments

It looks like words added with load_words are not prioritized during spell checking.

from spellchecker import SpellChecker

known_words = ['covid', 'Covid19']

spell = SpellChecker(language='en')
spell.word_frequency.load_words(known_words)


word = 'coved'
# unknown() expects an iterable of words, not a bare string
misspelled = spell.unknown([word])
print(spell.correction(word))

The output of this is loved, not covid.

ledikari avatar Apr 21 '22 05:04 ledikari

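You are correct. Candidates are "prioritized" by the number of times each word appears in the word frequency list, because more common words are more likely to be the intended word (hence the name "frequency"). A word added once through load_words only gets a small count, so established words such as "loved" still outrank it. You can boost the newer words by doing something like this: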

from spellchecker import SpellChecker
known_words = ['covid', 'Covid19'] * 1000

spell = SpellChecker(language='en')
spell.word_frequency.load_words(known_words)

Or you could use a different method:

from spellchecker import SpellChecker
known_words = {'covid': 1000, 'Covid19': 10000}
spell = SpellChecker(language='en')
# load_json takes a dict of word -> count; load_dictionary expects a file path
spell.word_frequency.load_json(known_words)
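
To illustrate why the boost matters, here is a minimal toy sketch of frequency-ranked correction. It is not pyspellchecker's actual implementation: the helpers edits1 and correction are hypothetical, and the counts for "loved" and "moved" are made up for the example. It only shows that, among candidates one edit away, the word with the highest count wins.

```python
from collections import Counter
import string


def edits1(word):
    """Return every string one edit (delete, swap, replace, insert) away from word."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    swaps = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in letters]
    inserts = [a + c + b for a, b in splits for c in letters]
    return set(deletes + swaps + replaces + inserts)


def correction(word, freq):
    """Return word if known, else the known one-edit candidate with the highest count."""
    if freq[word] > 0:
        return word
    candidates = [w for w in edits1(word) if freq[w] > 0]
    if not candidates:
        return word
    # sorted() makes ties deterministic; max() then picks the most frequent word
    return max(sorted(candidates), key=lambda w: freq[w])


# Made-up counts standing in for a real frequency list (assumption)
freq = Counter({'loved': 500, 'moved': 400})

# Adding the new word once gives it a count of 1, far below "loved"
freq.update(['covid'])
before = correction('coved', freq)

# Adding it 1000 more times lifts it above the other candidates
freq.update(['covid'] * 1000)
after = correction('coved', freq)

print(before, after)
```

With these toy counts, the first call picks "loved" and the second picks "covid", which is the same effect the repeated list or the explicit counts have on the real word frequency.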

barrust avatar May 28 '22 17:05 barrust