text2vec
Norvig spell corrector
As a side note on spell checking: I just stumbled over the ropensci/hunspell package. I have not dug into the implementation details, but the basic idea is that it checks which affixes and word stems are allowed in a given language and checks a text against the entries of a dictionary (which can be taken, e.g., from LibreOffice). More details are in the package docs, e.g., in hunspell.R. So, if my understanding is correct, the hunspell approach is less probabilistic than Norvig's, which makes it easy to train on your own data. hunspell might still be useful depending on the task, since existing dictionaries can be used directly. It might be worth comparing the quality of results between the two approaches (if anyone finds the time...).
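To make the contrast concrete, here is a rough Python sketch of the two ideas. It is not hunspell's or text2vec's code: the Norvig-style part is simplified to candidates within one edit, the training text is a toy placeholder, and `is_in_dictionary` is a hypothetical helper that does a plain word-list lookup and ignores hunspell's affix rules entirely.

```python
import re
from collections import Counter

# Toy training text (an assumption for illustration; real use needs a large corpus).
TRAINING_TEXT = "the quick brown fox jumps over the lazy dog the dog barks"


def words(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


# Word frequencies learned from the training text: the "probabilistic" part.
WORD_COUNTS = Counter(words(TRAINING_TEXT))


def edits1(word):
    """All strings one edit (delete, transpose, replace, insert) away from word."""
    letters = "abcdefghijklmnopqrstuvwxyz"
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [l + r[1:] for l, r in splits if r]
    transposes = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    replaces = [l + c + r[1:] for l, r in splits if r for c in letters]
    inserts = [l + c + r for l, r in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)


def known(candidates):
    """Keep only candidates that were seen in the training text."""
    return {w for w in candidates if w in WORD_COUNTS}


def correct(word):
    """Return the most frequent known candidate, or the word itself if none is known."""
    candidates = known([word]) or known(edits1(word)) or {word}
    return max(candidates, key=lambda w: WORD_COUNTS[w])


def is_in_dictionary(word, dictionary):
    """Dictionary-style check: a word is valid only if it is listed."""
    return word.lower() in dictionary
```

With this sketch, `correct("teh")` picks a replacement ranked by corpus frequency, while `is_in_dictionary("teh", {"the", "dog"})` only reports that the word is not listed. Swapping in a different training text changes the first result but not the second, which is the trade-off mentioned above.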