spellcorrect
How to train my custom data?
Thanks.
All the data is stored in .data files. You can modify, update, or replace them with your own data, processed (probably by a simple script) to match the expected format. Unfortunately, I didn't spend much time back then defining the models declaratively, but the format should be easy to work out by inspecting the files. Once your data matches the expected format, the script should train itself at startup.
Hi @jbhoosreddy, thanks for the repo and the data. Could you shed some light on how to create the confusion matrices (dictionary) if I have a list of unigrams (from Google 1T) with their frequencies?
Thanks for your solution, @jbhoosreddy. Sorry for the late reply.
The confusion matrix used in this program comes from the paper "A Spelling Correction Program Based on a Noisy Channel Model".
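For anyone who wants to build their own matrices: as far as I understand that paper, the counts come from observed (typo, correction) pairs rather than from unigram frequencies alone, so a unigram list would only supply the word prior. Below is a minimal sketch of the counting step, assuming you already have such pairs that differ by a single edit. The names `classify_edit` and `build_confusion`, the `"#"` word-start marker, and the key conventions in the comments are my own assumptions, not this repo's code or its .data format.

```python
from collections import Counter

def classify_edit(typo, correct):
    """Return (kind, x, y) for the single edit that turns `correct` into `typo`.

    Key conventions (an assumption of this sketch):
      ("del", x, y):   "xy" was typed as "x"
      ("ins", x, y):   "x" was typed as "xy"
      ("sub", x, y):   "y" was typed as "x"
      ("trans", x, y): "xy" was typed as "yx"
    Returns None if the words are equal or differ by more than one edit.
    """
    # Skip the common prefix to find the first differing position.
    i = 0
    while i < len(typo) and i < len(correct) and typo[i] == correct[i]:
        i += 1
    prev = correct[i - 1] if i > 0 else "#"  # "#" marks the start of a word

    if len(typo) == len(correct) - 1 and typo[i:] == correct[i + 1:]:
        return ("del", prev, correct[i])
    if len(typo) == len(correct) + 1 and typo[i + 1:] == correct[i:]:
        return ("ins", prev, typo[i])
    if len(typo) == len(correct) and i < len(correct):
        if typo[i + 1:] == correct[i + 1:]:
            return ("sub", typo[i], correct[i])
        if (i + 1 < len(correct)
                and typo[i] == correct[i + 1]
                and typo[i + 1] == correct[i]
                and typo[i + 2:] == correct[i + 2:]):
            return ("trans", correct[i], correct[i + 1])
    return None

def build_confusion(pairs):
    """Count single-edit errors from an iterable of (typo, correct) pairs."""
    matrices = {kind: Counter() for kind in ("del", "ins", "sub", "trans")}
    for typo, correct in pairs:
        edit = classify_edit(typo, correct)
        if edit is not None:  # pairs that are not one edit apart are skipped
            kind, x, y = edit
            matrices[kind][(x, y)] += 1
    return matrices

# Hypothetical training pairs, for illustration only.
pairs = [("acress", "actress"), ("acress", "cress"),
         ("acress", "access"), ("acress", "caress")]
matrices = build_confusion(pairs)
```

The resulting counters would still need to be normalized by character (or character-pair) counts from a corpus to get channel probabilities, and then written out in whatever layout the .data files actually use.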
Hey @lzw429! Thanks for identifying where this data came from.
My earliest recollection is that I saw this data in a textbook and tried to recreate the data and pseudocode to validate for myself that the spelling-correction approach works.