spellcorrect

How do I train on my custom data?

Open GabrielLin opened this issue 7 years ago • 5 comments

Thanks.

GabrielLin, Mar 03 '18

All the data is stored in .data files. You can modify or replace them with your own data, most likely after running a simple script to convert it into the expected format. Unfortunately, I didn't spend much time back then defining the models declaratively, but the format should be easy to work out by looking at the files. Once your data matches the expected format, the script should train on it automatically at startup.
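As an illustration, here is a minimal preprocessing sketch. It assumes your custom data starts as raw text and that a word-frequency list is what needs to go into the .data files. The repo's actual .data layout isn't described in this thread, so the `word<TAB>count` output produced by the hypothetical helpers `count_unigrams` and `format_counts` is only a placeholder; open the existing .data files and adjust the formatting to match before replacing them.

```python
import re
from collections import Counter


def count_unigrams(text):
    """Count lowercase alphabetic word tokens in a raw text string."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def format_counts(counts):
    """Render counts as 'word<TAB>count' lines, most frequent first.

    NOTE: placeholder format (an assumption); match it to the existing .data files.
    """
    return "".join(f"{word}\t{n}\n" for word, n in counts.most_common())


if __name__ == "__main__":
    sample = "The cat sat on the mat. The end."
    print(format_counts(count_unigrams(sample)), end="")
```

Keeping the counting and the formatting separate means only `format_counts` has to change once you know the real layout.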

jbhoosreddy, Mar 09 '18

Hi @jbhoosreddy, thanks for the repo and the data. Could you explain how to create the confusion matrices (dictionary) if I have a list of unigrams (from Google 1T) with their frequencies?

acerock6, Dec 05 '18

Thanks for your solution, @jbhoosreddy. Sorry for the late reply.

GabrielLin, Apr 08 '19

> Hi @jbhoosreddy , thanks for the repo and the data. However, can you throw some light on how to create the confusion matrices (dictionary) if I have a list of unigrams (from Google 1T) with their frequencies?

The confusion matrices used in this program come from the paper "A Spelling Correction Program Based on a Noisy Channel Model" (Kernighan, Church, and Gale, 1990).
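To connect this to the question: in the noisy channel model, a candidate correction c for a typo t is ranked by P(c) × P(t | c). A unigram list such as Google 1T can supply the prior P(c) and the character counts used to normalise the channel probability. The four confusion matrices (deletion, insertion, substitution, reversal), however, count character-level mistakes, so they are typically estimated from a collection of misspelling/correction pairs rather than from word frequencies alone. The sketch below shows how the pieces fit together under several assumptions. It is not this repo's code. The names `char_counts`, `edits`, and `correct` are hypothetical, and so is the `confusion` dict keyed by `(kind, x, y)`. The add-one smoothing is a simplification rather than the paper's exact estimator.

```python
from collections import Counter

LETTERS = "abcdefghijklmnopqrstuvwxyz"


def char_counts(unigrams):
    """Count characters and character bigrams, weighted by word frequency.

    '#' marks the start of a word so edits on the first letter have a left context.
    """
    chars = Counter()
    for word, n in unigrams.items():
        padded = "#" + word
        for i, ch in enumerate(padded):
            chars[ch] += n
            if i + 1 < len(padded):
                chars[padded[i:i + 2]] += n
    return chars


def edits(typo):
    """Yield (correction, confusion_key, normaliser) for every single edit
    that would turn a correction into the typo."""
    t = "#" + typo
    for i in range(1, len(t) + 1):
        x = t[i - 1]
        for y in LETTERS:
            # deletion: the intended word had x followed by y, but y was left out
            yield t[1:i] + y + t[i:], ("del", x, y), x + y
    for i in range(1, len(t)):
        x, y = t[i - 1], t[i]
        # insertion: the intended word had x, the typist typed x followed by an extra y
        yield t[1:i] + t[i + 1:], ("add", x, y), x
        for z in LETTERS:
            if z != y:
                # substitution: y was typed where z was intended
                yield t[1:i] + z + t[i + 1:], ("sub", y, z), z
        if i + 1 < len(t):
            # reversal: the intended pair a b was typed swapped as b a
            a, b = t[i + 1], t[i]
            yield t[1:i] + a + b + t[i + 2:], ("rev", a, b), a + b


def correct(typo, unigrams, confusion, chars):
    """Rank known words c by P(c) * P(typo | c), best first."""
    total = sum(unigrams.values())
    scores = {}
    for cand, key, norm in edits(typo):
        if cand == typo or cand not in unigrams:
            continue
        prior = unigrams[cand] / total
        # add-one smoothing (an assumption) so unseen edits keep a small probability
        channel = (confusion.get(key, 0) + 1) / (chars[norm] + len(LETTERS))
        # the same candidate can arise from different edits; keep the best one
        scores[cand] = max(scores.get(cand, 0.0), prior * channel)
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    unigrams = {"actress": 10, "across": 20, "acres": 5}
    chars = char_counts(unigrams)
    # a large (made-up) count for "t dropped after c" pushes "actress" to the top
    print(correct("acress", unigrams, {("del", "c", "t"): 500}, chars))
```

With an empty `confusion` dict the ranking is driven by the priors, so the more frequent "across" wins. Adding error counts is what lets the channel model override word frequency, which is why the matrices need real misspelling data.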

yhshu, Jul 27 '19

Hey @lzw429! Thanks for identifying where this data came from.

As far as I recall, I saw this data in a textbook and tried to recreate the data and pseudocode to verify for myself that the spell-correction approach works.

jbhoosreddy, May 19 '20