JamSpell
Build the language model from n-gram frequencies
This PR is not intended to be merged as-is; I opened it to discuss whether this feature would be useful.
I was experimenting with n-gram frequencies from Ruscorpora. The idea is to load the frequency files directly into the model, instead of training it on raw text:
jamspell load_ngrams alphabet_ru.txt 1grams.csv 2grams.csv 3grams.csv ru.bin
You can see some short samples of the .csv files here
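To make the idea concrete, here is a minimal Python sketch of reading n-gram frequency files into count tables and deriving a bigram probability from them. It is not the code from this PR and not JamSpell's internal model format. The row layout `count,token_1,...,token_n`, the function names `read_ngram_counts` and `bigram_probability`, and the sample numbers are all assumptions made for illustration; the real .csv files may be laid out differently.

```python
import csv
import io
from collections import Counter


def read_ngram_counts(stream, n, delimiter=","):
    """Read rows of the ASSUMED form `count,token_1,...,token_n` into a Counter."""
    counts = Counter()
    for row in csv.reader(stream, delimiter=delimiter):
        if len(row) != n + 1:
            continue  # skip blank or malformed rows
        try:
            freq = int(row[0])
        except ValueError:
            continue  # skip header rows or non-numeric counts
        counts[tuple(tok.lower() for tok in row[1:])] += freq
    return counts


def bigram_probability(unigrams, bigrams, first, second):
    """Maximum-likelihood estimate P(second | first) = c(first, second) / c(first)."""
    denom = unigrams.get((first,), 0)
    if denom == 0:
        return 0.0
    return bigrams.get((first, second), 0) / denom


# Tiny in-memory sample (made-up numbers, not real Ruscorpora data).
unigrams = read_ngram_counts(io.StringIO("10,в\n4,доме\n"), 1)
bigrams = read_ngram_counts(io.StringIO("2,в,доме\n"), 2)
print(bigram_probability(unigrams, bigrams, "в", "доме"))
```

The estimate here is deliberately unsmoothed to keep the sketch short; a practical language model would apply some form of smoothing so that unseen n-grams do not get zero probability.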
Let me know if this feature can be useful.
Thanks for the PR, good feature! Let me know when you finish; I'll be glad to merge it. BTW, could you please upload your model somewhere? I'd like to compare it to a model trained on Wikipedia + news.
Yep, sure, the model is here
@iosadchiy, have you had time to fix the failing PR checks?