
Build the language model from n-gram frequencies

Open iosadchiy opened this issue 7 years ago • 3 comments

This is not intended to be merged yet; I'd like to discuss whether this feature would be useful.

I was experimenting with n-gram frequencies from Ruscorpora. The idea is to load the frequency files directly into the model:

jamspell load_ngrams alphabet_ru.txt 1grams.csv 2grams.csv 3grams.csv ru.bin

You can see some short samples of the .csv files here
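
For illustration only, here is a minimal Python sketch of reading such frequency files into in-memory n-gram counts. The column layout (count first, then the words of the n-gram) and the helper name load_ngram_counts are assumptions made for this example; the real .csv format and the PR's C++ implementation may differ.

```python
import csv
import io
from collections import Counter


def load_ngram_counts(lines, n):
    """Parse rows of the ASSUMED form: count,word_1,...,word_n."""
    counts = Counter()
    for row in csv.reader(lines):
        if len(row) != n + 1:
            continue  # skip rows that do not have exactly n words
        try:
            count = int(row[0])
        except ValueError:
            continue  # skip a header or any non-numeric count
        ngram = tuple(word.strip().lower() for word in row[1:])
        counts[ngram] += count  # merge duplicates that differ only in case
    return counts


# Tiny in-memory sample standing in for a hypothetical 2grams.csv
sample_2grams = io.StringIO("120,в,этом\n45,этом,году\n")
bigrams = load_ngram_counts(sample_2grams, 2)
```

Counts loaded this way could be used to estimate n-gram probabilities without retraining on raw text, which is the point of loading the frequency files directly.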

Let me know if this feature can be useful.

iosadchiy avatar May 27 '18 09:05 iosadchiy

Thanks for the PR, good feature! Let me know when you finish; I'll be glad to merge it. BTW, could you please upload your model somewhere? I'd like to compare it to a model trained on Wikipedia + news.

bakwc avatar May 27 '18 09:05 bakwc

Yep, sure, the model is here

iosadchiy avatar May 27 '18 10:05 iosadchiy

@iosadchiy, have you had time to fix the failing PR checks?

rprilepskiy avatar Jan 27 '20 13:01 rprilepskiy