kenlm
KenLM: Faster and Smaller Language Model Queries
I am trying to use a unigram ARPA file to build a KenLM model in the Python wrapper. However, I receive an error. Is there any way to...
Change it to support ngram-order=1 (tested for TrieModel): 1. Don't throw when order=1. 2. Do some initialization only when order > 1 to avoid a segfault. 3. Set default RecordReader::remains_ to...
It would be great not to have a Boost dependency. Many features of modern C++ make Boost an unnecessary dependency anyway. Certain deployment environments don't have...
I want to train on a huge dataset, around 10 TB. Reading the input file in a single thread is slow. Can KenLM read the file with multiple threads?
I tried to train a language model on a corpus, but it seems to get stuck at the beginning, and I couldn't determine the cause. `bzcat clean_corpus.tar.bz2 | python process.py | kenlm/build/bin/lmplz -S 8G...
Hi, I have a Chinese text file with 6300440 lines, about 315 MB. It looks like this:  I use the command `lmplz -o 5 --prune 1 2 2 3 4 lm.arpa` The error...
How does the full_scores method determine which segments of a sentence to use when computing conditional probabilities? Does it always go from lower- to higher-order n-grams (2 to n), in increasing order of...
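To make the question above concrete, here is a toy sketch of how a backoff-style lookup could decide, for each token, how much left context is matched. It is not KenLM's implementation: `longest_match_lengths`, the `ngrams` set, and the choice to report 0 for an unseen token are all assumptions made for illustration. The sketch extends the context one word at a time, from unigram upward, and stops at the first n-gram that is not in the model.

```python
from typing import List, Set, Tuple


def longest_match_lengths(
    words: List[str], ngrams: Set[Tuple[str, ...]], order: int
) -> List[int]:
    """For each token (plus the closing </s>), return the length of the
    longest known n-gram that ends at that token.

    Hypothetical helper: `ngrams` stands in for the n-grams stored in a model.
    """
    padded = ["<s>"] + words + ["</s>"]
    lengths = []
    for i in range(1, len(padded)):  # <s> itself is context only, never scored
        matched = 0  # this sketch reports 0 when even the unigram is unseen
        # Grow the n-gram leftward: unigram, then bigram, up to `order`.
        for n in range(1, min(order, i + 1) + 1):
            if tuple(padded[i - n + 1 : i + 1]) in ngrams:
                matched = n
            else:
                break  # a longer n-gram cannot match once a shorter one fails
        lengths.append(matched)
    return lengths


toy_ngrams = {
    ("the",), ("cat",), ("</s>",),
    ("<s>", "the"), ("the", "cat"),
    ("<s>", "the", "cat"),
}
print(longest_match_lengths(["the", "cat", "sat"], toy_ngrams, order=3))
```

In this toy model each token is scored with the longest context the model knows, so the matched length can differ from token to token rather than following a fixed 2-to-n schedule. Whether KenLM behaves exactly this way should be checked against its source.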
I am trying to estimate a language model on Raspbian. I got a segmentation fault when running `kenlm/build/bin/lmplz -o 4 --prune 0 1 2 3 --limit_vocab_file vocab.txt --interpolate_unigrams 0 lm.arpa`. Information...
I am getting this error: 3/5 Calculating and sorting initial probabilities === Chain sizes: 1:24 2:0 stream/chain.cc:41 in util::stream::Chain::Chain(const util::stream::ChainConfig&) threw ChainConfigException because `config.total_memory < config.entry_size * config.block_count'. Chain configured with...
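The condition quoted in that exception can be restated as a small check. The sketch below only mirrors the inequality from the error message; the function name and the sample numbers are hypothetical, and it does not call KenLM.

```python
def chain_config_is_valid(total_memory: int, entry_size: int, block_count: int) -> bool:
    """Return True when the memory budget can hold `block_count` blocks of at
    least one entry each, i.e. the negation of the condition in the error."""
    return total_memory >= entry_size * block_count


# Illustrative values only: a chain given 0 bytes cannot hold any entry.
print(chain_config_is_valid(total_memory=0, entry_size=12, block_count=2))
print(chain_config_is_valid(total_memory=24, entry_size=12, block_count=2))
```

Read this way, the exception means the memory assigned to one chain is smaller than what a single entry per block would need.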