kenlm issues

Cannot build lm.binary using unigram

3

I am trying to use a unigram ARPA file to build a kenlm Model in the Python wrapper. However, I receive the following error: ![error](https://s33.postimg.cc/re2wgmu5r/Screenshot_from_2018-09-08_19-49-34.png) Is there any way to...

Nourahussein

Add support for order=1 to TrieModel

3

Change it to support ngram-order=1 (tested for TrieModel): 1. don't throw when order=1; 2. do some initialization only when order > 1, to avoid a segfault; 3. set default RecordReader::remains_ to...

changq

Remove Boost dependency

It would be great to not have a Boost dependency. A lot of the features in modern C++ make Boost an unnecessary dependency as well. Certain deployment environments don't have...

pavelgrib

Multi-threaded reading of text and counting && sorting of n-grams

1

I want to train on a huge dataset, around 10TB. Reading the file in a single thread is slow. How can kenlm support reading the file with multiple threads?

cmathx

Add prebuilt for windows

1

ArcticLampyrid

windows

training corpus stuck

11

I tried to train a language model with a corpus, but it seems to get stuck at the beginning. I couldn't investigate the cause. `bzcat clean_corpus.tar.bz2 | python process.py | kenlm/build/bin/lmplz -S 8G...

CuriousDeepLearner

ERROR: 1-gram discount out of range for adjusted count 2

2

Hi, I have a Chinese text with 6300440 lines, about 315MB. It looks like this: ![2018-07-13 10-33-21](https://user-images.githubusercontent.com/19542945/42669988-0546479e-868b-11e8-8584-11d4ea7b7e1f.png) I use the command `lmplz -o 5 --prune 1 2 2 3 4 lm.arpa` The error...

Sundy1219

How does full_scores calculate?

How does the full_scores method determine which segments of a sentence to use when computing conditional probabilities? Does it always go from lower-order to higher-order n-grams (2 to n), in increasing order of...

smilenrhyme
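
For readers with the same question, here is a toy sketch of backoff scoring, the general technique an n-gram model uses to pick which n-gram scores each word. It is not KenLM's implementation: the probability and backoff tables are made-up values, and `score_word` and `toy_full_scores` are hypothetical helpers written only for this illustration. The idea shown is that each word is scored with the longest n-gram found for its context, and shorter n-grams are used only after paying the backoff weight of the context that failed to match. The kenlm Python wrapper's `full_scores` likewise yields one (log10 probability, n-gram length, OOV flag) tuple per word.

```python
# Toy tables with made-up log10 values. NOT taken from any real model.
LOGPROB = {
    ("<unk>",): -2.0,
    ("<s>",): -99.0,
    ("the",): -0.7,
    ("cat",): -1.2,
    ("</s>",): -0.9,
    ("<s>", "the"): -0.3,
    ("the", "cat"): -0.4,
}
BACKOFF = {
    ("<s>",): -0.2,
    ("the",): -0.3,
    ("cat",): -0.1,
}


def score_word(context, word, order=2):
    """Return (log10 prob, matched n-gram length) for `word` after `context`."""
    if (word,) not in LOGPROB:
        word = "<unk>"  # unknown words are scored as <unk>
    # Keep at most order-1 words of history.
    context = tuple(context[-(order - 1):]) if order > 1 else ()
    penalty = 0.0
    while True:
        ngram = context + (word,)
        if ngram in LOGPROB:
            return penalty + LOGPROB[ngram], len(ngram)
        # Back off: add the context's backoff weight, then drop its oldest word.
        penalty += BACKOFF.get(context, 0.0)
        context = context[1:]


def toy_full_scores(sentence, order=2):
    """Yield (log10 prob, n-gram length, oov) per word, plus the final </s>."""
    context = ["<s>"]
    for word in sentence.split() + ["</s>"]:
        oov = (word,) not in LOGPROB
        logprob, length = score_word(context, word, order)
        yield logprob, length, oov
        context.append("<unk>" if oov else word)


if __name__ == "__main__":
    for item in toy_full_scores("the cat"):
        print(item)
```

In this toy, "the" and "cat" are both scored with a matching bigram, while the end-of-sentence token falls back to its unigram because no ("cat", "</s>") bigram exists in the table.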

Segmentation fault when estimating

2

I am trying to estimate a language model on Raspbian. I got a segmentation fault when running `kenlm/build/bin/lmplz -o 4 --prune 0 1 2 3 --limit_vocab_file vocab.txt --interpolate_unigrams 0 lm.arpa`. Information...

gospodima

Error - Chain sizes -

1

I am getting this error: 3/5 Calculating and sorting initial probabilities === Chain sizes: 1:24 2:0 stream/chain.cc:41 in util::stream::Chain::Chain(const util::stream::ChainConfig&) threw ChainConfigException because `config.total_memory < config.entry_size * config.block_count'. Chain configured with...

btrzmntr
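
The condition quoted in the exception above can be read as a simple arithmetic check. The sketch below mirrors it in Python, assuming only what the message states: the field names `total_memory`, `entry_size`, and `block_count` come from the error text, while the `ChainConfig` dataclass and `chain_memory_ok` helper are hypothetical stand-ins, not KenLM code. It does not explain why the memory budget was too small in the reported run.

```python
from dataclasses import dataclass


@dataclass
class ChainConfig:
    """Hypothetical Python mirror of the fields named in the error message."""
    entry_size: int    # bytes per record in the chain
    block_count: int   # number of blocks the chain uses
    total_memory: int  # bytes available to the chain


def chain_memory_ok(config: ChainConfig) -> bool:
    """True when the config would NOT trip the quoted condition.

    The exception fires when
    total_memory < entry_size * block_count,
    i.e. when there is not room for one entry in every block.
    """
    return config.total_memory >= config.entry_size * config.block_count


if __name__ == "__main__":
    print(chain_memory_ok(ChainConfig(entry_size=8, block_count=2, total_memory=16)))
    print(chain_memory_ok(ChainConfig(entry_size=8, block_count=2, total_memory=15)))
```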
