Kenneth Heafield

290 comments by Kenneth Heafield

Boost expects you to compile with the same version of g++ that the Boost libraries were compiled with, which may mean a newer g++, as @ftyers suggests.

This smells like your `cython` is out of date. Can you try this?

1. `git clone https://github.com/kpu/kenlm/`
2. `cd kenlm/python`
3. `cython -3 --cplus kenlm.pyx`
4. pip install from the...

If you want to call `BaseScore`, note that it takes void pointers to the state objects, so put `&` in front of the state arguments. Why did you have `const char [4]` when `vocab.Index` returns...

Sorry, I don't speak Chinese, but I can use machine translation. I need more context.

For all languages, the intended workflow is to run a third-party tokenizer first. For Chinese, that tokenizer also happens to perform word segmentation.

This is currently not supported, though if you want to send a pull request...

It means you can convert an ARPA file from another toolkit that lacks these symbols to binary format. However, there is currently no support for training a model without sentence boundary symbols.
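If you are unsure whether an ARPA file from another toolkit contains the sentence boundary symbols `<s>` and `</s>`, a minimal sketch like this can check its unigram section before you convert it with KenLM's `build_binary`. It assumes the standard ARPA layout (a `\1-grams:` header followed by lines of `log10prob word [backoff]`). The helper names `unigram_vocab` and `missing_boundary_symbols` are hypothetical, not part of KenLM.

```python
def unigram_vocab(arpa_text: str) -> set[str]:
    """Collect the words listed in the \\1-grams: section of an ARPA file."""
    vocab: set[str] = set()
    in_unigrams = False
    for raw in arpa_text.splitlines():
        line = raw.strip()
        if line == "\\1-grams:":
            in_unigrams = True
            continue
        if not in_unigrams or not line:
            continue  # skip the \data\ header and blank separator lines
        if line.startswith("\\"):
            break  # the next section (\2-grams: or \end\) ends the unigrams
        # Each unigram line is: log10 probability, word, optional backoff.
        vocab.add(line.split()[1])
    return vocab


def missing_boundary_symbols(arpa_text: str) -> list[str]:
    """Hypothetical helper: report which of <s> and </s> are absent."""
    vocab = unigram_vocab(arpa_text)
    return [sym for sym in ("<s>", "</s>") if sym not in vocab]
```

Read the file yourself (for example with `open(path, encoding="utf-8").read()`) and pass the text in; an empty result means both symbols are present.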

It smells like you asked for a higher n-gram order than your training data supports. I should probably give a better error message for that, or allow a bypass.
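A pre-check along these lines could produce a clearer message. This is a minimal sketch, not KenLM's own check: it assumes each non-empty line of the training text is one whitespace-tokenized sentence that gets wrapped in `<s>` and `</s>`, so a sentence of k tokens contains n-grams of length at most k + 2. The function names `longest_possible_ngram` and `check_order` are hypothetical.

```python
def longest_possible_ngram(sentences: list[str]) -> int:
    """Length of the longest n-gram any sentence can yield.

    Assumes each non-empty line is wrapped in <s> ... </s>, so a k-token
    sentence yields at most a (k + 2)-gram. Empty lines are ignored.
    """
    return max((len(s.split()) + 2 for s in sentences if s.strip()), default=0)


def check_order(sentences: list[str], order: int) -> None:
    """Hypothetical pre-check: fail early with a readable message."""
    longest = longest_possible_ngram(sentences)
    if order > longest:
        raise ValueError(
            f"requested order {order}, but the longest n-gram in the data "
            f"has length {longest}; lower the order or add longer sentences"
        )
```

For example, the corpus `["a b", "c"]` supports at most 4-grams, so `check_order(["a b", "c"], 5)` raises. Passing this check does not guarantee that estimation succeeds on very small data, so treat it as catching only the most obvious case.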

That page appears to provide a 5-gram Kneser-Ney model and then encourages people to load it with a lower order (such as a bigram model). This is a bad idea: https://neural.mt/papers/edinburgh/rest_paper.pdf...
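One way to see the problem uses the textbook continuation-count idea behind Kneser-Ney. This is a simplified sketch with no discounting or interpolation, not the method of the linked paper. In a Kneser-Ney model, the lower-order entries are estimated from how many distinct contexts a word follows rather than from how often it occurs, because they are meant to be used only when backing off. Truncating the model to a lower order therefore scores text with distributions built for a different purpose. The toy corpus below is made up for illustration.

```python
from collections import Counter


def mle_unigram(tokens: list[str]) -> dict[str, float]:
    """Plain relative frequency: how often each word occurs."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def continuation_unigram(tokens: list[str]) -> dict[str, float]:
    """Kneser-Ney-style continuation probability (no discounting).

    A word's mass is the number of distinct words it follows, divided by
    the number of distinct bigram types in the corpus.
    """
    bigram_types = set(zip(tokens, tokens[1:]))
    left_contexts = Counter(word for _, word in bigram_types)
    total = len(bigram_types)
    return {w: c / total for w, c in left_contexts.items()}


tokens = "san francisco san francisco san francisco the cat the dog a cat".split()
mle = mle_unigram(tokens)
cont = continuation_unigram(tokens)
print(f"francisco: mle={mle['francisco']:.3f} continuation={cont['francisco']:.3f}")
print(f"cat:       mle={mle['cat']:.3f} continuation={cont['cat']:.3f}")
# francisco: mle=0.250 continuation=0.125
# cat:       mle=0.167 continuation=0.250
```

Here `francisco` is the more frequent word, but it only ever follows `san`, so its continuation probability ends up below that of `cat`. That is the right behavior for backoff and the wrong one for a standalone low-order model.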