Kenneth Heafield
This is cutting it close, especially if you have other things loaded. I would recommend converting the original ARPA to binary format again with better compression options (see https://neural.mt/code/kenlm/structures/). You might...
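As a rough sketch of what "better compression options" could look like, the helper below assembles a `build_binary` command line for a quantized trie with compressed pointers. The helper name, the file names, and the bit widths (8 and 22) are illustrative assumptions rather than tuned recommendations; check the `build_binary` help output and the structures page for the options your build supports.

```python
import shlex


def build_binary_command(arpa_path, binary_path,
                         prob_bits=8, backoff_bits=8, pointer_bits=22):
    """Assemble an argv list for KenLM's build_binary (hypothetical helper).

    -q / -b quantize probabilities / backoffs to the given number of bits,
    -a compresses trie pointers; all three apply to the trie data structure.
    The default values here are placeholders, not recommendations.
    """
    return [
        "bin/build_binary",
        "-q", str(prob_bits),
        "-b", str(backoff_bits),
        "-a", str(pointer_bits),
        "trie",
        arpa_path,
        binary_path,
    ]


if __name__ == "__main__":
    # Print the command so it can be inspected before running it by hand.
    print(shlex.join(build_binary_command("in.arpa", "out.binary")))
```

Lower bit widths shrink the model further at some cost in accuracy, so it is worth comparing perplexity before and after.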
Run the `filter` program. It will print a help message with more command line documentation. `bin/filter vocab model:in.arpa out.arpa`
Use only one of vocab: or model: (the other is on stdin). Also, it's not a CSV; it's whitespace-delimited tokens.
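If the vocabulary currently lives in a CSV file, something like the following could flatten it into the whitespace-delimited form. This is a minimal sketch: the function name is made up for illustration, and it assumes every CSV cell holds one or more vocabulary tokens and nothing else (no header row, no counts).

```python
import csv
import io


def csv_to_vocab(csv_text):
    """Flatten CSV text into a single line of whitespace-delimited tokens.

    Cells that contain internal whitespace are split into separate tokens,
    because the vocabulary is read as whitespace-delimited words.
    """
    tokens = []
    for row in csv.reader(io.StringIO(csv_text)):
        for cell in row:
            tokens.extend(cell.split())
    return " ".join(tokens) + "\n"


if __name__ == "__main__":
    print(csv_to_vocab("the,cat\nsat,on\n"), end="")
```

The result can then be written to a file or piped to the program on stdin.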
Not sure why our Cython versions generate different output. Does 0760f4c4df76f3286656e7232dc3ad6495248bc2 work for you?
There is no fast path for scoring the entire vocabulary in a given context. A forward trie is better suited to that. KenLM implements a reverse trie to optimize individual...
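Without a fast path, the straightforward fallback is one query per vocabulary word. The sketch below assumes the Python module's `Model.score(sentence, bos=..., eos=...)` method, which returns a log10 probability for the whole string; `score_vocabulary` is a hypothetical helper, not part of KenLM.

```python
def score_vocabulary(model, context, vocab):
    """Return {word: log10 p(word | context)}, one query per word.

    `model` is anything with a kenlm.Model-style score() method.
    The conditional score is the difference between scoring the context
    with the word appended and scoring the context alone.
    """
    prefix = " ".join(context)
    # Score of the context by itself (0.0 when there is no context).
    base = model.score(prefix, bos=True, eos=False) if context else 0.0
    scores = {}
    for word in vocab:
        sentence = prefix + " " + word if context else word
        scores[word] = model.score(sentence, bos=True, eos=False) - base
    return scores


# Usage, assuming the kenlm module is installed and model.binary exists:
#   import kenlm
#   model = kenlm.Model("model.binary")
#   scores = score_vocabulary(model, ["the", "cat"], ["sat", "ran"])
```

This rescans the context for every word, so the cost grows with vocabulary size times context length. That is fine for small candidate lists but slow for a full vocabulary.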
Did you install the dependencies documented on https://kheafield.com/code/kenlm/dependencies/ ? I smell a missing `libboost-all-dev`.
@cheahheng The problem is definitely that you don't have the full repo, just the inference code. Get it from this repo.
It looks like you're trying to compile `dump_trie_main.cc` on its own (the command line was cut off from the screenshot). I'd recommend using bjam for this (since it's the old...
I smell compilation with a different version of Boost than is installed as a shared library on the system.
Would a callback from LoadVirtual be sufficient?