Kenneth Heafield
The C++ side doesn't even remember the vocabulary strings by default because users either don't need it or have their own data structure populated by the EnumerateVocab callback API.
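To make the callback idea concrete, here is a minimal Python mock of that pattern. In KenLM's C++ API the enumeration interface is `lm::EnumerateVocab` (a subclass receives an `Add(index, string)` call per vocabulary entry during model load); the `VocabCollector` and `load_model` names below are hypothetical stand-ins for illustration only, not part of any KenLM binding.

```python
# Illustrative mock of the C++ EnumerateVocab callback pattern.
# VocabCollector and load_model are hypothetical stand-ins; in real
# KenLM C++ code you would subclass lm::EnumerateVocab instead.

class VocabCollector:
    """Collect word -> index mappings as the loader enumerates them."""
    def __init__(self):
        self.word_to_index = {}

    def add(self, index, word):
        # Called once per vocabulary entry during model loading.
        self.word_to_index[word] = index

def load_model(vocab_strings, callback=None):
    """Hypothetical loader: fires the callback once per vocab string."""
    for index, word in enumerate(vocab_strings):
        if callback is not None:
            callback.add(index, word)
    return vocab_strings  # stand-in for a model object

collector = VocabCollector()
load_model(["<unk>", "<s>", "</s>", "the", "cat"], callback=collector)
print(collector.word_to_index["the"])  # → 3
```

The point of the pattern is that the library never has to keep the strings itself: the caller's data structure is populated once at load time and owns the mapping afterward.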
The filter is designed to use less memory by removing words that won't be queried by a given system. It's not what you're looking for. Replacing low-frequency words with <unk>...
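Replacing low-frequency words with `<unk>` is a preprocessing step on the training text, not a job for the filter. A minimal sketch (function name and threshold are my own, not a KenLM tool):

```python
from collections import Counter

def replace_rare_words(sentences, min_count=2, unk="<unk>"):
    """Map words seen fewer than min_count times to <unk> before training."""
    counts = Counter(word for sent in sentences for word in sent.split())
    return [
        " ".join(w if counts[w] >= min_count else unk for w in sent.split())
        for sent in sentences
    ]

corpus = ["the cat sat", "the dog sat", "a cat ran"]
print(replace_rare_words(corpus))
# "the", "cat", "sat" appear at least twice; "dog", "a", "ran" become <unk>
```

You would then feed the rewritten text to `lmplz` as usual.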
Currently language models can only really be built through `lmplz` or through the C++ API that program calls internally. I don't think you want a Kneser-Ney language model here....
@JRMeyer Is this increasing the probability of a unigram? You want a wrapper around the LM that adds a constant for elements of a set?
As an MT person, if it weren't OOV I'd call this run-time domain adaptation: retrieve relevant sentences, then upweight them in training.
You can certainly run two language models as features. Or we can make a wrapper that looks like kenlm but does some runtime changes to the probabilities. Let's figure out...
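A sketch of what such a wrapper could look like, assuming a base model with a `score` method (the class names, the stub `UniformLM`, and the log10 boost constant are all my own illustration, not a KenLM feature):

```python
import math

class BoostedLM:
    """Wrapper that looks like the underlying LM but adds a constant
    (in log10 space) to the score of any word in a boost set.
    Note the resulting scores are no longer normalized probabilities."""
    def __init__(self, base_model, boosted_words, log10_boost=1.0):
        self.base = base_model
        self.boosted = set(boosted_words)
        self.boost = log10_boost

    def score(self, word, context=()):
        s = self.base.score(word, context)
        if word in self.boosted:
            s += self.boost
        return s

class UniformLM:
    """Stand-in base model: uniform log10 probability over a vocab."""
    def __init__(self, vocab_size):
        self.logp = -math.log10(vocab_size)
    def score(self, word, context=()):
        return self.logp

lm = BoostedLM(UniformLM(1000), {"neural", "network"}, log10_boost=2.0)
print(lm.score("neural"))  # boosted:   -3 + 2 = -1
print(lm.score("cat"))     # unboosted: -3
```

The alternative, running two language models as separate features, leaves the interpolation weight to the downstream system's tuner instead of hard-coding a constant.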
That looks like a pretty broken Boost installation to me. A Boost-internal header `boost/thread/exceptions.hpp` is including another file `boost/system/system_error.hpp` and not finding it. Are you sure you have only one version...
Disk bound or CPU bound? I had a custom branch with trie generation directly during building, but it was too hacky to release.
Not the most efficient way (the unigrams will carry useless trie pointers and backoff). It should really be its own implementation, but Holger Schwenk wants something too... so sure.
Regarding clipboard, https://translatelocally.com/web/ and https://translatelocally.com/ though I recognize that we need to do a better job of cross-selling and integrating these offerings. Regarding selecting the language, try https://github.com/jelmervdl/firefox-translations/releases/download/v0.6.1/bergamot_translations.xpi and let...