Kenneth Heafield
The C++ side doesn't even remember the vocabulary strings by default because users either don't need it or have their own data structure populated by the EnumerateVocab callback API.
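To make the callback idea concrete, here is a minimal Python mock of that pattern. In KenLM's C++ API the enumeration interface is `lm::EnumerateVocab` (a subclass receives an `Add(index, string)` call per vocabulary entry during model load); the `VocabCollector` and `load_model` names below are hypothetical stand-ins for illustration only, not part of any KenLM binding.

```python
# Illustrative mock of the C++ EnumerateVocab callback pattern.
# VocabCollector and load_model are hypothetical stand-ins; in real
# KenLM C++ code you would subclass lm::EnumerateVocab instead.

class VocabCollector:
    """Collect word -> index mappings as the loader enumerates them."""
    def __init__(self):
        self.word_to_index = {}

    def add(self, index, word):
        # Called once per vocabulary entry during model loading.
        self.word_to_index[word] = index

def load_model(vocab_strings, callback=None):
    """Hypothetical loader: fires the callback once per vocab string."""
    for index, word in enumerate(vocab_strings):
        if callback is not None:
            callback.add(index, word)
    return vocab_strings  # stand-in for a model object

collector = VocabCollector()
load_model(["<unk>", "<s>", "</s>", "the", "cat"], callback=collector)
print(collector.word_to_index["the"])  # → 3
```

The point of the pattern is that the library never has to keep the strings itself: the caller's data structure is populated once at load time and owns the mapping afterward.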
The filter is designed to use less memory by removing words that won't be queried by a given system. It's not what you're looking for. Replacing low-frequency words with <unk>...
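Replacing low-frequency words with `<unk>` is a preprocessing step on the training text, not a job for the filter. A minimal sketch (function name and threshold are my own, not a KenLM tool):

```python
from collections import Counter

def replace_rare_words(sentences, min_count=2, unk="<unk>"):
    """Map words seen fewer than min_count times to <unk> before training."""
    counts = Counter(word for sent in sentences for word in sent.split())
    return [
        " ".join(w if counts[w] >= min_count else unk for w in sent.split())
        for sent in sentences
    ]

corpus = ["the cat sat", "the dog sat", "a cat ran"]
print(replace_rare_words(corpus))
# "the", "cat", "sat" appear at least twice; "dog", "a", "ran" become <unk>
```

You would then feed the rewritten text to `lmplz` as usual.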
Currently language models can only really be built through `lmplz` or through the C++ API that program calls internally. I don't think you want a Kneser-Ney language model here....
@JRMeyer Is this increasing the probability of a unigram? You want a wrapper around the LM that adds a constant for elements of a set?
As an MT person, if it weren't OOV I'd call this run-time domain adaptation: retrieve relevant sentences, then upweight them in training.
You can certainly run two language models as features. Or we can make a wrapper that looks like kenlm but does some runtime changes to the probabilities. Let's figure out...
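A sketch of what such a wrapper could look like, assuming a base model with a `score` method (the class names, the stub `UniformLM`, and the log10 boost constant are all my own illustration, not a KenLM feature):

```python
import math

class BoostedLM:
    """Wrapper that looks like the underlying LM but adds a constant
    (in log10 space) to the score of any word in a boost set.
    Note the resulting scores are no longer normalized probabilities."""
    def __init__(self, base_model, boosted_words, log10_boost=1.0):
        self.base = base_model
        self.boosted = set(boosted_words)
        self.boost = log10_boost

    def score(self, word, context=()):
        s = self.base.score(word, context)
        if word in self.boosted:
            s += self.boost
        return s

class UniformLM:
    """Stand-in base model: uniform log10 probability over a vocab."""
    def __init__(self, vocab_size):
        self.logp = -math.log10(vocab_size)
    def score(self, word, context=()):
        return self.logp

lm = BoostedLM(UniformLM(1000), {"neural", "network"}, log10_boost=2.0)
print(lm.score("neural"))  # boosted:   -3 + 2 = -1
print(lm.score("cat"))     # unboosted: -3
```

The alternative, running two language models as separate features, leaves the interpolation weight to the downstream system's tuner instead of hard-coding a constant.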
That looks like a pretty broken Boost installation to me. A Boost-internal header `boost/thread/exceptions.hpp` is including another file `boost/system/system_error.hpp` and not finding it. Are you sure you have only one version...
Disk bound or CPU bound? I had a custom branch with trie generation directly during building, but it was too hacky to release.
Not the most efficient way (the unigrams will carry useless trie pointers and backoff). It should really be its own implementation, but Holger Schwenk wants something too... so sure.
Regarding clipboard, https://translatelocally.com/web/ and https://translatelocally.com/ though I recognize that we need to do a better job of cross-selling and integrating these offerings. Regarding selecting the language, try https://github.com/jelmervdl/firefox-translations/releases/download/v0.6.1/bergamot_translations.xpi and let...