Kenneth Heafield

Results 290 comments of Kenneth Heafield

Thank you for investigating! The de-en model does seem to generate `Ã3` garbage and probably needs to have its training data investigated. That said, to this particular issue my reaction...
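For screening the training data, here is a minimal C++ sketch. It assumes (a guess, not established above) that the `Ã3` garbage is UTF-8 text that was decoded as Latin-1 and re-encoded somewhere upstream. For example, `ó` (bytes `C3 B3`) becomes `Ã³` that way. `Latin1ToUtf8` reproduces that mistake, and `LooksDoubleEncoded` is a hypothetical heuristic that flags text containing the resulting byte pattern.

```cpp
#include <cstddef>
#include <string>

// Re-encodes every byte of `bytes` as if it were a Latin-1 character,
// producing UTF-8. Applied to text that is already UTF-8, this is exactly
// the double-encoding mistake that produces mojibake such as "Ã³".
std::string Latin1ToUtf8(const std::string &bytes) {
  std::string out;
  for (unsigned char b : bytes) {
    if (b < 0x80) {
      out.push_back(static_cast<char>(b));
    } else {
      // Latin-1 code points U+0080..U+00FF need two UTF-8 bytes.
      out.push_back(static_cast<char>(0xC0 | (b >> 6)));
      out.push_back(static_cast<char>(0x80 | (b & 0x3F)));
    }
  }
  return out;
}

// Heuristic: true if the text contains UTF-8-encoded U+00C2..U+00DF
// (bytes C3 82..C3 9F) immediately followed by UTF-8-encoded U+0080..U+00BF
// (bytes C2 80..C2 BF), i.e. a two-byte UTF-8 sequence that was misread as
// Latin-1 and encoded a second time.
bool LooksDoubleEncoded(const std::string &utf8) {
  for (std::size_t i = 0; i + 3 < utf8.size(); ++i) {
    unsigned char a = utf8[i], b = utf8[i + 1], c = utf8[i + 2], d = utf8[i + 3];
    if (a == 0xC3 && b >= 0x82 && b <= 0x9F && c == 0xC2 && d >= 0x80 && d <= 0xBF) {
      return true;
    }
  }
  return false;
}
```

Running `LooksDoubleEncoded` over each line of a corpus would give a rough count of affected sentence pairs. Expect some false positives on text that legitimately contains such character pairs.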

By the way, the cleanest way to eliminate the character is probably to add it to a custom SentencePiece normalization rule, which then gets baked into the `.spm` and requires...
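A small C++ sketch of producing such a rule line. It assumes the custom-rule TSV format from SentencePiece's normalization documentation (source code points in hex, a tab, then target code points in hex) and that an empty target deletes the source; please double-check both before relying on this. U+200B (zero-width space) stands in for the character, which isn't named here, and `HexCodepoints`/`MakeRuleLine` are hypothetical helpers.

```cpp
#include <cstdio>
#include <string>

// Formats code points as space-separated uppercase hex, e.g. {0x41, 0x300} -> "41 300".
std::string HexCodepoints(const std::u32string &cps) {
  std::string out;
  char buf[16];
  for (char32_t cp : cps) {
    if (!out.empty()) out.push_back(' ');
    std::snprintf(buf, sizeof(buf), "%X", static_cast<unsigned>(cp));
    out += buf;
  }
  return out;
}

// One line of a custom normalization rule TSV: source code points, a tab,
// then target code points. An empty target is assumed to mean "delete".
std::string MakeRuleLine(const std::u32string &src, const std::u32string &dst) {
  return HexCodepoints(src) + "\t" + HexCodepoints(dst);
}

// Example (hypothetical character): MakeRuleLine(U"\u200B", U"") yields "200B\t".
```

The line would presumably be appended to a copy of the built-in rules (SentencePiece ships them as TSV files such as `nmt_nfkc.tsv`) and passed to `spm_train` via `--normalization_rule_tsv`, so the rule ends up inside the trained model.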

Every case I see in https://github.com/ugermann/ssplit-cpp/blob/master/src/command/ssplit_main.cpp presumes that the entire text to split is already in RAM or memory-mapped. With that assumption, we've already lost the battle for bounded memory...
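By contrast, a splitter with bounded memory has to work incrementally: consume input in chunks and retain only the unfinished sentence. A minimal C++ sketch, assuming a hypothetical `StreamingSplitter` class and a deliberately naive boundary rule (`.`, `!` or `?` followed by whitespace) in place of ssplit-cpp's real logic. A hard cap, `max_tail`, forces a split so that input without any boundaries can't grow the buffer.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <utility>

// Hypothetical incremental splitter: feed it chunks of any size and it emits
// complete sentences through a callback, keeping only the unfinished tail.
// The boundary rule (terminal punctuation followed by whitespace) is a naive
// stand-in for a real sentence splitter's rules.
class StreamingSplitter {
 public:
  StreamingSplitter(std::size_t max_tail, std::function<void(const std::string &)> emit)
      : max_tail_(max_tail), emit_(std::move(emit)) {}

  // Consumes one chunk; a sentence may start in one chunk and end in a later one.
  void Feed(const std::string &chunk) {
    for (char c : chunk) {
      bool space = c == ' ' || c == '\t' || c == '\n' || c == '\r';
      if (space && pending_end_) {  // punctuation followed by whitespace: boundary
        Emit();
        continue;
      }
      if (space && tail_.empty()) continue;  // drop whitespace between sentences
      tail_.push_back(c);
      pending_end_ = c == '.' || c == '!' || c == '?';
      if (tail_.size() >= max_tail_) Emit();  // hard cap keeps memory bounded
    }
  }

  // Call once at end of input to emit any final, unterminated sentence.
  void Finish() {
    if (!tail_.empty()) Emit();
  }

 private:
  void Emit() {
    emit_(tail_);
    tail_.clear();
    pending_end_ = false;
  }

  std::size_t max_tail_;
  std::function<void(const std::string &)> emit_;
  std::string tail_;
  bool pending_end_ = false;
};
```

The splitter itself never holds more than `max_tail` bytes, however long the input is. Real splitting rules (abbreviation lists and so on) would replace the naive check, but the chunks-in, sentences-out shape is what keeps memory bounded.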

I thought WASM, at least, was living in `-fno-exceptions` territory, where we can't throw and catch at all? (And even native code stuffed into the browser is compiled with `-fno-exceptions`.) Which...
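If that's right, errors have to travel through return values rather than `throw`. A minimal C++17 sketch of that style; `Result` and `ParseCount` are hypothetical names, not existing Bergamot or Marian APIs. `ParseCount` hand-rolls integer parsing because `std::stoi` reports bad input by throwing.

```cpp
#include <climits>
#include <optional>
#include <string>
#include <utility>

// Hypothetical error-reporting type for code compiled with -fno-exceptions:
// a function returns either a value or an error message instead of throwing.
template <typename T>
struct Result {
  std::optional<T> value;
  std::string error;

  bool ok() const { return value.has_value(); }
  static Result Ok(T v) { return Result{std::move(v), {}}; }
  static Result Err(std::string msg) { return Result{std::nullopt, std::move(msg)}; }
};

// Parses a non-negative decimal integer without std::stoi (which throws on bad input).
Result<int> ParseCount(const std::string &s) {
  if (s.empty()) return Result<int>::Err("empty input");
  int n = 0;
  for (char c : s) {
    if (c < '0' || c > '9') return Result<int>::Err(std::string("not a digit: ") + c);
    int digit = c - '0';
    if (n > (INT_MAX - digit) / 10) return Result<int>::Err("overflow");  // n*10+digit would exceed INT_MAX
    n = n * 10 + digit;
  }
  return Result<int>::Ok(n);
}
```

Callers check `ok()` and pass `error` up the stack; in a WASM build, that message could be what gets reported to JavaScript instead of an abort.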

Some of the `ABORT_IF` calls we can just keep, IMHO. The biggest one is where input breaks things, such as an HTML edge case.

This is now low priority: it works fine in the production C++ API as is and isn't blocking a WASM prototype.

To be clear, this was only meant to return an error in WASM when the extension provides ill-formed HTML. The extension should never do that, because it must use...
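A minimal C++ sketch of "return an error instead of aborting" for ill-formed HTML. `CheckTagsBalanced` is hypothetical and deliberately simplistic: it only checks that non-void tags nest and close properly, and it ignores comments, `<script>` contents, and `>` inside attribute values. It is not the project's actual HTML handling.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// HTML void elements, which never take a closing tag.
bool IsVoidElement(const std::string &name) {
  static const char *const kVoid[] = {"area", "base", "br",   "col",    "embed", "hr", "img",
                                      "input", "link", "meta", "source", "track", "wbr"};
  for (const char *v : kVoid) {
    if (name == v) return true;
  }
  return false;
}

// Returns true if every non-void opening tag is closed in the right order;
// otherwise returns false and describes the first problem in *error.
bool CheckTagsBalanced(const std::string &html, std::string *error) {
  std::vector<std::string> open;  // stack of currently open tag names
  std::size_t i = 0;
  while ((i = html.find('<', i)) != std::string::npos) {
    std::size_t end = html.find('>', i);
    if (end == std::string::npos) {
      *error = "unterminated tag";
      return false;
    }
    std::string tag = html.substr(i + 1, end - i - 1);
    i = end + 1;
    if (tag.empty() || tag[0] == '!' || tag[0] == '?') continue;  // naively skip doctype etc.
    bool closing = tag[0] == '/';
    bool self_closing = tag.back() == '/';
    std::size_t start = closing ? 1 : 0;
    std::size_t len = 0;
    while (start + len < tag.size() && std::isalnum(static_cast<unsigned char>(tag[start + len]))) {
      ++len;
    }
    std::string name = tag.substr(start, len);
    for (char &c : name) c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    if (name.empty()) {
      *error = "malformed tag <" + tag + ">";
      return false;
    }
    if (closing) {
      if (open.empty() || open.back() != name) {
        *error = "unexpected </" + name + ">";
        return false;
      }
      open.pop_back();
    } else if (!self_closing && !IsVoidElement(name)) {
      open.push_back(name);
    }
  }
  if (!open.empty()) {
    *error = "unclosed <" + open.back() + ">";
    return false;
  }
  return true;
}
```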

Regarding how it is presented, see Deliverable 1.3: https://github.com/browsermt/coordination/blob/master/docs/D1.3-Bergamot_User_interface_with_quality_estimation.pdf. As for how the numbers map to which confidence level to show, @mfomicheva?

You're going to need to be more specific about which colors to use and what to do if they clash with the background.
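One way to make "clash" concrete is the WCAG 2.x contrast ratio between foreground and background, where 4.5:1 is the AA minimum for normal-size text. A C++ sketch follows; `Rgb`, `Clashes` and `BestOfBlackOrWhite` are hypothetical, and falling back to black or white on a clash is just one possible policy.

```cpp
#include <algorithm>
#include <cmath>

// 8-bit sRGB color (hypothetical type).
struct Rgb {
  int r, g, b;
};

// WCAG 2.x relative luminance: linearize each sRGB channel, then weight it.
double RelativeLuminance(Rgb c) {
  auto linear = [](int v) {
    double s = v / 255.0;
    return s <= 0.03928 ? s / 12.92 : std::pow((s + 0.055) / 1.055, 2.4);
  };
  return 0.2126 * linear(c.r) + 0.7152 * linear(c.g) + 0.0722 * linear(c.b);
}

// Contrast ratio between two colors, from 1 (identical) to 21 (black on white).
double ContrastRatio(Rgb a, Rgb b) {
  double la = RelativeLuminance(a);
  double lb = RelativeLuminance(b);
  return (std::max(la, lb) + 0.05) / (std::min(la, lb) + 0.05);
}

// A foreground "clashes" if it falls below the WCAG AA threshold for normal text.
bool Clashes(Rgb foreground, Rgb background, double min_ratio = 4.5) {
  return ContrastRatio(foreground, background) < min_ratio;
}

// Fallback policy: use whichever of black or white contrasts more with the background.
Rgb BestOfBlackOrWhite(Rgb background) {
  const Rgb black{0, 0, 0};
  const Rgb white{255, 255, 255};
  return ContrastRatio(black, background) >= ContrastRatio(white, background) ? black : white;
}
```

Checking each candidate color with `Clashes` against the page background and substituting `BestOfBlackOrWhite` when it fails would answer the "what to do" half; which colors to start from is still open.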