Kenneth Heafield
Yes, my thumbs up on Frank's comment stands.
I vote for no special handling of blank lines. 
I still contend that there's nothing different in these cases. If you're feeding long sentences into backtranslation, it's your fault for not cleaning them up beforehand. Because otherwise you'll have...
I would support warning once per process.
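A minimal sketch of what "warning once per process" could look like, assuming a process-wide atomic flag is acceptable. `WarnOncePerProcess` is a hypothetical helper name for illustration, not an existing function in the codebase.

```cpp
#include <atomic>
#include <iostream>
#include <string>

// Hypothetical helper: prints the warning the first time it is called in this
// process and stays silent afterwards. Returns true only when it printed.
bool WarnOncePerProcess(const std::string& message) {
  // One flag for the whole process; exchange() returns the previous value,
  // so exactly one caller sees `false`, even with several threads.
  static std::atomic<bool> warned{false};
  if (warned.exchange(true)) return false;
  std::cerr << "Warning: " << message << '\n';
  return true;
}
```

Calling it on every offending line costs one atomic exchange, and only the first call writes to stderr.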
@frankseide I'm worried about a user who remembers to set a max length, passes `--max-length` and is confused why it doesn't do anything. But also not that strongly opinionated on...
If I'm allowed to assume `indices` has shape 1x1x...x (axis) x1x1... and is contiguous in memory, then:
```C++
void Select(Tensor out, const Tensor in, const Tensor indices, int axis) {
  matchOrAbort(indices->type());
  functional::Shape outShape...
```
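To illustrate why that assumption helps, here is a rough sketch on plain `std::vector` data rather than the real tensor types. `SelectAlongAxis` and its parameters are hypothetical and only show the idea: when the indices are a flat list along one axis, the input can be viewed as outer x axis x inner and whole inner slices can be copied.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical illustration, not the real Select: gathers slices of a
// row-major array along `axis`, given a flat, contiguous list of indices.
std::vector<float> SelectAlongAxis(const std::vector<float>& in,
                                   const std::vector<std::size_t>& shape,
                                   const std::vector<std::size_t>& indices,
                                   std::size_t axis) {
  // Collapse the dimensions before and after `axis`.
  std::size_t outer = 1, inner = 1;
  for (std::size_t i = 0; i < axis; ++i) outer *= shape[i];
  for (std::size_t i = axis + 1; i < shape.size(); ++i) inner *= shape[i];
  const std::size_t axisDim = shape[axis];

  std::vector<float> out(outer * indices.size() * inner);
  for (std::size_t o = 0; o < outer; ++o) {
    for (std::size_t k = 0; k < indices.size(); ++k) {
      // Each selected index contributes one contiguous run of `inner` values.
      const float* src = in.data() + (o * axisDim + indices[k]) * inner;
      float* dst = out.data() + (o * indices.size() + k) * inner;
      std::copy(src, src + inner, dst);
    }
  }
  return out;
}
```

The sketch does no bounds checking on `indices`; a real implementation would need to validate them against the size of the selected axis.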
If you're thinking about a Bergamot context, keep in mind that we need to run on different SIMD widths. And the weight storage format depends on SIMD width.
@ugermann To expand upon "the weight storage format depends on SIMD width": the representation of the weights in RAM depends on whether the CPU supports SSSE3, AVX2, or...
To clarify, intgemm currently expects the dimensions of parameter matrices to be multiples of 64 (inner dimension) x 8 (outputs). Retraining alone will not help; configuring dimensions that are multiples of those values will. But...
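As a quick sketch of the arithmetic, here is how one might check a configured size against the multiples stated above and find the next size that fits. The constants come from the comment (64 for the inner dimension, 8 for outputs); the helper names `RoundUp` and `MeetsSizeRequirement` are made up for illustration and are not part of intgemm.

```cpp
#include <cstddef>

// Multiples taken from the comment above; names are illustrative only.
constexpr std::size_t kInnerMultiple = 64;
constexpr std::size_t kOutputMultiple = 8;

// Smallest multiple of `multiple` that is >= value.
constexpr std::size_t RoundUp(std::size_t value, std::size_t multiple) {
  return (value + multiple - 1) / multiple * multiple;
}

// True when both dimensions already satisfy the stated multiples.
constexpr bool MeetsSizeRequirement(std::size_t inner, std::size_t outputs) {
  return inner % kInnerMultiple == 0 && outputs % kOutputMultiple == 0;
}
```

For example, an inner dimension of 500 would not qualify, and `RoundUp(500, kInnerMultiple)` gives 512 as the nearest size that does.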
Would you accept making the default build type `RelWithDebInfo` while returning `Release` to its former glory?