Kenneth Heafield
multi-bleu.perl has been deprecated for years now because it encourages people to use non-standard tokenization. This repository contains another non-standard BLEU implementation that a user might not notice they are...
Some text is left untranslated: "Welches Format hat Ihre Sendung?" ("What format does your shipment have?") Steps to reproduce: 0. Install extension 1.1.1buildid20220506.201912 with en UI language 1. Visit https://www.deutschepost.de/de/p/portoberater.html#/ 2. Click the extension's translation button 3. Click letters 4....
https://github.com/huggingface/nlp/blob/7d1526dfeeb29248d832f1073192dbf03ad642da/metrics/bleu/bleu.py#L76 assumes the inputs are tokenized by the user. This is bad practice because the user's tokenizer is usually not the same as the one used by `mteval-v13a.pl`, the closest...
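A toy sketch of why this matters (the identifiers below are mine, not the repository's): the same hypothesis/reference pair can get completely different n-gram precision depending on how it was tokenized, which is exactly what makes scores computed on user-tokenized input non-comparable.

```python
# Sketch only: modified n-gram precision, the core quantity inside BLEU,
# computed under two different tokenizations of the same strings.
import re
from collections import Counter

def ngram_precision(hyp_tokens, ref_tokens, n):
    """Modified n-gram precision (clipped match count / hypothesis count)."""
    hyp = Counter(tuple(hyp_tokens[i:i + n]) for i in range(len(hyp_tokens) - n + 1))
    ref = Counter(tuple(ref_tokens[i:i + n]) for i in range(len(ref_tokens) - n + 1))
    matched = sum(min(count, ref[gram]) for gram, count in hyp.items())
    total = sum(hyp.values())
    return matched / total if total else 0.0

def split_punct(s):
    # Crude stand-in for mteval-v13a.pl-style tokenization: separate
    # punctuation from words. The real script has many more rules.
    return re.findall(r"\w+|[^\w\s]", s)

hyp = "Hello , world !"   # system output with spaced-out punctuation
ref = "Hello, world!"     # reference with attached punctuation

naive = ngram_precision(hyp.split(), ref.split(), 1)               # 0.0: nothing matches
standard = ngram_precision(split_punct(hyp), split_punct(ref), 1)  # 1.0: everything matches
```

Under naive whitespace splitting the hypothesis gets zero unigram precision; with punctuation split off, the pair is a perfect match. Any score labeled "BLEU" that depends on the user's choice here is not comparable across papers.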
This script is harmful because it propagates a non-standard way to compute BLEU that is not reflective of the WMT 2014 task. Entirely too many papers are submitted with BLEU...
https://github.com/oneapi-src/oneDNN/ aka MKLDNN aka DNNL now has better performance for MT-size matrices: https://github.com/apache/incubator-mxnet/issues/17980 . And it's open source. The same teams write the GEMM for MKL and oneDNN. Would be...
Unfortunately Marian redefines Release to include `-g`. CMake already distinguishes the two cases: Release does not have debugging info, while RelWithDebInfo does. If I wanted debug symbols, I could compile...
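For reference, stock CMake keeps the two build types separate (flag defaults shown are for GCC/Clang):

```shell
# Optimized build, no debug info (-O3 -DNDEBUG by default):
cmake -DCMAKE_BUILD_TYPE=Release ..

# Optimized build *with* debug symbols (-O2 -g -DNDEBUG by default):
cmake -DCMAKE_BUILD_TYPE=RelWithDebInfo ..
```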
[2021-05-18 14:57:19] [SentencePiece] Creating temporary file /tmp/marian.a4Gi0k^@ Where ^@ is just my editor's representation of a null byte. Steps to reproduce: http://data.statmt.org/heafield/train-isl-eng/
The existing implementation of Select for CPU is very slow. Reimplemented the `Select` function for CPU using `std::copy` for the case where the data to copy after the selection axis is contiguous,...
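A sketch of the idea under my own names (not Marian's actual Select code): when everything after the selection axis is contiguous in memory, each selected slice can be moved with a single `std::copy` instead of an element-by-element gather.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical helper: select rows `indices` from a row-major
// [rows x innerSize] matrix. Each row is a contiguous slice, so one
// bulk std::copy per selected index suffices.
std::vector<float> selectRows(const std::vector<float>& in,
                              std::size_t innerSize,
                              const std::vector<std::size_t>& indices) {
  std::vector<float> out;
  out.reserve(indices.size() * innerSize);
  for(std::size_t row : indices) {
    const float* begin = in.data() + row * innerSize;
    // Contiguous chunk: compilers lower this to memmove-like code.
    std::copy(begin, begin + innerSize, std::back_inserter(out));
  }
  return out;
}
```

The same shape works for any axis as long as the trailing dimensions are collapsed into `innerSize`; the non-contiguous case still needs a strided gather.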
FP16 support
Should add pervasive FP16 support, not just calling tensor cores. We've known about this for a while, but Facebook produced a concrete number: a 2.9x speedup from FP16 (https://arxiv.org/pdf/1806.00187.pdf).
Both @alvations and Intel have noticed SentencePiece taking ~10% of inference time and wonder whether it can be optimized. The first thing to try would be updating the submodule from...