Alham Fikri Aji
Are you, by any chance, training a quantized model from scratch? One option is to train a normal model first, then activate the quantization. Alternatively, not using --quantize-biases true should...
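A minimal sketch of the "train normally first, then activate quantization" idea, written in PyTorch rather than marian code; the model, data, bit width, and warmup length are all placeholders, and the uniform quantizer is just one possible choice:

```python
# Sketch (PyTorch, not marian): weights stay full-precision for the first
# warmup_steps updates; only afterwards are they snapped to 2**bits levels
# after every optimizer step. Biases are skipped, mirroring the effect of
# leaving --quantize-biases off.
import torch

def quantize_(tensor, bits=4):
    # Uniform quantization: snap values to 2**bits evenly spaced levels
    # spanning the tensor's current range.
    levels = 2 ** bits - 1
    lo, hi = tensor.min(), tensor.max()
    scale = (hi - lo) / levels if hi > lo else tensor.new_tensor(1.0)
    tensor.copy_(((tensor - lo) / scale).round() * scale + lo)

model = torch.nn.Linear(16, 16)            # stand-in for the real network
opt = torch.optim.SGD(model.parameters(), lr=0.01)
warmup_steps, bits = 1000, 4

for step in range(5000):
    x = torch.randn(32, 16)                # dummy batch
    loss = model(x).pow(2).mean()          # dummy objective
    opt.zero_grad()
    loss.backward()
    opt.step()
    if step >= warmup_steps:               # quantization switched on late
        with torch.no_grad():
            for name, p in model.named_parameters():
                if "bias" not in name:     # keep biases full-precision
                    quantize_(p, bits)
```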
#self-assign
This dataset is a bit noisy at the moment: aside from having inconsistent labeling (numeric vs. string), some data has no labels at all. I've sent a PR to that...
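A hypothetical cleanup along the lines described above (the actual PR may do something different); the file name, column names, and label mapping are placeholders:

```python
# Unify numeric vs. string labels under one scheme and drop rows that
# carry no label at all. "dataset.tsv", "label", and LABEL_MAP are assumed.
import pandas as pd

LABEL_MAP = {"0": "negative", "1": "positive"}   # assumed string targets

df = pd.read_csv("dataset.tsv", sep="\t", dtype=str)
df["label"] = df["label"].str.strip().replace(LABEL_MAP)  # normalize labels
df = df.dropna(subset=["label"])                          # drop unlabeled rows
df = df[df["label"].str.len() > 0]
df.to_csv("dataset.clean.tsv", sep="\t", index=False)
```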
Some experiments: originally we could not train the transformer with async SGD (0.0 BLEU). But if we assume that the average words per batch in sync SGD is 4x larger compared to...
**What is the right way to specify this on the command line?** Currently we can set --batch-normal-words. I think the easiest way, both for us and for users, is to just scale...
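A rough sketch of the "just scale" idea under one possible interpretation: scale the learning rate by the ratio of the words actually seen in a batch to the reference given by --batch-normal-words. The flag name comes from the comment above; the base learning rate, the default reference value, and the assumption that the scaling is linear (rather than, say, square-root) are all illustrative, not marian's actual behaviour:

```python
# Scale the step size by actual words-per-batch relative to a reference,
# so e.g. a sync-SGD batch with 4x the words gets a 4x larger step.
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--batch-normal-words", type=int, default=1920,
                    help="reference words-per-batch the base LR is tuned for")
args = parser.parse_args()

base_lr = 0.0003  # placeholder base learning rate

def scaled_lr(words_in_batch: int) -> float:
    # Linear scaling is an assumption here; other schedules are possible.
    return base_lr * words_in_batch / args.batch_normal_words

print(scaled_lr(7680))  # a batch 4x the reference -> 4x the base LR
```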
Will do it this week... so merge quantized training to master or to Nick's branch?