Alham Fikri Aji

6 comments by Alham Fikri Aji

Are you, by any chance, training a quantized model from scratch? One option is to train a normal model first, then activate the quantization. Alternatively, not using --quantize-biases true should...
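For context, here is a minimal PyTorch-style sketch of what "activate the quantization later" could look like: full-precision training during a warm-up period, then fake-quantizing the weights (but not the biases) with a straight-through estimator. The class name, bit width, and warm-up length are illustrative assumptions, not Marian's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DelayedQuantLinear(nn.Linear):
    """Linear layer whose weights are fake-quantized only after warm-up."""

    def __init__(self, in_features, out_features, bits=8):
        super().__init__(in_features, out_features)
        self.bits = bits
        self.quantize_active = False  # stays off during full-precision warm-up

    def forward(self, x):
        w = self.weight
        if self.quantize_active:
            # Symmetric uniform fake quantization of the weights.
            scale = w.detach().abs().max().clamp_min(1e-8) / (2 ** (self.bits - 1) - 1)
            wq = torch.round(w / scale) * scale
            # Straight-through estimator: quantized values in the forward
            # pass, full-precision gradients in the backward pass.
            w = w + (wq - w).detach()
        # Biases stay full precision, in the spirit of avoiding
        # --quantize-biases true.
        return F.linear(x, w, self.bias)

layer = DelayedQuantLinear(16, 16)
for step in range(20_000):
    if step == 10_000:              # assumed warm-up length
        layer.quantize_active = True
    ...  # normal training step goes here
```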

This dataset is a bit noisy at the moment: aside from having inconsistent labeling (numeric vs. string), some data has no labels at all. I've sent a PR to that...
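As an illustration of the kind of cleanup such a PR might do (the field names and label map below are assumptions, not the actual dataset schema), here is a small Python pass that unifies numeric and string labels and drops unlabeled rows:

```python
# Assumed mapping from the mixed numeric/string forms to one canonical label.
LABEL_MAP = {"0": "negative", "1": "positive",
             "negative": "negative", "positive": "positive"}

def clean(rows):
    for row in rows:
        raw = row.get("label")
        if raw is None or str(raw).strip() == "":
            continue  # drop rows with no label at all
        key = str(raw).strip().lower()
        if key not in LABEL_MAP:
            continue  # drop labels we cannot interpret
        yield {**row, "label": LABEL_MAP[key]}

rows = [{"text": "good", "label": 1},
        {"text": "bad", "label": "negative"},
        {"text": "???", "label": None}]
print(list(clean(rows)))  # only the first two survive, with unified labels
```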

Some experiments: originally we could not train a transformer with async SGD (0.0 BLEU). But if we assume that the average words per batch in sync SGD is 4x larger compared to...
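A back-of-the-envelope version of that 4x argument, with assumed numbers (the worker count, words per batch, and the learning-rate remedy are all illustrative; the source comment is truncated before its conclusion):

```python
# In sync SGD, one update aggregates all workers' batches, so the effective
# words per update is roughly num_workers times larger than in async SGD,
# where each worker updates independently.
num_workers = 4
words_per_worker_batch = 2_000                                 # assumed
sync_words_per_update = num_workers * words_per_worker_batch   # 8,000
async_words_per_update = words_per_worker_batch                # 2,000

# One common remedy (an assumption here, not necessarily the authors' fix)
# is to scale the learning rate by the ratio of effective batch sizes.
base_lr = 3e-4                                                 # assumed sync LR
async_lr = base_lr * async_words_per_update / sync_words_per_update
print(async_lr)  # 7.5e-05: 4x smaller, matching the 4x batch-size gap
```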

**What is the right way to specify this on the command line?** Currently we can set --batch-normal-words. I think the easiest way, both for us and for users, is to just scale...
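One plausible reading of "just scale" (an assumption on my part, since the comment is cut off) is to treat --batch-normal-words as a reference batch size and linearly rescale the learning rate by the ratio of actual to reference words per batch:

```python
def scaled_lr(base_lr: float, words_in_batch: int, batch_normal_words: int) -> float:
    """Linearly rescale the learning rate by effective batch size."""
    return base_lr * words_in_batch / batch_normal_words

# Example: the reference is 2,000 words, but dynamic batching produced 5,000.
print(scaled_lr(3e-4, words_in_batch=5_000, batch_normal_words=2_000))  # 0.00075
```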

Will do it this week... So should we merge quantized training into master or into Nick's branch?