Vincent Nguyen comments

Results: 123 comments by Vincent Nguyen

Translation outputs differ with different batch sizes

If you want to be sure, change the exit condition here: https://github.com/OpenNMT/OpenNMT-py/blob/master/onmt/translate/beam_search.py#L192 and replace self.beam_size with self.n_best. In theory, with the new condition, you should get slightly better scores.
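To make the suggestion concrete, here is a minimal sketch of how a stopping threshold of this kind behaves. It is not the code at the linked line: the function `should_stop` and its arguments are hypothetical names, and the assumption is only that a sentence is considered done once its top beam has finished and enough hypotheses have been collected.

```python
def should_stop(top_beam_finished: bool, num_finished: int, threshold: int) -> bool:
    """Hypothetical exit test: stop once the top beam is finished and at
    least `threshold` finished hypotheses have been collected."""
    return top_beam_finished and num_finished >= threshold


beam_size, n_best = 5, 1

# Same search state, two different thresholds.
stop_with_beam_size = should_stop(True, 2, beam_size)  # needs 5 finished hypotheses
stop_with_n_best = should_stop(True, 2, n_best)        # needs only 1

print(stop_with_beam_size, stop_with_n_best)
```

With the same search state, the two thresholds can disagree on when a sentence leaves the batch, which is why the choice of threshold is worth checking when outputs vary with batch size.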

about post normalization

There should not be a convergence issue. The best option may be to submit a PR so we can have a look at your code. PS: just after the Transformer paper,...

about post normalization

At first sight it looks good. Can you give more info on your training and results?

Large Performance Regression with FusedAdam

Just tested PyTorch 1.13 optim.Adam(..., fused=True): it is about 5% slower than fused=False (test done with a large Transformer training).

Support BART models for classification

I am seconding this. It would be great to implement BERT-like models with encoders only + classification head. More specifically, if we can use a pre-trained parser like this: https://ufal.mff.cuni.cz/udpipe/2/models it...

About the Arxiv benchmark and the paper

Like Marian, we have a mechanism to average models on the fly (the average_decay option); that's why we don't save models so often. But I am curious to understand why the...
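For readers unfamiliar with on-the-fly averaging, a generic exponential moving average over parameters can be sketched as below. This is an illustration of the general technique, not the OpenNMT-py implementation: `update_average` is a hypothetical helper, parameters are plain lists of floats, and it is an assumption that the decay value weights the newest parameters.

```python
def update_average(average, params, decay):
    """Blend the current parameters into a running average.

    Assumption: `decay` is the weight given to the newest parameters,
    so a small decay means a slowly moving average.
    """
    return [(1.0 - decay) * a + decay * p for a, p in zip(average, params)]


average = [0.0]          # running average, one toy parameter
for _ in range(2):       # two training steps with the same parameter value
    average = update_average(average, [1.0], decay=0.5)

print(average)
```

Because the running average already smooths over many steps, there is less need to save many checkpoints and average them afterwards.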

About the Arxiv benchmark and the paper

Okay, in fact there is an issue with the valid batch size (8 tokens when batch_type=tokens, whereas it used to be 8 sentences in the past - we need to...
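The difference between the two interpretations of a batch size of 8 can be sketched with a toy batcher. This is a simplified illustration, not the library's batching code: `make_batches` is a hypothetical helper, and token cost is counted as raw sentence length, ignoring padding.

```python
def make_batches(sentences, batch_size, batch_type="sents"):
    """Group sentences greedily so that each batch stays within batch_size,
    measured in sentences ("sents") or in tokens ("tokens")."""
    batches, current, current_size = [], [], 0
    for sent in sentences:
        cost = len(sent) if batch_type == "tokens" else 1
        # Flush the current batch if adding this sentence would overflow it.
        if current and current_size + cost > batch_size:
            batches.append(current)
            current, current_size = [], 0
        current.append(sent)
        current_size += cost
    if current:
        batches.append(current)
    return batches


# Four toy sentences of 5, 4, 3 and 6 tokens.
sentences = [["a"] * 5, ["b"] * 4, ["c"] * 3, ["d"] * 6]

by_sents = make_batches(sentences, batch_size=8, batch_type="sents")
by_tokens = make_batches(sentences, batch_size=8, batch_type="tokens")

print(len(by_sents), len(by_tokens))
```

With a limit of 8 sentences everything fits in one batch, while a limit of 8 tokens leaves one or two sentences per batch, so validation takes many more steps.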

About the Arxiv benchmark and the paper

Michael, when looking again at the config, there is still a discrepancy that can justify the BLEU difference. When you set rsqrt in ONMT there is no linear increase from...
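As an illustration of the kind of discrepancy mentioned, the sketch below compares two commonly used learning-rate scale functions. Both formulas are assumptions written for this example (a Noam-style schedule with linear warmup, and an inverse square root schedule that stays flat until the warmup step); they are not copied from either toolkit, and the names `noam_scale` and `rsqrt_scale` are hypothetical.

```python
import math


def noam_scale(step, warmup_steps, model_dim):
    """Noam-style scale: linear increase during warmup, then 1/sqrt(step) decay."""
    return model_dim ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)


def rsqrt_scale(step, warmup_steps):
    """Inverse square root scale with no linear increase:
    constant until warmup_steps, then 1/sqrt(step) decay (assumed form)."""
    return 1.0 / math.sqrt(max(step, warmup_steps))


warmup = 4000
for step in (1, 2000, 4000, 16000):
    print(step, noam_scale(step, warmup, 512), rsqrt_scale(step, warmup))
```

Under these assumptions the two schedules differ most during the first `warmup` steps, which is where a mismatch between configs would show up.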

About the Arxiv benchmark and the paper

The wall time still looks high. Were you able to run with a batch size of 5000, update 10? Did you keep the log?

About the Arxiv benchmark and the paper

Great, thanks. It seems that TorchScript brings the 5-10% improvement on 1 GPU, but I am unsure about the big gap on 8 GPUs. We use torch.distributed as well so...