Revo

Results 5 comments of Revo

Will multi-node MPI training be faster than 8 GPUs + NCCL?

@emjotde 1. I tried the case "only overwrite the target token". I chose to use the attention matrix to do the replacement (En->Fr, En: I played `FOOTBALL,` Fr: j'ai...

@emjotde Yes. It is happening for all datasets (WMT17 zh-en, WMT14 de-en, CCMT2020 zh-en). Here is my older-version run.me script and its training log. P.S.: Actually I...

> We are interested in this functionality. @iandewancker, what is the status on your side? Do you have some spare time to work on this? > > @AshBT note...

I use a 1080 Ti, and training on 7 s to 12 s samples only takes me 1.5 s/step. Maybe you weren't using your GPU?