Revo
Will multi-node MPI training be faster than 8 GPUs + NCCL?
@emjotde 1. I had tried the case "only overwrite the target token". I chose to use the attention matrix to do the replacement. (En->Fr, En: I played `FOOTBALL,` Fr: j'ai...
@emjotde Yes. It is happening for all datasets (WMT17 zh-en, WMT14 de-en, CCMT2020 zh-en). Here I show you my older version of the run.me script and its training log. P.S.: Actually I...
> We are interested in this functionality. @iandewancker what is the status on your side, do you have some spare time to work on this? > > @AshBT note...
I use a 1080 Ti, and training on 7 s to 12 s samples takes me only 1.5 sec/step. Maybe you didn't use your GPU?