Aflah
Also, I just noticed, @onlyone2019: your first plot does match the expected result, doesn't it? Triton is faster, and the difference is substantial. Asking this because you wrote "But...
> > > > > > > > I'm not using APEX too. But the code cannot run successfully. I encountered the same error message: `_layer_norm_fwd_fused() got an unexpected keyword...
Are you by any chance using Windows? I got this error when I tried to install it on Windows without realizing that Triton is not available there.
@guillaumekln Is there any update on this? Does CTranslate2 now support such large models without any quantization?
> See https://github.com/huggingface/trl#citation :)

But how do I figure out who to list as authors, and in which order?
Hey @agrueneberg, this issue is quite old and also seems to be done, since the link doesn't work anymore. Shouldn't it be closed then?
Oh, alright, thanks for the info @mikehardy.
@mikehardy What needs to be done here? Should I add more info about adding TTS engines on the device?
@luofuli Are there any other things missing which I should incorporate?
@danielhanchen Just to confirm, is there no way to use multiple GPUs for training? It seems that there is some DDP stuff mentioned in the README, but I couldn't find anything...