Skiba Gleb

Results: 1 comment by Skiba Gleb

I want to shard the output embedding layer. I use the same strategy as in Llama, but training got stuck after the first batch: ` ColwiseParallel( input_layouts=Shard(1), output_layouts=Shard(-1) if loss_parallel else Replicate(), use_local_output=not...
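
For readers unfamiliar with the layout being described, here is a minimal pure-Python sketch of what column-wise sharding of an output projection means. It does not use `torch.distributed` and does not reproduce or diagnose the hang. The helper names (`matmul`, `split_columns`, `gather_last_dim`) and the toy shapes are hypothetical, and the sketch assumes every rank already holds the full hidden states before the matmul. Each rank keeps a slice of the vocabulary columns and produces logits sharded on the last dimension (the `Shard(-1)` case); concatenating them stands in for the gather that a `Replicate()` output would need.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def split_columns(weight, num_shards):
    """Split a (hidden x vocab) matrix into equal column blocks, one per rank."""
    vocab = len(weight[0])
    assert vocab % num_shards == 0, "vocab size must divide evenly in this sketch"
    width = vocab // num_shards
    return [[row[r * width:(r + 1) * width] for row in weight]
            for r in range(num_shards)]


def gather_last_dim(local_outputs):
    """Concatenate per-rank outputs along the last dimension (all-gather stand-in)."""
    num_rows = len(local_outputs[0])
    return [sum((out[i] for out in local_outputs), []) for i in range(num_rows)]


# Toy data (assumed): 3 tokens, hidden size 2, vocabulary size 4, 2 ranks.
hidden = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
weight = [[1.0, 0.0, 2.0, -1.0],
          [0.5, 1.0, 0.0, 3.0]]

shards = split_columns(weight, num_shards=2)
# Each "rank" computes logits only for its own vocabulary slice.
local_logits = [matmul(hidden, w) for w in shards]
# Gathering the slices recovers the unsharded logits.
full_logits = gather_last_dim(local_logits)
```

In the sharded case each rank only ever sees its own slice of the logits, so whatever consumes them (for example the loss) has to be written for that layout on every rank.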