Skiba Gleb
I want to shard the output embedding layer. I use the same strategy as in Llama, but training gets stuck after the first batch: ` ColwiseParallel( input_layouts=Shard(1), output_layouts=Shard(-1) if loss_parallel else Replicate(), use_local_output=not...
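To make the two output layouts in the snippet above concrete, here is a minimal pure-Python sketch of what a column-wise sharded output projection computes. It uses no torch and no real process group: the "ranks" are simulated in one process, and the helper names (`matmul`, `shard_columns`, `all_gather_last_dim`) and the tiny matrices are hypothetical, chosen only for illustration. It does not reproduce or diagnose the hang.

```python
# Pure-Python illustration: simulate a column-wise sharded output projection.
# All names and values here are hypothetical and not taken from the original post.

def matmul(x, w):
    """Multiply x (rows x k) by w (k x cols), both given as nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*w)] for row in x]

def shard_columns(w, world_size):
    """Split w (k x cols) into world_size contiguous column blocks, one per rank."""
    cols = len(w[0])
    assert cols % world_size == 0, "sketch assumes the vocab size divides evenly"
    step = cols // world_size
    return [[row[r * step:(r + 1) * step] for row in w] for r in range(world_size)]

def all_gather_last_dim(local_outputs):
    """Concatenate the per-rank outputs along the last dimension."""
    rows = len(local_outputs[0])
    return [sum((out[i] for out in local_outputs), []) for i in range(rows)]

x = [[1, 2], [3, 4]]              # 2 tokens, hidden size 2
w = [[1, 0, 2, 1], [0, 1, 1, 3]]  # hidden size 2, vocab size 4

full_logits = matmul(x, w)  # unsharded reference result

# Each simulated rank holds one column block and computes logits for its vocab slice.
# Leaving the result like this corresponds to an output sharded on the last dim.
local_logits = [matmul(x, w_r) for w_r in shard_columns(w, 2)]

# A replicated output corresponds to gathering the slices back together.
gathered_logits = all_gather_last_dim(local_logits)
```

In this sketch, keeping `local_logits` as is mirrors the `Shard(-1)` branch, where each rank only sees its slice of the vocabulary, while `gathered_logits` mirrors the `Replicate()` branch. In a real distributed run the gather is a collective, so every rank has to reach it; whether that is related to the hang described here cannot be determined from the excerpt.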