jetstream-pytorch

Fix blockwise sharding

Open lsy323 opened this issue 7 months ago • 0 comments

  • Fix the sharding YAML file so that it specifies proper Megatron-style sharding.
  • Add a weight-processing hook that pads blockwise-quantized weights so that the sharded dimension is divisible by the number of partitions.
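
The padding arithmetic in the second item could look roughly like the sketch below. This is only an illustration, not the jetstream-pytorch implementation: the function names `padded_size` and `pad_rows` are hypothetical, the weight is modeled as a plain 2D nested list sharded along its first dimension, and zero is assumed to be an acceptable fill value (for real quantized weights and their scale factors, the right fill value would need to be checked).

```python
def padded_size(dim_size: int, num_partitions: int) -> int:
    """Return the smallest multiple of num_partitions that is >= dim_size."""
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    # Ceiling division using integer arithmetic only.
    return -(-dim_size // num_partitions) * num_partitions


def pad_rows(weight, num_partitions, pad_value=0):
    """Pad a 2D nested-list weight along dimension 0 (hypothetical helper).

    After padding, the number of rows is divisible by num_partitions,
    so each partition receives the same number of rows.
    """
    rows = len(weight)
    cols = len(weight[0]) if weight else 0
    extra = padded_size(rows, num_partitions) - rows
    # Copy the existing rows, then append rows filled with pad_value
    # (assumption: pad_value is a neutral fill for this weight).
    return [list(r) for r in weight] + [[pad_value] * cols for _ in range(extra)]
```

For example, a weight with 10 rows sharded across 4 partitions would be padded to 12 rows, while a weight with 8 rows would be left unchanged.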

lsy323, Jul 16 '24 02:07