jetstream-pytorch
Fix blockwise sharding
- Fix the sharding YAML file so that Megatron-style sharding is applied correctly.
- Add a weight-processing hook that pads blockwise-quantized weights so that the sharded dimension is divisible by the number of partitions.
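
The padding step can be illustrated with a minimal sketch in plain Python. It is not the hook added in this change: the names `padded_size` and `pad_sharded_dim` are hypothetical, the weight is modeled as a nested list sharded along axis 0, and zero is assumed to be an acceptable pad value.

```python
def padded_size(dim: int, num_partitions: int) -> int:
    """Return the smallest multiple of num_partitions that is >= dim."""
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    # Ceiling division, then scale back up to a multiple of num_partitions.
    return -(-dim // num_partitions) * num_partitions


def pad_sharded_dim(weight, num_partitions, pad_value=0):
    """Pad axis 0 of a 2D nested list so its length divides evenly.

    Hypothetical helper: a real hook would operate on tensors and would
    also need to pad any per-block scale factors consistently.
    """
    rows = len(weight)
    cols = len(weight[0]) if rows else 0
    extra = padded_size(rows, num_partitions) - rows
    # Copy the original rows, then append rows filled with pad_value.
    return [list(r) for r in weight] + [[pad_value] * cols for _ in range(extra)]


if __name__ == "__main__":
    w = [[1, 2], [3, 4], [5, 6]]      # 3 rows, to be split across 2 partitions
    padded = pad_sharded_dim(w, 2)    # one zero row is appended
    print(len(padded))                # 4
```

With the sharded dimension rounded up to a multiple of the partition count, every partition receives a slice of equal size. Whether zero padding is numerically safe depends on how the padded dimension is consumed downstream.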