Megatron-LM
Ongoing research training transformer models at scale
**Your question** 1. I have a question about how the `pp` groups are created when enabling `context_parallel_size > 1` and `encoder_tensor_parallel_size != tensor_parallel_size`. When context parallelism is enabled, the input is split symmetrically...
**Describe the bug** When using a Zarr distributed checkpoint together with a distributed optimizer, each rank writes its optimizer states according to the ShardedTensor's `flattened_range`. The Zarr strategy uses synchronizers to ensure the...
https://github.com/NVIDIA/Megatron-LM/blob/54f1f78529cbc2b9cddad313e7f9d96ac0420a27/megatron/legacy/model/multiple_choice.py#L42