Megatron-LM

Ongoing research training transformer models at scale

Results: 294 Megatron-LM issues

**Your question** 1. I have a question about creating the `pp` groups when enabling `context_parallel_size > 1` and `encoder_tensor_parallel_size != tensor_parallel_size`. When enabling `context_parallel`, the input will be split symmetrically...
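
For context on the symmetric split this question mentions, here is a minimal sketch. It assumes the load-balanced scheme in which the sequence is cut into `2 * context_parallel_size` equal chunks and rank `r` keeps chunk `r` together with its mirror chunk `2 * context_parallel_size - 1 - r`. The function names and the use of plain Python lists are hypothetical and chosen for illustration; they are not Megatron-LM's API.

```python
# Illustrative sketch only: helper names are hypothetical, not Megatron-LM functions.
# Assumption: the sequence is cut into 2 * cp_size equal chunks and each
# context-parallel rank keeps one chunk from the front and its mirror from the back.

def cp_chunk_indices(cp_size, cp_rank):
    """Return the two chunk indices assigned to cp_rank under the mirrored split."""
    if not 0 <= cp_rank < cp_size:
        raise ValueError("cp_rank must be in [0, cp_size)")
    return cp_rank, 2 * cp_size - 1 - cp_rank


def split_for_context_parallel(tokens, cp_size, cp_rank):
    """Return the slice of `tokens` that cp_rank would hold under the mirrored split."""
    num_chunks = 2 * cp_size
    if len(tokens) % num_chunks != 0:
        raise ValueError("sequence length must be divisible by 2 * cp_size")
    chunk_len = len(tokens) // num_chunks
    first, second = cp_chunk_indices(cp_size, cp_rank)
    front = tokens[first * chunk_len:(first + 1) * chunk_len]
    back = tokens[second * chunk_len:(second + 1) * chunk_len]
    return front + back


if __name__ == "__main__":
    sequence = list(range(8))
    for rank in range(2):
        print(rank, split_for_context_parallel(sequence, cp_size=2, cp_rank=rank))
```

Pairing an early chunk with a late one gives each rank a similar amount of work under a causal attention mask, since early positions attend to few tokens and late positions attend to many.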

**Describe the bug** When using a Zarr distributed checkpoint and a distributed optimizer, each rank writes optimizer states according to ShardedTensor's flattened_range. The Zarr strategy uses synchronizers to ensure the...

https://github.com/NVIDIA/Megatron-LM/blob/54f1f78529cbc2b9cddad313e7f9d96ac0420a27/megatron/legacy/model/multiple_choice.py#L42