Baibaifan
**Describe the bug** There is a problem with asynchronous communication in ZeRO stage 2 when `overlap_comm` is enabled. **To Reproduce** Steps to reproduce the behavior: use DeepSpeed ZeRO-2 on the Hugging Face...
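For context, a minimal sketch of a DeepSpeed config that turns on the `overlap_comm` path in ZeRO stage 2; the model, optimizer settings, and batch size here are illustrative assumptions, and the script is meant to be run through the `deepspeed` launcher:

```python
# Minimal sketch, assuming a trivial placeholder model; run with the
# `deepspeed` launcher so distributed state is initialized.
import torch
import deepspeed

ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
    "zero_optimization": {
        "stage": 2,
        "overlap_comm": True,  # overlap gradient reduction with backward compute
    },
}

model = torch.nn.Linear(16, 16)  # placeholder model, not from the issue
engine, _, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)
```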
Hi! I tried to use a PEFT model with Trainer and got the following error (**I used gradient_checkpointing**): ``` RuntimeError: Expected to mark a variable ready only once. This error is caused...
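A commonly reported workaround for this error is switching to non-reentrant activation checkpointing. The sketch below assumes a recent `transformers` release that accepts `gradient_checkpointing_kwargs`; the base model and LoRA target module are illustrative:

```python
# Sketch of the commonly reported workaround: non-reentrant activation
# checkpointing, so each parameter is marked ready only once under DDP.
# "gpt2" and the LoRA target module are illustrative assumptions.
from transformers import AutoModelForCausalLM, TrainingArguments
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("gpt2")
model = get_peft_model(model, LoraConfig(r=8, target_modules=["c_attn"]))

args = TrainingArguments(
    output_dir="out",
    gradient_checkpointing=True,
    # Available in recent transformers releases; the reentrant variant is
    # what triggers the duplicate ready-marking under DDP.
    gradient_checkpointing_kwargs={"use_reentrant": False},
)
```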
Add pipeline tutorials and group_sharded tutorials.
**Describe the bug** With `--use-mcore-models` and `--use-flash-attn` enabled and `--transformer-impl local` set, flash attention is not actually used. **To Reproduce** N/A **Expected behavior** N/A **Stack trace/logs** N/A **Environment (please complete the following information):**...
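A hypothetical argument-validation guard (not Megatron-LM's actual code) that would surface the inconsistent flag combination instead of silently skipping flash attention might look like:

```python
# Hypothetical guard, not Megatron-LM's actual validation code: fail fast
# instead of silently falling back to the local attention implementation.
def validate_attention_args(args):
    if args.use_flash_attn and args.transformer_impl == "local":
        raise ValueError(
            "--use-flash-attn is ignored with --transformer-impl local; "
            "drop the flag or switch --transformer-impl."
        )
```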
**Describe the bug** As shown in the figure above, `shared_embedding` and other parameters are handled separately when building the `bucket`. When the `data_end_index` of the parameter preceding `shared_embedding` is not...
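For illustration, a generic sketch (not Megatron-LM's actual code) of the flat-buffer index bookkeeping involved: each parameter occupies `[data_start_index, data_end_index)` in a flat gradient buffer, and the preceding `data_end_index` must be padded to an alignment boundary before a specially handled parameter such as `shared_embedding`:

```python
# Illustrative sketch, not Megatron-LM's actual code: assign flat-buffer
# ranges to parameters, padding data_end_index before a shared parameter.
def pad_to_multiple(index, align):
    return ((index + align - 1) // align) * align

def assign_indices(param_numels, is_shared_flags, align=128):
    indices, offset = [], 0
    for numel, is_shared in zip(param_numels, is_shared_flags):
        if is_shared:
            # Pad the preceding data_end_index so the shared parameter
            # (e.g. shared_embedding) starts on an aligned boundary.
            offset = pad_to_multiple(offset, align)
        indices.append((offset, offset + numel))
        offset += numel
    return indices

# e.g. assign_indices([1000, 4096], [False, True]) -> [(0, 1000), (1024, 5120)]
```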
**Describe the bug** As shown in the figure above, using `view` when computing `w1` in this part causes the elements to be scrambled. As shown in the figure above, it is...
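A small self-contained demonstration of the failure mode (illustrative, not the exact `w1` code from the issue): `view` reinterprets the flat buffer and mixes rows when the intended split is along columns, whereas `chunk` along the column dimension keeps elements intact:

```python
# Self-contained demonstration, not the exact w1 code from the issue:
# view() reinterprets the flat buffer, so a column split done via view
# silently mixes elements from different rows.
import torch

w = torch.arange(12).reshape(3, 4)   # 3 rows of 4 columns
wrong = w.view(2, 3, 2)              # flat reinterpretation: rows get mixed
right = w.chunk(2, dim=1)            # two (3, 2) column shards, order preserved
print(wrong[0])   # tensor([[0, 1], [2, 3], [4, 5]]) -- spans rows 0 and 1
print(right[0])   # tensor([[0, 1], [4, 5], [8, 9]]) -- first two columns of each row
```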
**Describe the bug** The file format produced by `python examples/multimodal/clip_converter.py` does not match the file format required by `examples/multimodal/combine_mistral_clip.sh`: the converter writes `xxx\state_dict_tp_x.pt`, not the `xxx/iter_0000001/mp_rank_00/model_optim_rng.pt` layout the script expects. **To Reproduce** - **Expected behavior** File format...
**Problem description** The file format produced by `python examples/multimodal/clip_converter.py` does not match the file format required by `examples/multimodal/combine_mistral_clip.sh`. [bug issue](https://github.com/NVIDIA/Megatron-LM/issues/949) **After fix** Under the original configuration, the conversion...
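A hedged sketch of what saving the converted weights in the directory layout Megatron-LM's checkpoint loader expects could look like; the keys inside the saved dict are assumptions that may vary across Megatron versions:

```python
# Hedged sketch of saving in the directory layout Megatron-LM's loader
# expects; the keys inside the saved dict are assumptions that may vary
# across Megatron versions.
import os
import torch

def save_megatron_style(state_dict, base_dir, tp_rank=0, iteration=1):
    rank_dir = os.path.join(base_dir, f"iter_{iteration:07d}", f"mp_rank_{tp_rank:02d}")
    os.makedirs(rank_dir, exist_ok=True)
    torch.save({"model": state_dict, "iteration": iteration},
               os.path.join(rank_dir, "model_optim_rng.pt"))
    # The loader reads this file to find the latest iteration directory.
    with open(os.path.join(base_dir, "latest_checkpointed_iteration.txt"), "w") as f:
        f.write(str(iteration))
```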
Support `Packed_seq_params` in Megatron-LM, just for testing.
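For reference, a sketch of constructing packed-sequence metadata in the flash-attention varlen convention; the `PackedSeqParams` import path and field names below follow recent megatron-core versions and should be treated as assumptions:

```python
# Sketch under the assumption that PackedSeqParams exposes the fields
# below (true for recent megatron-core versions; verify against yours).
import torch
from megatron.core.packed_seq_params import PackedSeqParams

seq_lens = [5, 3, 8]  # lengths of the sub-sequences packed into one sample
# cu_seqlens is the cumulative-offsets convention used by varlen flash attention
cu_seqlens = torch.tensor([0, 5, 8, 16], dtype=torch.int32)

packed = PackedSeqParams(
    qkv_format="thd",          # tokens packed along a single dimension
    cu_seqlens_q=cu_seqlens,
    cu_seqlens_kv=cu_seqlens,
    max_seqlen_q=max(seq_lens),
    max_seqlen_kv=max(seq_lens),
)
```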
### Problem: In Megatron-LM, there is a memory bottleneck when using `reset_attention_mask` to construct long sequences. The following code (from `_get_ltor_masks_and_position_ids`): When a seq_len consists of multiple...
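A condensed sketch of the pattern in question (simplified from `_get_ltor_masks_and_position_ids`, not the verbatim function): a dense `(seq_len, seq_len)` causal mask is materialized per sample and then zeroed across document boundaries, which costs O(seq_len^2) memory and is the bottleneck for long sequences:

```python
# Condensed from the pattern in _get_ltor_masks_and_position_ids (a
# simplified sketch, not the verbatim function): the dense causal mask is
# O(seq_len^2) memory per sample, which is the bottleneck.
import torch

def ltor_mask_with_reset(tokens, eod_token):
    b, seq_len = tokens.shape
    mask = torch.tril(torch.ones(b, 1, seq_len, seq_len))  # dense O(seq_len^2)
    for bi in range(b):
        eod_positions = (tokens[bi] == eod_token).nonzero(as_tuple=True)[0]
        for i in eod_positions.tolist():
            # Reset: tokens after an EOD may not attend to tokens before it.
            mask[bi, 0, i + 1:, : i + 1] = 0
    return mask < 0.5  # True marks masked-out positions, as in Megatron
```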