Baibaifan

Results: 11 issues by Baibaifan

**Describe the bug** There is a problem with asynchronous communication in ZeRO stage 2 when using `overlap_comm` (a configuration sketch follows this entry). **To Reproduce** Steps to reproduce the behavior: use DeepSpeed ZeRO-2 on the Hugging Face...

bug
training
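
The report above concerns the gradient-reduction overlap that ZeRO stage 2 performs when `overlap_comm` is enabled. A minimal configuration sketch, where all concrete values are illustrative assumptions rather than taken from the issue:

```python
# Illustrative DeepSpeed ZeRO stage-2 configuration; values are assumptions.
ds_config = {
    "train_micro_batch_size_per_gpu": "auto",
    "zero_optimization": {
        "stage": 2,
        "overlap_comm": True,          # overlap gradient reduction with the backward pass
        "contiguous_gradients": True,
        "reduce_bucket_size": 5e8,
    },
    "bf16": {"enabled": True},
}

# With the Hugging Face Trainer, such a dict can be passed via
# TrainingArguments(deepspeed=ds_config, ...) or written out as a JSON file.
```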

Hi! I tried to use a PEFT model with Trainer and got the following error; **I used gradient_checkpointing**: ``` RuntimeError: Expected to mark a variable ready only once. This error is caused...
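
This "mark a variable ready only once" failure typically appears when reentrant gradient checkpointing re-runs the forward pass under DistributedDataParallel, so the same parameter's autograd hook fires twice. A minimal sketch of a commonly suggested workaround, assuming transformers >= 4.35; the model name, LoRA settings, and output directory are placeholders:

```python
from transformers import AutoModelForCausalLM, TrainingArguments
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("gpt2")              # placeholder model
model = get_peft_model(model, LoraConfig(task_type="CAUSAL_LM"))  # placeholder LoRA config

args = TrainingArguments(
    output_dir="out",                                   # placeholder
    gradient_checkpointing=True,
    # Non-reentrant checkpointing does not replay the forward pass through the
    # autograd hooks, which is what usually triggers the error under DDP.
    gradient_checkpointing_kwargs={"use_reentrant": False},
)
```

Whether this workaround applies to the reported setup depends on the model and the DDP configuration; it is one common suggestion for this error, not a confirmed fix for this issue.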

Add pipeline tutorials and group_sharded tutorials.

**Describe the bug** With `--use-mcore-models` and `--use-flash-attn` enabled and `--transformer-impl local` set, flash-attention is not actually used. **To Reproduce** N/A **Expected behavior** N/A **Stack trace/logs** N/A **Environment (please complete the following information):**...

**Describe the bug** ![image](https://github.com/NVIDIA/Megatron-LM/assets/39549453/c1e3ea24-e371-4818-9d9f-b916bb34e0fe) As shown in the figure above, `shared_embedding` and the other parameters are treated separately when building the `bucket` (a schematic alignment example follows this entry). When the `data_end_index` of the parameter before `shared_embedding` is not...

stale
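
A schematic sketch of the alignment problem being described, not Megatron-LM's actual bucketing code: if the running `data_end_index` is not rounded up to the alignment unit before the bucket holding `shared_embedding` is started, that bucket begins at a misaligned offset. The 128-element alignment unit below is an assumption for illustration.

```python
def pad_to_multiple(index: int, alignment: int = 128) -> int:
    """Round `index` up to the next multiple of `alignment`."""
    return ((index + alignment - 1) // alignment) * alignment

# Illustration only: if the parameter placed just before shared_embedding ends
# at offset 1000, starting the next bucket at 1000 rather than at
# pad_to_multiple(1000) == 1024 is the kind of misalignment the issue describes.
```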

**Describe the bug** As shown in the figure above, when computing `w1` in this part, using `view` scrambles the element order (a generic illustration follows this entry). ![image](https://github.com/NVIDIA/Megatron-LM/assets/39549453/de68effb-5c77-498e-a656-ec99a45ca5b3) As shown in the figure above, it is...

stale
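
A generic PyTorch illustration of the class of bug being described here, not the Megatron-LM code itself: `view` only reinterprets the existing contiguous memory, so using it where a transpose or permute is required produces a tensor of the right shape but with elements in the wrong positions.

```python
import torch

w = torch.arange(6).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]

print(w.view(3, 2))                 # [[0, 1], [2, 3], [4, 5]] -- memory order kept
print(w.t().contiguous())           # [[0, 3], [1, 4], [2, 5]] -- true transpose

# Both results have shape (3, 2), but the element layout differs; substituting
# view for a transpose silently yields the "element confusion" the issue reports.
```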

**Describe the bug** The file format output by `python examples/multimodal/clip_converter.py` does not match the file format required by `examples/multimodal/combine_mistral_clip.sh`: the converter writes `xxx\state_dict_tp_x.pt`, not the expected `xxx/iter_0000001/mp_rank_00/model_optim_rng.pt`. **To Reproduce** - **Expected behavior** File format...

stale

# Problem description The file format output by `python examples/multimodal/clip_converter.py` does not match the file format required by `examples/multimodal/combine_mistral_clip.sh`. [bug issue](https://github.com/NVIDIA/Megatron-LM/issues/949) # After fix Under the original configuration, the conversion...

Support Packed_seq_params in Megatron-LM, just for testing.

### Problem: In Megatron-LM, there is a memory bottleneck when using the reset attention mask to construct long sequences. The relevant code (`_get_ltor_masks_and_position_ids`): ![image](https://github.com/user-attachments/assets/42aa2748-7b20-4c00-a565-f66f780919a8) When a seq_len consists of multiple...
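
A sketch of the packed-sequence idea referred to above: instead of materializing a dense `(seq_len, seq_len)` attention mask after every document reset, the document boundaries can be encoded as cumulative sequence lengths, which is what variable-length flash-attention kernels (and Megatron-core's `PackedSeqParams`) consume. The helper and the example lengths below are illustrative, not the PR's code.

```python
import torch

def build_cu_seqlens(doc_lengths):
    """Cumulative sequence lengths for a packed sequence, e.g. [3, 5, 4] -> [0, 3, 8, 12]."""
    return torch.cumsum(torch.tensor([0] + list(doc_lengths)), dim=0, dtype=torch.int32)

print(build_cu_seqlens([3, 5, 4]))  # tensor([ 0,  3,  8, 12], dtype=torch.int32)

# Memory comparison for a 32k-token packed sequence: a boolean (32768, 32768)
# mask takes 32768**2 bytes (1 GiB) per sample, while cu_seqlens takes only
# num_docs + 1 int32 values -- the kind of bottleneck the report describes.
```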