Blue Space

Results: 6 issues by Blue Space

### Describe the bug The `compgen -g` command causes a repeatable autosuggestions crash; tested on multiple machines. ### To Reproduce Steps to reproduce the behavior: 1. configure oh-my-zsh 2. add zsh-autosuggestions to...

bug

This PR combines multiple modifications. # Qwen2.5 checkpoint saver bug fix Thanks to the efforts @uygnef contributed in #368, we use the new saver for model loader and saver...

**Describe the bug** P2P communication ordering error and hang with PP=2 and VPP=2 when remove pad is enabled **To Reproduce** When using `PP=2` and `VPP=2` with `config.variable_seq_lengths=True`, `config.batch_p2p_comm=True` and `config.overlap_p2p_comm=False`,...

stale

### Checklist Before Starting - [ ] Search for similar PR(s). ### What does this PR do? Fix an EP bug and try to add CI with a 15B model, finding smaller...

### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? Support LR scheduler in Megatron ### High-Level Design Still has some differences from FSDP's...

status: need review

# dist_checkpointing stuck on communication with MoE models in distributed environment Qwen 3 30B MoE models get stuck on `all_reduce` communication with dist_checkpoint. When running with 32 GPUs, it takes...