Boxiang Wang
### 🐛 Describe the bug When testing [DeTr on Colossal-Example](https://github.com/hpcaitech/ColossalAI-Examples/tree/main/image/detr), I encountered an issue that model with only DDP in situations: 1. `LEARNING_RATE=1e-4`, `world_size=4` 2. `LEARNING_RATE=2e-4`, `world_size=8` 3. `LEARNING_RATE=1e-4`, `world_size=8`...
# What does this PR do ? Change microbatch calculator implementation into mcore. Related to MCore # 24 **Collection**: [Note which collection this PR will affect] # Jenkins CI The...
### Describe the feature Compared to vanilla PyTorch, Titans currently includes a lot of unnecessary code, for example multiple files for MLP. We could provide common MLP, Attention ... modules for...
# What does this PR do ? Add Nemotron4 15b and 22b model 8k configs and Long Context Recipes for 16k and 64k along with it. **Collection**: [Note which collection...
### Is there an existing issue for this bug? - [X] I have searched the existing issues ### 🐛 Describe the bug This CI is not working and we should...
# What does this PR do ? Add a one line overview of what this PR aims to accomplish. **Collection**: [Note which collection this PR will affect] # Changelog -...
# What does this PR do ? The `always_save_nemo` option does not yet work with model parallelism. Added assertions to avoid future confusion. **Collection**: [Note which collection this PR...
> [!IMPORTANT] > The `Update branch` button must only be pressed on very rare occasions. > An outdated branch never blocks the merge of a PR. > Please reach...
# What does this PR do ? Add MCore FSDP2 support **Collection**: [Note which collection this PR will affect] # Changelog - Add specific line by line info of high...