ColossalAI
Making large AI models cheaper, faster and more accessible
### Is there an existing issue for this bug? - [x] I have searched the existing issues ### The bug has not been fixed in the latest main branch -...
I've noticed that the latest version of ColossalAI does not support 2D, 2.5D, and 3D tensor parallelism. I would like to know, according to ColossalAI's roadmap, when Shardformer will support...
@FrankLeeeee @gothicx @tiansiyuan @jeffra Does ColossalAI support training the Flux model? For example, if I'm using a LoRA paradigm and need to redefine the processor within Flux, is this training method...
## 📌 Checklist before creating the PR - [ ] I have created an issue for this PR for traceability - [ ] The title follows the standard format: `[doc/gemini/tensor/...]:...
### Describe the feature Add more training models and RLHF algorithms for the branch `grpo-latest`.
### Is there an existing issue for this bug? - [x] I have searched the existing issues ### The bug has not been fixed in the latest main branch -...
### Describe the feature When using CPU offload, setting master_weights=False in both GeminiPlugin and LowLevelZeroPlugin can reduce GPU memory usage and improve speed. Does HybridParallelPlugin also support this feature?
## Description Addresses issue #6349 where multi-node training gets stuck during distributed initialization when using torchrun in Kubernetes. ## Root Cause - Missing rendezvous backend configuration in torchrun - No...
## 📝 What does this PR do? Fix a broken link. I hope this is the right new location.