ColossalAI
Making large AI models cheaper, faster and more accessible
### Is there an existing issue for this bug?

- [X] I have searched the existing issues

### 🐛 Describe the bug

I am trying to reproduce OPT-66B using 16xH100...
### Describe the feature

Currently, most models such as Llama do not support sequence parallelism (SP) together with pipeline parallelism (PP). Please add support for this.
### Is there an existing issue for this bug?

- [X] I have searched the existing issues

### 🐛 Describe the bug

I failed to run the ChatGLM model with ColossalAI...
### Describe the feature

Please add Ulysses Sequence Parallelism support for Command-R, Qwen2, and ChatGLM.
### Describe the feature

Hi, when training a big model such as Llama2-70B with LoRA, training runs out of memory because the base model is unsharded. It could help a lot if LoRA...
After calling `booster.backward(loss=loss, optimizer=optimizer)`, all gradients of `model.module` are `None`. Is there a way to access the gradients?
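For context, a minimal sketch of how one might inspect gradients, assuming a plugin that keeps gradients on the parameters (e.g. `TorchDDPPlugin`); `dump_gradients` is a hypothetical helper, not part of ColossalAI. Under sharded plugins such as Gemini or Low-Level ZeRO, gradients are managed inside the optimizer rather than stored on the module's parameters, which is one reason `.grad` can come back `None`:

```python
import torch

def dump_gradients(model: torch.nn.Module) -> None:
    # Hypothetical helper: iterate over the (unwrapped) module and report
    # which parameters received a gradient during the last backward pass.
    # Only meaningful for plugins that leave gradients on the parameters.
    for name, param in model.named_parameters():
        if param.grad is None:
            print(f"{name}: no gradient")
        else:
            print(f"{name}: grad norm = {param.grad.norm().item():.4e}")

# Usage after the booster-driven backward pass:
# booster.backward(loss=loss, optimizer=optimizer)
# dump_gradients(model.module)
```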
## 📝 What does this PR do?

- PyTorch 2.3.0 added a `group` argument to the `_object_to_tensor` function. Updated the related call sites in pipeline communication.
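Not the PR's actual code, but a minimal compatibility shim illustrating the idea; `object_to_tensor` is a hypothetical wrapper name, and `_object_to_tensor` is a private PyTorch API that gained a `group` parameter in 2.3.0:

```python
import inspect

from torch.distributed.distributed_c10d import _object_to_tensor

# PyTorch 2.3.0 added a `group` parameter to `_object_to_tensor`; detect it
# from the signature so both older and newer releases are handled.
_HAS_GROUP_ARG = "group" in inspect.signature(_object_to_tensor).parameters

def object_to_tensor(obj, device, group=None):
    # Hypothetical version-agnostic wrapper around torch's private helper;
    # it forwards to the correct signature for the installed PyTorch.
    if _HAS_GROUP_ARG:
        return _object_to_tensor(obj, device, group)
    return _object_to_tensor(obj, device)
```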
1. Optimize the data path: from `List -> CPU Tensor -> List -> rpc_param -> GPU Tensor` to `List -> rpc_param -> GPU Tensor`.
2. Wrap the async forward only once.
3. Only the rank-0 worker runs the sampler and returns the return... (a generic sketch of this pattern follows the list)
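Not the PR's implementation, but a generic sketch of the rank-0-only sampling idea using `torch.distributed`; `sample_on_rank0` is a hypothetical helper, and how the PR actually returns the result over its RPC path is elided above:

```python
import torch
import torch.distributed as dist

def sample_on_rank0(logits: torch.Tensor) -> torch.Tensor:
    # Only rank 0 runs the sampler; every other worker allocates an empty
    # buffer and receives the sampled token IDs via broadcast, so all ranks
    # stay in sync without duplicating the sampling work.
    if dist.get_rank() == 0:
        probs = torch.softmax(logits.float(), dim=-1)
        next_tokens = torch.multinomial(probs, num_samples=1)
    else:
        next_tokens = torch.empty(
            logits.size(0), 1, dtype=torch.long, device=logits.device
        )
    dist.broadcast(next_tokens, src=0)
    return next_tokens
```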