ColossalAI

Making large AI models cheaper, faster and more accessible

Results 1091 ColossalAI issues

### Is there an existing issue for this bug? - [x] I have searched the existing issues ### The bug has not been fixed in the latest main branch -...

bug

I've noticed that the latest version of ColossalAI does not support 2D, 2.5D, and 3D tensor parallelism. I would like to know, according to ColossalAI's roadmap, when Shardformer will support...

@FrankLeeeee @gothicx @tiansiyuan @jeffra Does ColossalAI support training the Flux model? For example, if I'm using a LoRA paradigm and need to redefine the processor within Flux, is this training method...

## 📌 Checklist before creating the PR - [ ] I have created an issue for this PR for traceability - [ ] The title follows the standard format: `[doc/gemini/tensor/...]:...

### Describe the feature Add more training models and RLHF algorithms for the branch `grpo-latest`.

enhancement

### Is there an existing issue for this bug? - [x] I have searched the existing issues ### The bug has not been fixed in the latest main branch -...

bug

### Describe the feature When using CPU offload, setting `master_weights=False` in both `GeminiPlugin` and `LowLevelZeroPlugin` can reduce GPU memory usage and improve speed. Does `HybridParallelPlugin` also support this feature?

enhancement

## Description Addresses issue #6349 where multi-node training gets stuck during distributed initialization when using torchrun in Kubernetes. ## Root Cause - Missing rendezvous backend configuration in torchrun - No...

## 📝 What does this PR do? Fix a broken link. I hope this is the right new location.