# ColossalAI

Making large AI models cheaper, faster and more accessible

Results: 1091 ColossalAI issues, sorted by recently updated

## 📌 Checklist before creating the PR - [ ] I have created an issue for this PR for traceability - [ ] The title follows the standard format: `[doc/gemini/tensor/...]:...

Hi experts, when will you update shardformer for the latest transformers version (such as transformers==4.46)?

## 📌 Checklist before creating the PR - [x] I have created an issue for this PR for traceability - [x] The title follows the standard format: `[doc/gemini/tensor/...]`: A concise...

### Is there an existing issue for this bug? - [X] I have searched the existing issues ### 🐛 Describe the bug pp=2 tp=2 sp=1 zero_stage=0 [rank6]: File "/usr/local/lib/python3.10/dist-packages/colossalai/shardformer/modeling/llama.py", line...

bug

## 📌 Checklist before creating the PR - [ ] I have created an issue for this PR for traceability - [ ] The title follows the standard format: `[doc/gemini/tensor/...]:...

### Describe the feature How can we support LoRA/QLoRA in the Gemini or TorchFSDP plugin? If there's documentation on this feature, it might encourage community contributions. Thanks a lot.

enhancement
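Not an answer from the maintainers, but a minimal sketch of the usual pattern behind this request: inject LoRA adapters with Hugging Face PEFT first, then shard the mostly-frozen model with plain PyTorch FSDP (the same `torch.distributed.fsdp` machinery the TorchFSDPPlugin wraps). The model name and `target_modules` values are illustrative assumptions, not anything prescribed by ColossalAI.

```python
# Minimal sketch (an assumption, not ColossalAI's official LoRA support):
# inject LoRA adapters with PEFT, then shard the mostly-frozen model with FSDP.
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

dist.init_process_group("nccl")
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")  # illustrative

lora_cfg = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],  # illustrative; depends on the architecture
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)  # only the adapter weights stay trainable

# use_orig_params=True lets FSDP handle the mix of frozen base weights
# and trainable adapter weights inside one wrapped module.
model = FSDP(
    model,
    device_id=torch.cuda.current_device(),
    use_orig_params=True,
)
```

The ordering matters: adapters go in before sharding. Whether Gemini's chunk-based memory management can likewise skip optimizer and gradient state for frozen parameters is exactly the open question this issue raises.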

### Describe the feature [rank0]: NotImplementedError: Auto policy for Gemma2ForCausalLM (transformers.models.gemma2.modeling_gemma2.Gemma2ForCausalLM) is not implemented Can you please tell me how to support Gemma2 for tensor parallelism? Or do you have...

enhancement
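Until an auto policy lands, one stopgap (plainly a different mechanism than shardformer) is PyTorch's native tensor-parallel API. A hedged sketch follows: the module names come from transformers' Gemma2 implementation and should be verified against your version, and the per-rank attention-head bookkeeping is deliberately omitted.

```python
# Sketch: Megatron-style tensor parallelism for Gemma2 via
# torch.distributed.tensor.parallel, as a stopgap while shardformer
# lacks a Gemma2 policy. Run under torchrun.
import torch
import torch.distributed as dist
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.tensor.parallel import (
    ColwiseParallel,
    RowwiseParallel,
    parallelize_module,
)
from transformers import AutoModelForCausalLM

dist.init_process_group("nccl")
tp_mesh = init_device_mesh("cuda", (dist.get_world_size(),))

model = AutoModelForCausalLM.from_pretrained("google/gemma-2-9b")  # illustrative

# Column-shard the input projections and row-shard the output projections,
# the usual Megatron split. Names follow transformers' Gemma2 modules.
layer_plan = {
    "self_attn.q_proj": ColwiseParallel(),
    "self_attn.k_proj": ColwiseParallel(),
    "self_attn.v_proj": ColwiseParallel(),
    "self_attn.o_proj": RowwiseParallel(),
    "mlp.gate_proj": ColwiseParallel(),
    "mlp.up_proj": ColwiseParallel(),
    "mlp.down_proj": RowwiseParallel(),
}
for layer in model.model.layers:
    parallelize_module(layer, tp_mesh, layer_plan)

# Caveat: the HF attention code still sees the full head count; a real
# integration must also adjust per-rank num_attention_heads / num_key_value_heads.
```

The longer-term fix is a shardformer policy for Gemma2ForCausalLM analogous to the existing Llama policy.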

## 📌 Checklist before creating the PR - [ ] I have created an issue for this PR for traceability - [ ] The title follows the standard format: `[doc/gemini/tensor/...]:...

### Is there an existing issue for this bug? - [X] I have searched the existing issues ### 🐛 Describe the bug When using the GeminiPlugin to train a model,...

bug

### Proposal @kuozhang brought up in #6101 that FP8 with TP should `all_reduce` a global amax history. However, based on my understanding of the code for [creating amax history](https://github.com/NVIDIA/TransformerEngine/blob/7fb22c375804f77f4f95df3eab606c7bd3e80aed/transformer_engine/pytorch/ops/op.py#L215), it...

enhancement
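For concreteness, the reduction being proposed is a max all-reduce of the local amax history over the tensor-parallel group, so every rank derives the same FP8 scale. A minimal sketch with illustrative names (`amax_history` and `tp_group` are not TransformerEngine internals):

```python
# Sketch of the proposed synchronization: max-reduce local amax statistics
# across the TP group so all ranks compute identical FP8 scaling factors.
# Names are illustrative, not TransformerEngine's internal API.
import torch
import torch.distributed as dist

def sync_amax_history(amax_history: torch.Tensor, tp_group: dist.ProcessGroup) -> None:
    """In-place elementwise max over TP ranks of the local amax history."""
    dist.all_reduce(amax_history, op=dist.ReduceOp.MAX, group=tp_group)
```

Max is the natural reduction here, since amax is itself a running maximum of absolute values; how often to synchronize is the design question the truncated proposal goes on to discuss.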