ColossalAI
Making large AI models cheaper, faster and more accessible
## 📌 Checklist before creating the PR - [ ] I have created an issue for this PR for traceability - [ ] The title follows the standard format: `[doc/gemini/tensor/...]:...
Hi, experts! When will you update shardformer for the latest transformers version (such as transformers==4.46)?
## 📌 Checklist before creating the PR - [x] I have created an issue for this PR for traceability - [x] The title follows the standard format: `[doc/gemini/tensor/...]: A concise...
### Is there an existing issue for this bug? - [X] I have searched the existing issues ### 🐛 Describe the bug pp=2 tp=2 sp=1 zero_stage=0 [rank6]: File "/usr/local/lib/python3.10/dist-packages/colossalai/shardformer/modeling/llama.py", line...
## 📌 Checklist before creating the PR - [ ] I have created an issue for this PR for traceability - [ ] The title follows the standard format: `[doc/gemini/tensor/...]:...
### Describe the feature How can we support LoRA/QLoRA in the Gemini or TorchFSDP plugin? If there's documentation on this feature, it might encourage community contributions. Thanks a lot.
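For background on this request: LoRA keeps the base weight frozen and adds a trainable low-rank update, computing `y = W x + (alpha / r) * B (A x)`. A minimal plain-Python sketch of that forward pass (the shapes and names below are illustrative only, not ColossalAI or PEFT API):

```python
# Minimal LoRA forward sketch: y = W x + (alpha / r) * B (A x).
# Plain-Python stand-in for the low-rank adapter idea; illustrative only.

def matvec(M, x):
    """Multiply matrix M (list of rows) by vector x."""
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in M]

def lora_forward(W, A, B, x, alpha=1.0):
    """Frozen base weight W plus trainable low-rank update B @ A."""
    r = len(A)                       # rank = number of rows of A
    base = matvec(W, x)              # W x   (frozen path)
    low = matvec(B, matvec(A, x))    # B (A x)  (adapter path)
    scale = alpha / r
    return [b + scale * l for b, l in zip(base, low)]

# Example: 2x2 base weight with a rank-1 adapter.
W = [[1.0, 0.0], [0.0, 1.0]]   # identity base weight
A = [[1.0, 1.0]]               # 1x2 down-projection
B = [[0.5], [0.5]]             # 2x1 up-projection
y = lora_forward(W, A, B, [2.0, 3.0], alpha=1.0)
print(y)  # [4.5, 5.5]
```

The point for the plugins is that only `A` and `B` need gradients and optimizer state, so Gemini/FSDP would only have to shard and offload the small adapter parameters.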
### Describe the feature [rank0]: NotImplementedError: Auto policy for Gemma2ForCausalLM (transformers.models.gemma2.modeling_gemma2.Gemma2ForCausalLM) is not implemented Can you please tell me how to support Gemma2 for Tensor Parallelism? Or do you have...
## 📌 Checklist before creating the PR - [ ] I have created an issue for this PR for traceability - [ ] The title follows the standard format: `[doc/gemini/tensor/...]:...
### Is there an existing issue for this bug? - [X] I have searched the existing issues ### 🐛 Describe the bug When using the GeminiPlugin to train a model,...
### Proposal @kuozhang brought up in #6101 that FP8 with TP should `all_reduce` a global amax history. However, based on my understanding of the code for [creating amax history](https://github.com/NVIDIA/TransformerEngine/blob/7fb22c375804f77f4f95df3eab606c7bd3e80aed/transformer_engine/pytorch/ops/op.py#L215), it...
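For context on the proposal: under tensor parallelism each rank sees only a shard of the activations, so its local amax can underestimate the global maximum; an `all_reduce` with the MAX op over the amax history would make the FP8 scaling factors consistent across ranks. A single-process sketch of what that reduction computes (the per-rank histories below are made-up numbers; real code would call `torch.distributed.all_reduce` with `ReduceOp.MAX` on each rank's history tensor):

```python
# Simulate what an all_reduce(MAX) over per-rank amax histories produces.
# Single-process stand-in for the distributed collective; illustrative only.

def global_amax_history(per_rank_histories):
    """Element-wise max across ranks, mimicking all_reduce with ReduceOp.MAX."""
    return [max(vals) for vals in zip(*per_rank_histories)]

# Hypothetical amax histories from 4 TP ranks (history window of length 3).
histories = [
    [0.9, 1.2, 0.7],   # rank 0
    [1.1, 0.8, 0.6],   # rank 1
    [0.5, 1.5, 0.9],   # rank 2
    [1.0, 0.4, 1.3],   # rank 3
]
print(global_amax_history(histories))  # [1.1, 1.5, 1.3]
```

Each position in the result is the largest amax any rank observed at that history step, which is exactly what every rank would hold after the MAX all-reduce.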