ColossalAI
Making large AI models cheaper, faster and more accessible
## 📌 Checklist before creating the PR - [ ] I have created an issue for this PR for traceability - [ ] The title follows the standard format: `[doc/gemini/tensor/...]:...
Hi, experts! When will you update shardformer for the latest transformers version (such as transformers==4.46)?
## 📌 Checklist before creating the PR - [x] I have created an issue for this PR for traceability - [x] The title follows the standard format: `[doc/gemini/tensor/...]: A concise...
### Is there an existing issue for this bug? - [X] I have searched the existing issues ### 🐛 Describe the bug pp=2 tp=2 sp=1 zero_stage=0 [rank6]: File "/usr/local/lib/python3.10/dist-packages/colossalai/shardformer/modeling/llama.py", line...
## 📌 Checklist before creating the PR - [ ] I have created an issue for this PR for traceability - [ ] The title follows the standard format: `[doc/gemini/tensor/...]:...
### Describe the feature How can we support LoRA/QLoRA in the Gemini or TorchFSDP plugin? If there's documentation on this feature, it might encourage community contributions. Thanks a lot.
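For background on this request: LoRA keeps the base weight frozen and adds a trainable low-rank update, computing `y = W x + (alpha / r) * B (A x)`. A minimal plain-Python sketch of that forward pass (the shapes and names below are illustrative only, not ColossalAI or PEFT API):

```python
# Minimal LoRA forward sketch: y = W x + (alpha / r) * B (A x).
# Plain-Python stand-in for the low-rank adapter idea; illustrative only.

def matvec(M, x):
    """Multiply matrix M (list of rows) by vector x."""
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in M]

def lora_forward(W, A, B, x, alpha=1.0):
    """Frozen base weight W plus trainable low-rank update B @ A."""
    r = len(A)                       # rank = number of rows of A
    base = matvec(W, x)              # W x   (frozen path)
    low = matvec(B, matvec(A, x))    # B (A x)  (adapter path)
    scale = alpha / r
    return [b + scale * l for b, l in zip(base, low)]

# Example: 2x2 base weight with a rank-1 adapter.
W = [[1.0, 0.0], [0.0, 1.0]]   # identity base weight
A = [[1.0, 1.0]]               # 1x2 down-projection
B = [[0.5], [0.5]]             # 2x1 up-projection
y = lora_forward(W, A, B, [2.0, 3.0], alpha=1.0)
print(y)  # [4.5, 5.5]
```

The point for the plugins is that only `A` and `B` need gradients and optimizer state, so Gemini/FSDP would only have to shard and offload the small adapter parameters.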
### Describe the feature [rank0]: NotImplementedError: Auto policy for Gemma2ForCausalLM (transformers.models.gemma2.modeling_gemma2.Gemma2ForCausalLM) is not implemented Can you please tell me how to support Gemma2 for Tensor Parallelism? Or do you have...
## 📌 Checklist before creating the PR - [ ] I have created an issue for this PR for traceability - [ ] The title follows the standard format: `[doc/gemini/tensor/...]:...
### Is there an existing issue for this bug? - [X] I have searched the existing issues ### 🐛 Describe the bug When using the GeminiPlugin to train a model,...
### Proposal @kuozhang brought up in #6101 that FP8 with TP should `all_reduce` a global amax history. However, based on my understanding of the code for [creating amax history](https://github.com/NVIDIA/TransformerEngine/blob/7fb22c375804f77f4f95df3eab606c7bd3e80aed/transformer_engine/pytorch/ops/op.py#L215), it...
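For context on the proposal: under tensor parallelism each rank sees only a shard of the activations, so its local amax can underestimate the global maximum; an `all_reduce` with the MAX op over the amax history would make the FP8 scaling factors consistent across ranks. A single-process sketch of what that reduction computes (the per-rank histories below are made-up numbers; real code would call `torch.distributed.all_reduce` with `ReduceOp.MAX` on each rank's history tensor):

```python
# Simulate what an all_reduce(MAX) over per-rank amax histories produces.
# Single-process stand-in for the distributed collective; illustrative only.

def global_amax_history(per_rank_histories):
    """Element-wise max across ranks, mimicking all_reduce with ReduceOp.MAX."""
    return [max(vals) for vals in zip(*per_rank_histories)]

# Hypothetical amax histories from 4 TP ranks (history window of length 3).
histories = [
    [0.9, 1.2, 0.7],   # rank 0
    [1.1, 0.8, 0.6],   # rank 1
    [0.5, 1.5, 0.9],   # rank 2
    [1.0, 0.4, 1.3],   # rank 3
]
print(global_amax_history(histories))  # [1.1, 1.5, 1.3]
```

Each position in the result is the largest amax any rank observed at that history step, which is exactly what every rank would hold after the MAX all-reduce.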