ColossalAI
Making large AI models cheaper, faster and more accessible
### 🐛 Describe the bug When I run the example in your tutorials (basic/colotensor), I encounter some problems. Traceback (most recent call last): File "colossalai-study/run_dist.py", line 8, in from colossalai.testing...
### 🐛 Describe the bug Hi, how can I fine-tune the GLM-130B model with Colossal-AI? GLM-130B: https://keg.cs.tsinghua.edu.cn/glm-130b/zh/posts/glm-130b/ ### Environment _No response_
### 🐛 Describe the bug I get `CUDA out of memory. Tried to allocate 25.10 GiB` when running `train_sft.sh`. It needs 25.1 GB, and my GPU is a V100 with memory...
### 🐛 Describe the bug no ### Environment _No response_
### 🐛 Describe the bug I executed the training command for supervised instruction tuning of Coati following the instructions in the README.md. It raised an error related to NCCL...
### 🐛 Describe the bug Tried to run train_sft.sh and got an OOM error: torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 172.00 MiB (GPU 0; 23.68 GiB total capacity; 18.08 GiB...
### Describe the feature Currently, FP16 support only makes it possible to train models smaller than 2B parameters on a single graphics card with 24 GB of RAM. However, the mainstream useful...
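For context on the 24 GB limit in the feature request above: mixed-precision training with Adam typically holds roughly 16 bytes of model state per parameter (2 for FP16 weights, 2 for FP16 gradients, and 12 for the FP32 master weights plus Adam momentum and variance, following the ZeRO paper's accounting). A rough back-of-the-envelope sketch; actual usage varies with activations and framework overhead:

```python
def training_mem_gib(num_params: int) -> float:
    """Estimate GPU memory (GiB) for the model states of mixed-precision
    Adam training: FP16 weights (2 B) + FP16 grads (2 B) + FP32 master
    weights (4 B) + Adam momentum (4 B) + Adam variance (4 B) = 16 B/param.
    Activations and workspace memory are NOT counted."""
    bytes_per_param = 2 + 2 + 4 + 4 + 4
    return num_params * bytes_per_param / 2**30

# A 2B-parameter model already needs ~29.8 GiB of model states alone,
# so it cannot fit on a 24 GiB card without offloading or sharding
# (which is what techniques like ZeRO / Gemini address).
print(f"{training_mem_gib(2_000_000_000):.1f} GiB")
```

This is why FP16 alone caps single-24 GB-card training at well under 2B parameters once activations are added on top.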
## 📌 Checklist before creating the PR - [x] I have created an issue for this PR for traceability - [x] The title follows the standard format: `[doc/gemini/tensor/...]:...