ColossalAI
Making large AI models cheaper, faster and more accessible
### 🐛 Describe the bug When I run the example in your tutorials (basic/colotensor), I encounter some problems. Traceback (most recent call last): File "colossalai-study/run_dist.py", line 8, in from colossalai.testing...
### 🐛 Describe the bug Hi, how can I fine-tune the GLM-130B model with Colossal-AI? GLM-130B: https://keg.cs.tsinghua.edu.cn/glm-130b/zh/posts/glm-130b/ ### Environment _No response_
### 🐛 Describe the bug I get `CUDA out of memory. Tried to allocate 25.10 GiB` when running `train_sft.sh`. It needs 25.1 GB, and my GPU is a V100 with memory...
### 🐛 Describe the bug no ### Environment _No response_
### 🐛 Describe the bug I executed the training command for supervised instruction tuning of Coati following the instructions in the README.md. It raised an error related to NCCL...
### 🐛 Describe the bug Tried to run train_sft.sh and got an OOM error: torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 172.00 MiB (GPU 0; 23.68 GiB total capacity; 18.08 GiB...
### Describe the feature Currently, FP16 support only makes it possible to train models smaller than 2B parameters on a single graphics card with 24 GB of RAM. However, the mainstream useful...
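For context on the 24 GB limit in the feature request above: mixed-precision training with Adam typically holds roughly 16 bytes of model state per parameter (2 for FP16 weights, 2 for FP16 gradients, and 12 for the FP32 master weights plus Adam momentum and variance, following the ZeRO paper's accounting). A rough back-of-the-envelope sketch; actual usage varies with activations and framework overhead:

```python
def training_mem_gib(num_params: int) -> float:
    """Estimate GPU memory (GiB) for the model states of mixed-precision
    Adam training: FP16 weights (2 B) + FP16 grads (2 B) + FP32 master
    weights (4 B) + Adam momentum (4 B) + Adam variance (4 B) = 16 B/param.
    Activations and workspace memory are NOT counted."""
    bytes_per_param = 2 + 2 + 4 + 4 + 4
    return num_params * bytes_per_param / 2**30

# A 2B-parameter model already needs ~29.8 GiB of model states alone,
# so it cannot fit on a 24 GiB card without offloading or sharding
# (which is what techniques like ZeRO / Gemini address).
print(f"{training_mem_gib(2_000_000_000):.1f} GiB")
```

This is why FP16 alone caps single-24 GB-card training at well under 2B parameters once activations are added on top.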
## 📌 Checklist before creating the PR - [x] I have created an issue for this PR for traceability - [x] The title follows the standard format: `[doc/gemini/tensor/...]:...