ColossalAI
Making large AI models cheaper, faster and more accessible
### 🐛 Describe the bug `python inference.py --model_path ./actor_checkpoint_prompts.pt --pretrain bloom-560m --model bloom` Error output: size mismatch for transformer.ln_f.weight: copying a param with shape torch.Size([768]) from checkpoint, the shape in current...
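A size mismatch like the one reported above can be narrowed down by comparing parameter shapes before loading the checkpoint. The following is a minimal sketch, not ColossalAI's own tooling: `find_shape_mismatches` is a hypothetical helper, and it works on plain name-to-shape dictionaries. With PyTorch, such a dictionary could be built as `{k: tuple(v.shape) for k, v in state_dict.items()}`. The model-side shape below is a placeholder, since the report is cut off before stating it.

```python
def find_shape_mismatches(checkpoint_shapes, model_shapes):
    """Return {param_name: (checkpoint_shape, model_shape)} for every
    parameter present in both mappings whose shapes differ."""
    mismatches = {}
    for name, ckpt_shape in checkpoint_shapes.items():
        model_shape = model_shapes.get(name)
        if model_shape is None:
            # Parameter missing from the model; not a shape mismatch.
            continue
        if tuple(ckpt_shape) != tuple(model_shape):
            mismatches[name] = (tuple(ckpt_shape), tuple(model_shape))
    return mismatches


# Checkpoint shape taken from the error message above.
checkpoint_shapes = {"transformer.ln_f.weight": (768,)}
# Placeholder value: the actual model shape is truncated in the report.
model_shapes = {"transformer.ln_f.weight": (1024,)}

for name, (ckpt, model) in find_shape_mismatches(checkpoint_shapes, model_shapes).items():
    print(f"{name}: checkpoint {ckpt} vs model {model}")
```

If every mismatched parameter differs in the same dimension, the checkpoint was probably saved from a different model size than the one passed via `--pretrain`.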
### 📚 The doc issue How can I implement supervised fine-tuning in stage 1? Can ColossalAI do this? What specifically should I do?
## 📌 Checklist before creating the PR - [ ] I have created an issue for this PR for traceability - [x] The title follows the standard format: `[doc/gemini/tensor/...]: A...
## 📌 Checklist before creating the PR - [x] I have created an issue for this PR for traceability - [x] The title follows the standard format: `[doc/gemini/tensor/...]: A concise...
## 📌 Checklist before creating the PR - [x] I have created an issue for this PR for traceability - [x] The title follows the standard format: `[doc/gemini/tensor/...]: A concise...
### 🐛 Describe the bug When I run the [examples/language/gpt/gemini/run_gemini.sh](https://github.com/hpcaitech/ColossalAI/blob/main/examples/language/gpt/gemini/run_gemini.sh) script on the official image `hpcaitech/colossalai:0.2.5` with a single GPU, everything works. But when I set GPU_NUM=2 by adding the...
## 📌 Checklist before creating the PR - [ ] I have created an issue for this PR for traceability - [ ] The title follows the standard format: `[doc/gemini/tensor/...]:...
## Overview The current implementation has not been tested on many models. We need to add large-scale correctness verification. Want to track the development progress? Take a look at the proposal: https://github.com/hpcaitech/ColossalAI/discussions/3124 kanban:...
### 🐛 Describe the bug GPU info: 3 nodes, 4 GPUs per node (GeForce RTX 2080 Ti); pp: 3, tp: 2, dp: 2. I use train_test.py in the [ColossalAI-Example] project, and get...
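For the layout in this report, the product of the pipeline, tensor, and data parallel sizes should equal the total number of GPUs: 3 × 2 × 2 = 12 = 3 nodes × 4 GPUs. A minimal sketch of that sanity check follows; `check_parallel_layout` is a hypothetical helper written for illustration and is not part of ColossalAI.

```python
def check_parallel_layout(nodes, gpus_per_node, pp, tp, dp):
    """Return the world size if pp * tp * dp uses exactly all GPUs,
    otherwise raise ValueError."""
    world_size = nodes * gpus_per_node
    required = pp * tp * dp
    if required != world_size:
        raise ValueError(
            f"pp*tp*dp = {required}, but {world_size} GPUs are available"
        )
    return world_size


# Values from the report: 3 nodes, 4 GPUs per node, pp=3, tp=2, dp=2.
world_size = check_parallel_layout(nodes=3, gpus_per_node=4, pp=3, tp=2, dp=2)
print(f"layout is consistent with {world_size} GPUs")
```

The reported layout passes this check, so the error in the truncated report is presumably caused by something other than a mismatch between the parallel sizes and the GPU count.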
### 🐛 Describe the bug It seems that the embedding weights are not assigned when I wrap the model with GeminiDDP. The model works when I initialize it with the from_pretrained function, but...