ColossalAI issues

[BUG]: inference.py

8

### 🐛 Describe the bug python inference.py --model_path ./actor_checkpoint_prompts.pt --pretrain bloom-560m --model bloom ``` size mismatch for transformer.ln_f.weight: copying a param with shape torch.Size([768]) from checkpoint, the shape in current...

JingxinLee

bug

I want to implement stage 1 of ChatGPT. What should I do?

1

### 📚 The doc issue How can I implement supervised fine-tuning in stage 1? Can ColossalAI do this? What should I do specifically?

1a2cjitenfei

documentation

[tests] model zoo add torchaudio models

1

## 📌 Checklist before creating the PR - [ ] I have created an issue for this PR for traceability - [x] The title follows the standard format: `[doc/gemini/tensor/...]: A...

ver217

Run Build and Test

[test] added torchvision models to test model zoo

2

## 📌 Checklist before creating the PR - [x] I have created an issue for this PR for traceability - [x] The title follows the standard format: `[doc/gemini/tensor/...]: A concise...

FrankLeeeee

Run Build and Test

[test] added transformers models to test model zoo

1

## 📌 Checklist before creating the PR - [x] I have created an issue for this PR for traceability - [x] The title follows the standard format: `[doc/gemini/tensor/...]: A concise...

FrankLeeeee

Run Build and Test

[BUG]: NCCL error in GPT single-node multi-card training

2

### 🐛 Describe the bug When I run the [examples/language/gpt/gemini/run_gemini.sh](https://github.com/hpcaitech/ColossalAI/blob/main/examples/language/gpt/gemini/run_gemini.sh) script based on the official image `hpcaitech/colossalai:0.2.5` using a single card, everything is OK, but when I set GPU_NUM=2 by adding the...

tianxin1860

bug

[chatgpt] Reward model training process update

1

## 📌 Checklist before creating the PR - [ ] I have created an issue for this PR for traceability - [ ] The title follows the standard format: `[doc/gemini/tensor/...]:...

ht-zhou

chatgpt

[lazyinit] add correctness verification

2

## Overview The current implementation has not been tested on many models, so we need to add large-scale correctness verification. Want to track the development progress? Take a look at the proposal: https://github.com/hpcaitech/ColossalAI/discussions/3124 kanban:...

ver217

lazyinit

[BUG]: load_checkpoint error

6

### 🐛 Describe the bug GPU info: 3 nodes, 4 GPUs per node (GeForce RTX 2080 Ti); pp: 3, tp: 2, dp: 2. I used train_test.py in the [ColossalAI-Example] project and got...

readme2gh

bug

[BUG]: The embedding weight is not assigned when I use GeminiDDP

2

### 🐛 Describe the bug It seems that the embedding weight is not assigned when I wrap the model with GeminiDDP. The model works when I initialize it with the from_pretrained function, but...

TexasRangers86

bug
