ColossalAI
Making large AI models cheaper, faster and more accessible
### 🐛 Describe the bug When I set the lora_rank in examples/train_sft.sh to 8, the following error occurs: Traceback (most recent call last): File "/home/chaojiewang/NeurIPS_2023/Chatgpt/coati/train_sft.py", line 185, in train(args)...
### 🐛 Describe the bug I adapted the [example](https://github.com/hpcaitech/ColossalAI/tree/main/examples/images/dreambooth) by replacing `export MODEL_NAME="CompVis/stable-diffusion-v1-4"` with `export MODEL_NAME="stabilityai/stable-diffusion-2"`, then ran the script and got the following error. ``` RuntimeError: false INTERNAL ASSERT FAILED...
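A minimal sketch of the adaptation described above, assuming the launch pattern of the linked dreambooth example: the script name `train_dreambooth_colossalai.py`, the data directories, the prompt, and the flag names are assumptions based on that example (which mirrors the diffusers dreambooth script) and may differ across versions.

```bash
# Sketch only: swap the base model in the example's environment setup.
export MODEL_NAME="stabilityai/stable-diffusion-2"   # was "CompVis/stable-diffusion-v1-4"
export INSTANCE_DIR="./instance_images"              # hypothetical local training images
export OUTPUT_DIR="./sd2-dreambooth-output"          # hypothetical output directory

# Flag names follow the dreambooth example; stable-diffusion-2 is trained at 768px.
torchrun --nproc_per_node 1 train_dreambooth_colossalai.py \
  --pretrained_model_name_or_path="$MODEL_NAME" \
  --instance_data_dir="$INSTANCE_DIR" \
  --output_dir="$OUTPUT_DIR" \
  --instance_prompt="a photo of sks dog" \
  --resolution=768
```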
### 📚 The doc issue Are there any examples of running on multiple nodes?
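No multi-node walkthrough appears in this listing; as a hedged sketch (the IP address, the hostfile, and the training script name `train.py` are placeholders), a job can be spread across two nodes either with plain `torchrun` or with the `colossalai run` launcher:

```bash
# Option 1: standard torchrun, started once on each node with a matching --node_rank.
torchrun --nnodes=2 --node_rank=0 \
  --master_addr=10.0.0.1 --master_port=29500 \
  --nproc_per_node=8 train.py        # repeat on node 1 with --node_rank=1

# Option 2: the colossalai launcher, started once, reading the node list from a hostfile.
colossalai run --nproc_per_node 8 --hostfile ./hostfile --master_port 29500 train.py
```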
### 🐛 Describe the bug Code: torchrun --standalone --nproc_per_node=1 train_reward_model.py --dataset Dahoas/rm-static --subset ../../../datasets/Dahoas_rm-static --max_len 512 --model gpt2 --pretrain ../../../gpt2/gpt2-small --lora_rank 0 --max_epochs 1 --batch_size 1 --loss_fn log_sig --test...
### 🐛 Describe the bug (ColossalAI-Chat) tt@visiondev-SYS-4029GP-TRT:/data3/samba_css/chatgpt/ColossalAI/applications/Chat/examples$ colossalai check -i /home/tt/anaconda3/envs/ColossalAI-Chat/lib/python3.10/site-packages/torch/library.py:130: UserWarning: Overriding a previously registered kernel for the same operator and the same dispatch key operator: aten::eye.m_out(int n, int...
### 🐛 Describe the bug LlamaRM is not a Hugging Face transformers module but a LoraModule, while LlamaRM.model is a Hugging Face transformers model. So LlamaRM has no method `resize_token_embeddings`, but LlamaRM.model does....
#### GPU: 40G A100 × 8 I want to train the 7B LLaMA model on 40G A100 GPUs, but it reports that there is not enough GPU memory. The training command is: `torchrun --standalone...
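As a hedged sketch of one way to fit a 7B model on 40G cards: the script name, paths, and flag names below follow the example commands quoted elsewhere in these reports and are assumptions about `train_sft.py`'s interface that may differ across versions; sharding optimizer state with ZeRO-2 and enabling LoRA typically lowers per-GPU memory.

```bash
# Sketch only: paths are placeholders; flags follow the example commands above.
torchrun --standalone --nproc_per_node=8 train_sft.py \
  --model llama \
  --pretrain /path/to/llama-7b \
  --strategy colossalai_zero2 \
  --lora_rank 8 \
  --batch_size 1 \
  --max_epochs 1
```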
### 🐛 Describe the bug When I use the "colossalai_zero" strategy to train the RM model, it spends a lot of time loading the optimizer. I am very...
### 🐛 Describe the bug When running the Stage 3 code https://github.com/hpcaitech/ColossalAI/blob/main/applications/Chat/examples/train_prompts.py with LLaMA, this bug is encountered at line 137: `tokenizer = prepare_llama_tokenizer_and_embedding(tokenizer, actor)`. The details of this bug:...
### 🐛 Describe the bug When training GPT2-S on a single card on Colab with `!torchrun --standalone --nproc_per_node 1 benchmark_gpt_dummy.py --model s --strategy colossalai_gemini_cpu --experience_batch_size 1 --train_batch_size 1`, I hit a bug...