ColossalAI

Making large AI models cheaper, faster and more accessible

Results: 1072 ColossalAI issues, sorted by recently updated

### 🐛 Describe the bug When I set the lora_rank in example/train_sft.sh to 8 (see the sketch below), the following bug occurs: Traceback (most recent call last): File "/home/chaojiewang/NeurIPS_2023/Chatgpt/coati/train_sft.py", line 185, in train(args)...

bug
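For context, a minimal sketch of the change described in this report, assuming train_sft.sh wraps a torchrun launch of train_sft.py and forwards a `--lora_rank` flag (the flag name appears in other reports below); the remaining arguments are illustrative placeholders, not the actual script contents.

```bash
# Hedged sketch of the edited launch line in example/train_sft.sh.
# Only --lora_rank 8 comes from the report; the other arguments are placeholders.
torchrun --standalone --nproc_per_node=1 train_sft.py \
    --pretrain <base_model_path> \
    --dataset <sft_dataset_path> \
    --lora_rank 8   # setting a non-zero rank here is what triggers the traceback above
```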

### 🐛 Describe the bug I adapted the [example](https://github.com/hpcaitech/ColossalAI/tree/main/examples/images/dreambooth) by replacing `export MODEL_NAME="CompVis/stable-diffusion-v1-4"` with `export MODEL_NAME="stabilityai/stable-diffusion-2"` (see the sketch below), then ran the script and got the following error. ``` RuntimeError: false INTERNAL ASSERT FAILED...

bug
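For reference, the adaptation described in this report amounts to a one-line change before launching the DreamBooth example; the two MODEL_NAME values are quoted from the report, while the launch command below is only an assumed shape of how the example is run.

```bash
# Original setting from the linked DreamBooth example:
# export MODEL_NAME="CompVis/stable-diffusion-v1-4"

# Replacement that leads to the reported RuntimeError:
export MODEL_NAME="stabilityai/stable-diffusion-2"

# The example's launch script is then run unchanged (script name and flag
# below are illustrative, not verified against the repository):
torchrun --nproc_per_node=1 train_dreambooth.py \
    --pretrained_model_name_or_path="$MODEL_NAME"
```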

### 📚 The doc issue Are there any examples of running on multiple nodes? (See the sketch below.)

documentation
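Since most reports in this list launch with torchrun, a hedged multi-node sketch using torchrun's standard rendezvous flags is shown below; the script name, node count, and per-node GPU count are placeholders, and the ColossalAI examples may document a preferred launcher of their own.

```bash
# Hedged sketch: launching the same training script on two nodes with torchrun.
# <NODE0_IP> is the address of the first node; train.py is a placeholder for
# whichever example script you are running.

# On node 0 (rank 0):
torchrun --nnodes=2 --node_rank=0 --nproc_per_node=8 \
    --master_addr=<NODE0_IP> --master_port=29500 train.py

# On node 1 (rank 1):
torchrun --nnodes=2 --node_rank=1 --nproc_per_node=8 \
    --master_addr=<NODE0_IP> --master_port=29500 train.py
```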

### 🐛 Describe the bug Code: ------------------------------------------------------------ torchrun --standalone --nproc_per_node=1 train_reward_model.py --dataset Dahoas/rm-static --subset ../../../datasets/Dahoas_rm-static --max_len 512 --model gpt2 --pretrain ../../../gpt2/gpt2-small --lora_rank 0 --max_epochs 1 --batch_size 1 --loss_fn log_sig --test...

bug

### 🐛 Describe the bug (ColossalAI-Chat) tt@visiondev-SYS-4029GP-TRT:/data3/samba_css/chatgpt/ColossalAI/applications/Chat/examples$ colossalai check -i /home/tt/anaconda3/envs/ColossalAI-Chat/lib/python3.10/site-packages/torch/library.py:130: UserWarning: Overriding a previously registered kernel for the same operator and the same dispatch key operator: aten::eye.m_out(int n, int...

bug

### 🐛 Describe the bug LlamaRM is not a HuggingFace transformer module but a LoraModule, while LlamaRM.model is a HuggingFace transformer model. So LlamaRM has no "resize_token_embeddings" method, but LlamaRM.model does...

bug

#### GPU 40G*A100*8 I want to train the 7B Llama model on 40G A100 GPUs, but it reports that there is not enough GPU memory. The training command is: `torchrun --standalone...

### 🐛 Describe the bug When I use the "colossalai_zero" strategy to train the RM model, it spends a lot of time loading the optimizer. I am very...

bug

### 🐛 Describe the bug When running the Stage 3 code https://github.com/hpcaitech/ColossalAI/blob/main/applications/Chat/examples/train_prompts.py with LLaMA, this bug is encountered at line 137: `tokenizer = prepare_llama_tokenizer_and_embedding(tokenizer, actor)` The details of this bug:...

bug

### 🐛 Describe the bug When training GPT2-S using a single card on Colab with `!torchrun --standalone --nproc_per_node 1 benchmark_gpt_dummy.py --model s --strategy colossalai_gemini_cpu --experience_batch_size 1 --train_batch_size 1`, I run into a bug...

bug