ColossalAI

Making large AI models cheaper, faster and more accessible

Results 1072 ColossalAI issues

### Describe the feature How to run it on a 1080 Ti/P40, i.e. with compute capability (CC) 6.1

enhancement

### 🐛 Describe the bug When I use the command "colossalai run --nproc_per_node 1 --master_addr GPU001 --master_port 29505 --host GPU001 main.py", it does not work, but the command "colossalai run --nproc_per_node...

bug

Hi, I want to reproduce the training process but do not have the two datasets. Do you have plans to open-source the datasets? Thanks. https://github.com/hpcaitech/ColossalAI/blob/638a07a7f9b504e6c9781e9aa2a9b6c5e9dc49ed/applications/Chat/examples/train_prompts.py#L208-L209

### 🐛 Describe the bug ``` colossalai run --nproc_per_node=4 train_sft.py \ > --pretrain "/data/chenhao/train/ColossalAI/to/llama-7b-hf/" \ > --model 'llama' \ > --strategy colossalai_zero2 \ > --log_interval 10 \ > --save_path "/data/chenhao/train/ColossalAI/Coati-7B"...

bug

### 🐛 Describe the bug ### Description The official Docker images run the [TensorNVMe](https://github.com/hpcaitech/TensorNVMe) install commands; however, at runtime, executing `cd TensorNVMe && tensornvme check` (or running the training demos...

bug

### 🐛 Describe the bug ColossalAI/applications/Chat/examples$ sh train_sft.sh WARNING:torch.distributed.run: ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further...

bug

### 🐛 Describe the bug GPU: 8*A6000 CUDA Version: 11.7 Python Version: 3.8.10 colossalai Version: 0.2.8 When I train PPO with ``` torchrun --standalone --nproc_per_node=8 train_prompts.py \ --pretrain "decapoda-research/llama-7b-hf" \...

bug

### Describe the feature First of all, thank you so much for sharing your project! At present, I have a requirement, which is as follows: First, I have a database,...

enhancement

## 📌 Checklist before creating the PR - [ yes ] I have created an issue for this PR for traceability - [ yes ] The title follows the standard...

Run Build and Test
API

### 🐛 Describe the bug How can I use DDP training in diffusion? I saw train_ddp.yaml, but it is no different from train_colossalai.yaml. How do I set the...

bug