ColossalAI
Making large AI models cheaper, faster and more accessible
### Describe the feature How can I run it on a 1080 Ti/P40, i.e. with compute capability (CC) 6.1?
### 🐛 Describe the bug When I use the command `colossalai run --nproc_per_node 1 --master_addr GPU001 --master_port 29505 --host GPU001 main.py`, it does not work. But the command "colossalai run --nproc_per_node...
Hi, I want to reproduce the training process but do not have the two datasets. Do you plan to open-source them? Thanks. https://github.com/hpcaitech/ColossalAI/blob/638a07a7f9b504e6c9781e9aa2a9b6c5e9dc49ed/applications/Chat/examples/train_prompts.py#L208-L209
### 🐛 Describe the bug ``` colossalai run --nproc_per_node=4 train_sft.py \ > --pretrain "/data/chenhao/train/ColossalAI/to/llama-7b-hf/" \ > --model 'llama' \ > --strategy colossalai_zero2 \ > --log_interval 10 \ > --save_path "/data/chenhao/train/ColossalAI/Coati-7B"...
### 🐛 Describe the bug ### Description The official Docker images run the [TensorNVMe](https://github.com/hpcaitech/TensorNVMe) install commands; however, at runtime, executing `cd TensorNVMe && tensornvme check` (or running the training demos...
### 🐛 Describe the bug ColossalAI/applications/Chat/examples$ sh train_sft.sh WARNING:torch.distributed.run: ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further...
### 🐛 Describe the bug GPU: 8*A6000, CUDA Version: 11.7, Python Version: 3.8.10, colossalai Version: 0.2.8. When I train PPO with ``` torchrun --standalone --nproc_per_node=8 train_prompts.py \ --pretrain "decapoda-research/llama-7b-hf" \...
### Describe the feature First of all, thank you so much for sharing your project! At present, I have a requirement, which is as follows: First, I have a database,...
## 📌 Checklist before creating the PR - [ yes ] I have created an issue for this PR for traceability - [ yes ] The title follows the standard...
### 🐛 Describe the bug How can I use DDP training in diffusion? I looked at train_ddp.yaml, but it is no different from train_colossalai.yaml. How do I set the...