ColossalAI

Making large AI models cheaper, faster and more accessible

1072 ColossalAI issues

## 📌 Checklist before creating the PR - [ ] I have created an issue for this PR for traceability - [ ] The title follows the standard format: `[doc/gemini/tensor/...]:...

bug

### 🐛 Describe the bug The error happens in `booster.backward(loss, optimizer)` when using GeminiPlugin: `ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: -6) local_rank: 0 (pid: 2044) of binary: /opt/conda/envs/pytorch/bin/python` ### Environment Linux, CUDA 11.7, torch 1.13.1

bug

## 📌 Checklist before creating the PR - [ ] I have created an issue for this PR for traceability - [ ] The title follows the standard format: `[doc/gemini/tensor/...]:...

### Describe the feature When `TORCH_CUDA_ARCH_LIST` is set, allow the GPU build to succeed without searching for a device, which currently results in `RuntimeError: No CUDA GPUs are available`...

enhancement
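A minimal sketch of the requested behaviour, assuming the build script resolves target architectures in one place. `resolve_cuda_arch_list` and its `query_device_capabilities` callback are hypothetical names, not ColossalAI's actual build code. The idea is that an explicit `TORCH_CUDA_ARCH_LIST` short-circuits the device query, so a machine with no visible GPU never reaches the call that raises `RuntimeError: No CUDA GPUs are available`. The separator handling mirrors the convention used by PyTorch's `cpp_extension`, which accepts both `;` and spaces.

```python
import os
from typing import Callable, List, Mapping, Optional, Tuple


def resolve_cuda_arch_list(
    query_device_capabilities: Callable[[], List[Tuple[int, int]]],
    env: Optional[Mapping[str, str]] = None,
) -> List[str]:
    """Hypothetical helper: pick CUDA archs for the build.

    If TORCH_CUDA_ARCH_LIST is set, use it verbatim and never query a GPU.
    Otherwise fall back to the capabilities of the visible devices.
    """
    env = os.environ if env is None else env
    arch_list = env.get("TORCH_CUDA_ARCH_LIST", "").strip()
    if arch_list:
        # Accept "7.0;8.0+PTX" as well as "7.0 8.0+PTX"
        return [arch for arch in arch_list.replace(";", " ").split() if arch]
    # Fallback: this is the step that fails on a machine without a GPU
    caps = query_device_capabilities()
    # Deduplicate and sort numerically so (10, 0) comes after (9, 0)
    return [f"{major}.{minor}" for major, minor in sorted(set(caps))]
```

In a real build the callback could be something like `lambda: [torch.cuda.get_device_capability(i) for i in range(torch.cuda.device_count())]`; passing it as a callable keeps the GPU query lazy, so it only runs when no arch list was given.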

## 📌 Checklist before creating the PR - [x] I have created an issue for this PR for traceability - [x] The title follows the standard format: `[doc/gemini/tensor/...]: A concise...

documentation

### 📚 The doc issue I created a Japanese translation of the README.

documentation

## 📌 Checklist before creating the PR - [ ] I have created an issue for this PR for traceability - [ ] The title follows the standard format: `[doc/gemini/tensor/...]:...

## 📝 What does this PR do? Fix typo: s/infered/inferred/. I think there is no need for an issue, but I can create one if it's really important :)

### 🐛 Describe the bug The entire training process and everything else worked, and I got through installing bitsandbytes, but when I try to sample I get an error message. I've...

bug