ColossalAI

Making large AI models cheaper, faster and more accessible
## 📌 Checklist before creating the PR
- [ ] I have created an issue for this PR for traceability
- [ ] The title follows the standard format: `[doc/gemini/tensor/...]:...
### 🐛 Describe the bug
The error happens in `booster.backward(loss, optimizer)` when using `GeminiPlugin`:

`ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: -6) local_rank: 0 (pid: 2044) of binary: /opt/conda/envs/pytorch/bin/python`

### Environment
Linux, CUDA 11.7, torch 1.13.1
### Describe the feature
When `TORCH_CUDA_ARCH_LIST` is set, allow the GPU build to succeed by not searching for a device, since that search fails on a machine without a GPU with:
```
RuntimeError: No CUDA GPUs are available
```
...
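
The requested behaviour can be sketched roughly as follows. This is only an illustration, not ColossalAI's actual build code: the function name `resolve_cuda_arch_list` and the `probe_device` callback are hypothetical, and the only thing taken from the request is that a set `TORCH_CUDA_ARCH_LIST` should make the device lookup unnecessary.

```python
import os
from typing import Callable, List, Mapping, Optional


def resolve_cuda_arch_list(
    probe_device: Callable[[], str],
    env: Optional[Mapping[str, str]] = None,
) -> List[str]:
    """Return the CUDA architectures to build for (hypothetical helper).

    If TORCH_CUDA_ARCH_LIST is set, use it and never touch the GPU.
    Otherwise fall back to probing the local device.
    """
    env = os.environ if env is None else env
    raw = env.get("TORCH_CUDA_ARCH_LIST", "").strip()
    if raw:
        # Entries are conventionally separated by ';' or by spaces.
        return [arch for arch in raw.replace(";", " ").split() if arch]
    # No override: only now query the device (may raise if no GPU is present).
    return [probe_device()]
```

In a real build script, `probe_device` could wrap something like `torch.cuda.get_device_capability()`, which is the kind of call that fails on a GPU-less build machine. The point of the sketch is the ordering: the environment variable is checked first, so the probe is never reached when the target architectures are given explicitly.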
## 📌 Checklist before creating the PR
- [x] I have created an issue for this PR for traceability
- [x] The title follows the standard format: `[doc/gemini/tensor/...]: A concise...
### 📚 The doc issue
I created a Japanese translation of the README.
## 📝 What does this PR do?
Fix typo: s/infered/inferred/

I think there is no need for an issue, but I can create one if it's really important :)
### 🐛 Describe the bug
The entire training process worked, and I then got through installing bitsandbytes, but when I try to sample I get an error message. I've...