ColossalAI
Making large AI models cheaper, faster and more accessible
### 🐛 Describe the bug I understand that this error comes from the flash attention software stack, but there seems to be no related issue except https://github.com/Dao-AILab/flash-attention/issues/590, therefore I...
### Describe the feature Shardformer was originally developed based on transformers==4.33.0. In response to our users' needs, it needs to be upgraded to version 4.36.0. The main changes involve the...
### Describe the feature Can somebody give an example of the pretraining data format?
## 🚨 Issue number fixes #5534 ## 📝 What does this PR do? Added `FORCE_CUDA` environment variable support, to enable building extensions where a GPU device is not present but...
### 🐛 Describe the bug When no GPU device exists, such as CI or build nodes, no extensions can be built since `torch.cuda.is_available` checks for a device and not if...
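The build-time decision described in this issue and in the PR that fixes #5534 can be sketched as follows. This is a minimal illustration, not ColossalAI's actual build code: the helper name `should_build_cuda_extensions` is hypothetical, and treating `FORCE_CUDA=1` as the enabling value is an assumption, since the preview does not say which values are accepted. The device check is passed in as a boolean so the sketch runs without a GPU; in a real build script it would typically come from `torch.cuda.is_available()`.

```python
import os
from typing import Mapping, Optional


def should_build_cuda_extensions(
    device_available: bool,
    env: Optional[Mapping[str, str]] = None,
) -> bool:
    """Decide whether CUDA extensions should be built.

    Hypothetical helper: `device_available` stands in for the result of
    torch.cuda.is_available(). The accepted value "1" for FORCE_CUDA is an
    assumption made for this sketch.
    """
    env = os.environ if env is None else env
    # Build when explicitly forced, even if no GPU device is visible.
    forced = env.get("FORCE_CUDA", "0") == "1"
    return forced or device_available


if __name__ == "__main__":
    # On a CI or build node without a GPU, the override enables the build.
    print(should_build_cuda_extensions(False, {"FORCE_CUDA": "1"}))
```

Keeping the override separate from the device check lets a build node compile extensions while leaving the default behavior unchanged for machines that do have a GPU.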
### 🐛 Describe the bug ## Description I implemented `Coati Lora` before parallel fine-tuning for LlaMA-7B, and found: - `Gemini` runs into _Error(s) in loading state_dict for GeminiCheckpointIO:_ and Train...
## 📌 Checklist before creating the PR - [ ] I have created an issue for this PR for traceability - [ ] The title follows the standard format: `[doc/gemini/tensor/...]:...
### Describe the feature The DiT model is the base model behind Sora. Would you consider supporting layer parallelism for it in ColossalAI?
## 📌 Checklist before creating the PR - [x] I have created an issue for this PR for traceability - [ ] The title follows the standard format: `[doc/gemini/tensor/...]: A...
### 🐛 Describe the bug In [this code block](https://github.com/hpcaitech/ColossalAI/blob/6df844b8c4946c734115b7e180b292888d857bc1/colossalai/checkpoint_io/utils.py#L560), when a size mismatch occurs, no error message is printed. Fix: a `RuntimeError` should be raised when `len(error_msgs) > 0`. ### Environment _No...
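The fix proposed in this issue can be sketched as below. This is an illustration under stated assumptions, not the code in `colossalai/checkpoint_io/utils.py`: the helper name `raise_on_load_errors` and the `model_name` parameter are hypothetical, and the message layout simply imitates the familiar "Error(s) in loading state_dict" wording.

```python
from typing import List


def raise_on_load_errors(error_msgs: List[str], model_name: str = "model") -> None:
    """Raise instead of silently ignoring collected loading errors.

    Hypothetical helper: `error_msgs` is assumed to hold one message per
    problem found while loading (for example, a size mismatch).
    """
    if len(error_msgs) > 0:
        details = "\n\t".join(error_msgs)
        raise RuntimeError(
            f"Error(s) in loading state_dict for {model_name}:\n\t{details}"
        )


if __name__ == "__main__":
    try:
        raise_on_load_errors(["size mismatch for layer.weight"], "DemoModel")
    except RuntimeError as exc:
        print(exc)
```

Raising when the list is non-empty makes a size mismatch fail loudly at load time rather than surfacing later as an unrelated training error.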