ColossalAI

Making large AI models cheaper, faster and more accessible

Results 1091 ColossalAI issues

### 🐛 Describe the bug I understand that this error comes from the flash attention software stack, but there seems to be no related issue except https://github.com/Dao-AILab/flash-attention/issues/590, therefore I...

bug

### Describe the feature Shardformer was originally developed based on transformers==4.33.0. In response to our users' needs, it needs to be upgraded to version 4.36.0. The main changes involve the...

enhancement

### Describe the feature Can somebody provide an example of the pretraining data format?

enhancement

## 🚨 Issue number fixes #5534 ## 📝 What does this PR do? Added `FORCE_CUDA` environment variable support, to enable building extensions where a GPU device is not present but...

### 🐛 Describe the bug When no GPU device exists, such as CI or build nodes, no extensions can be built since `torch.cuda.is_available` checks for a device and not if...

bug
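The build-time check described in this issue and in the PR that fixes #5534 could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the helper name `should_build_cuda_extensions` is hypothetical, and treating the value `"1"` as "enabled" is an assumption, since the snippets above only name the `FORCE_CUDA` variable.

```python
import os


def should_build_cuda_extensions(device_available, env=None):
    """Decide whether CUDA extensions should be built.

    Hypothetical helper: `device_available` stands in for the result of
    torch.cuda.is_available(), which is False on build nodes without a GPU.
    """
    env = os.environ if env is None else env
    # Assumption: FORCE_CUDA=1 means "build even though no device is present".
    forced = env.get("FORCE_CUDA", "0") == "1"
    return forced or device_available


if __name__ == "__main__":
    # On a CI node without a GPU, the override alone enables the build.
    print(should_build_cuda_extensions(False, {"FORCE_CUDA": "1"}))
```

In a real setup script the first argument would come from `torch.cuda.is_available()`; it is passed in here so the sketch runs without PyTorch installed.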

### 🐛 Describe the bug ## Description I implemented `Coati Lora` before parallel fine-tuning for LlaMA-7B, and found: - `Gemini` runs into _Error(s) in loading state_dict for GeminiCheckpointIO:_ and Train...

bug

## 📌 Checklist before creating the PR - [ ] I have created an issue for this PR for traceability - [ ] The title follows the standard format: `[doc/gemini/tensor/...]:...

### Describe the feature The DiT model is the base model behind Sora. Could ColossalAI consider supporting layer parallelism for it?

enhancement

## 📌 Checklist before creating the PR - [x] I have created an issue for this PR for traceability - [ ] The title follows the standard format: `[doc/gemini/tensor/...]: A...

### 🐛 Describe the bug In [this code block](https://github.com/hpcaitech/ColossalAI/blob/6df844b8c4946c734115b7e180b292888d857bc1/colossalai/checkpoint_io/utils.py#L560), when a size mismatch occurs, no error message is printed. Fix: a RuntimeError should be raised when `len(error_msgs) > 0` ### Environment _No...

bug
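The fix proposed in this issue, raising when `len(error_msgs) > 0`, might look like the sketch below. It is not taken from `colossalai/checkpoint_io/utils.py`: the function name `raise_on_load_errors`, the `model_name` parameter, and the message wording are all assumptions made for illustration.

```python
def raise_on_load_errors(error_msgs, model_name="model"):
    """Raise a RuntimeError if any state_dict loading errors were collected.

    Hypothetical helper: `error_msgs` is assumed to be a list of strings
    gathered while loading, e.g. one entry per size mismatch.
    """
    if len(error_msgs) > 0:
        details = "\n\t".join(error_msgs)
        raise RuntimeError(
            "Error(s) in loading state_dict for {}:\n\t{}".format(model_name, details)
        )


if __name__ == "__main__":
    raise_on_load_errors([])  # no errors collected, nothing is raised
    try:
        raise_on_load_errors(["size mismatch for weight"], "ExampleModel")
    except RuntimeError as exc:
        print(exc)
```

Raising instead of silently continuing makes a size mismatch visible at load time rather than later during training.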