ColossalAI
Making large AI models cheaper, faster and more accessible
### 🐛 Describe the bug When running the llama2 example's pretrain.py, training appears to hang like this. ### Environment CUDA Version: V11.1.105 Python Version: Python 3.8.18 PyTorch Version: 2.0.0+cu117
## 📌 Checklist before creating the PR - [ ] I have created an issue for this PR for traceability - [x] The title follows the standard format: `[doc/gemini/tensor/...]: A...
## 📌 Checklist before creating the PR - [x] I have created an issue for this PR for traceability - [ ] The title follows the standard format: `[doc/gemini/tensor/...]: A...
### Describe the feature Integration with Hugging Face Accelerate?
### Describe the feature A recent paper titled "GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection" (https://arxiv.org/pdf/2403.03507.pdf) demonstrates a remarkable memory-efficient approach during the training of large language models (LLMs)....
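The core idea in GaLore is to project the gradient of a weight matrix onto a low-rank subspace, keep the optimizer state in that smaller space, and project updates back. A minimal sketch of that projection step, using NumPy for illustration (this is a simplified assumption from the paper's description, not ColossalAI's or GaLore's actual implementation, and `low_rank_project` / `project_back` are hypothetical helper names):

```python
import numpy as np

def low_rank_project(grad, rank):
    # The top-r left singular vectors of the gradient form the
    # projection basis P (m x r); the projected gradient is r x n,
    # so optimizer state shrinks from m x n to r x n.
    U, _, _ = np.linalg.svd(grad, full_matrices=False)
    P = U[:, :rank]
    return P, P.T @ grad

def project_back(P, low_rank_grad):
    # Lift the low-rank update back to the full parameter shape m x n.
    return P @ low_rank_grad

rng = np.random.default_rng(0)
grad = rng.standard_normal((64, 32))          # stand-in for a weight gradient
P, g_lr = low_rank_project(grad, rank=4)
restored = project_back(P, g_lr)
print(g_lr.shape, restored.shape)
```

In the paper the projection basis is recomputed only every few hundred steps, which amortizes the SVD cost; the sketch above recomputes it on every call for clarity.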
fix typo in python code
### Proposal Generating an Inter-Op plan with ColossalAuto usually takes 1-2 minutes when running `examples/tutorial/auto_parallel/auto_parallel_with_resnet.py`. Profiling with cProfile reveals that a large portion of this time is consumed by...
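Measuring where the planning time goes, as the proposal describes, can be done with the standard library's cProfile. A minimal sketch (here `solve_inter_op_plan` is a hypothetical stand-in for the ColossalAuto planning entry point, not an actual API):

```python
import cProfile
import io
import pstats

def solve_inter_op_plan():
    # Placeholder workload standing in for the inter-op planning computation.
    total = 0
    for i in range(200_000):
        total += i * i
    return total

profiler = cProfile.Profile()
profiler.enable()
solve_inter_op_plan()
profiler.disable()

# Sort by cumulative time to see which call dominates the run.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
print(stream.getvalue())
```

Sorting by `cumulative` surfaces the outermost expensive call; sorting by `tottime` instead would highlight the individual functions doing the work.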
### Discussed in https://github.com/hpcaitech/ColossalAI/discussions/5381 Originally posted by **mackmake** February 13, 2024 Hi and thanks for your efficient library. I wanted to pretrain so I installed packages with CUDA_EXT=1. Then I...
### 🐛 Describe the bug Following https://github.com/hpcaitech/ColossalAI/tree/main/applications/Colossal-LLaMA-2 but getting an error:
> Flash-attention enabled successfully
> Model params: 6.28 B
> Booster init max device memory: 38593.54 MB
> Booster init...
## 📌 Checklist before creating the PR - [ ] I have created an issue for this PR for traceability - [ ] The title follows the standard format: `[doc/gemini/tensor/...]:...