ColossalAI
Making large AI models cheaper, faster and more accessible
### 🐛 Describe the bug Pretraining llama2-7b can resume from a checkpoint when using the "zero2" plugin, but not when using the "gemini" plugin; with "gemini", the resume process gets stuck,...
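For context, a minimal resume sketch using ColossalAI's Booster checkpoint API; the model, optimizer, and checkpoint paths are placeholders, and exact signatures may vary across ColossalAI versions:

```python
import torch
import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import GeminiPlugin

colossalai.launch_from_torch()  # older releases take a config dict here

model = torch.nn.Linear(16, 16).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

plugin = GeminiPlugin()  # swap in LowLevelZeroPlugin(stage=2) for "zero2"
booster = Booster(plugin=plugin)
model, optimizer, *_ = booster.boost(model, optimizer)

# Save a sharded checkpoint...
booster.save_model(model, "ckpt/model", shard=True)
booster.save_optimizer(optimizer, "ckpt/optimizer", shard=True)

# ...and resume from it; the reported hang occurs during this load step
# when GeminiPlugin is used.
booster.load_model(model, "ckpt/model")
booster.load_optimizer(optimizer, "ckpt/optimizer")
```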
### 🐛 Describe the bug Question: When I trained ViT on the ImageNet-1k and CIFAR-10 datasets, I repeatedly adjusted the parameter configuration according to the official ViT configuration, but the...
### 🐛 Describe the bug
```
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[2], line 1
----> 1 from colossalai.booster import Booster

File ~/.local/lib/python3.11/site-packages/colossalai/booster/__init__.py:2
      1 from .accelerator import Accelerator
---->...
```
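Since the ValueError is raised inside colossalai.booster's import chain rather than in user code, a usual first step is checking installed versions for a mismatch; a small stdlib-only sketch, where the package list is an assumption about the relevant dependencies:

```python
# Print installed versions of packages the import chain may touch.
import importlib.metadata as md

for pkg in ("colossalai", "torch", "transformers"):
    try:
        print(pkg, md.version(pkg))
    except md.PackageNotFoundError:
        print(pkg, "not installed")
```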
### 📚 The doc issue May I ask what dataset was used to train Colossal-Llama-2?
### 🐛 Describe the bug
```
File "/data/llmodel/miniconda3/envs/colossal/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
File "/data/llmodel/miniconda3/envs/colossal/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
File "/data/llmodel/huap/ColossalAI/applications/Colossal-LLaMA-2/colossal_llama2/utils/flash_attention_patch.py", line 133, in attention_forward
    cos,...
```
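The failing line sits where LLaMA-style attention patches typically apply rotary position embeddings to the query/key states; a generic PyTorch sketch of that step with assumed shapes (not the patch's actual code):

```python
import torch

def rotate_half(x: torch.Tensor) -> torch.Tensor:
    # Split the head dimension in half and rotate: (x1, x2) -> (-x2, x1).
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat((-x2, x1), dim=-1)

def apply_rotary(q, k, cos, sin):
    # cos/sin: (seq_len, head_dim), broadcast over batch and head dims.
    return q * cos + rotate_half(q) * sin, k * cos + rotate_half(k) * sin

q = torch.randn(1, 8, 128, 64)  # (batch, heads, seq_len, head_dim)
k = torch.randn(1, 8, 128, 64)
cos, sin = torch.randn(128, 64), torch.randn(128, 64)
q_rot, k_rot = apply_rotary(q, k, cos, sin)
```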
### Describe the feature We are excited to announce the addition of support for the Qwen2 model in the ColossalAI framework. The Qwen2 model is compatible with version 4.39.3 of...
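Loading the model itself goes through Hugging Face transformers (the excerpt cites compatibility with 4.39.3); a hedged sketch, where the checkpoint name "Qwen/Qwen2-7B" is an illustrative assumption:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Checkpoint name is assumed for illustration; any Qwen2 checkpoint works.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-7B")
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2-7B")
print(model.config.model_type)  # "qwen2"
```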
## 📌 Checklist before creating the PR
- [ ] I have created an issue for this PR for traceability
- [x] The title follows the standard format: `[doc/gemini/tensor/...]: A...
### 🐛 Describe the bug I noticed that the `from_torch_tensor` methods of the `ColoParameter` and `ColoTensor` classes were removed in PR #4479 ([`colossalai/tensor/colo_parameter.py`](https://github.com/hpcaitech/ColossalAI/pull/4479/files#diff-0d13ce3fae72d4ebe67bce9ef2441e4495a6aeee40c5532c30a985e79bc57cb6L66), [`colossalai/tensor/colo_tensor.py`](https://github.com/hpcaitech/ColossalAI/pull/4479/files#diff-0eee6bc157c59a4fb490823d53da0647d9793793bc4669f3e41146d3d99c7dd3L265)). But this method was still called under...
### 🐛 Describe the bug When using tensor parallelism, model parameters are sharded across GPUs to reduce memory consumption and enable parallel execution. However, the optimizer still holds unsharded model...
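A back-of-the-envelope sketch of why this matters: even if parameters are sharded N ways by tensor parallelism, a vanilla Adam keeps full-size `exp_avg` and `exp_avg_sq` buffers unless the optimizer is shard-aware. The layer size below is illustrative:

```python
import torch

model = torch.nn.Linear(4096, 4096)
opt = torch.optim.Adam(model.parameters())

loss = model(torch.randn(2, 4096)).sum()
loss.backward()
opt.step()  # materializes the optimizer states

# Sum the bytes held by all optimizer state tensors.
state_bytes = sum(
    t.numel() * t.element_size()
    for s in opt.state.values()
    for t in s.values()
    if torch.is_tensor(t)
)
print(f"optimizer state: {state_bytes / 2**20:.1f} MiB "
      f"(~2x the parameter memory for Adam)")
```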
## 📌 Checklist before creating the PR
- [ ] I have created an issue for this PR for traceability
- [ ] The title follows the standard format: `[doc/gemini/tensor/...]:...