ColossalAI
Making large AI models cheaper, faster and more accessible
### 🐛 Describe the bug Pretraining llama2-7b can resume from a checkpoint when using the "zero2" plugin, but not when using the "gemini" plugin; with "gemini", the resume process gets stuck,...
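For context, a minimal resume sketch using ColossalAI's Booster checkpoint API; the model, optimizer, and checkpoint paths are placeholders, and exact signatures may vary across ColossalAI versions:

```python
import torch
import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import GeminiPlugin

colossalai.launch_from_torch()  # older releases take a config dict here

model = torch.nn.Linear(16, 16).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

plugin = GeminiPlugin()  # swap in LowLevelZeroPlugin(stage=2) for "zero2"
booster = Booster(plugin=plugin)
model, optimizer, *_ = booster.boost(model, optimizer)

# Save a sharded checkpoint...
booster.save_model(model, "ckpt/model", shard=True)
booster.save_optimizer(optimizer, "ckpt/optimizer", shard=True)

# ...and resume from it; the reported hang occurs during this load step
# when GeminiPlugin is used.
booster.load_model(model, "ckpt/model")
booster.load_optimizer(optimizer, "ckpt/optimizer")
```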
### 🐛 Describe the bug Question: When I trained ViT on the ImageNet-1k and CIFAR-10 datasets, I repeatedly adjusted the parameter configuration according to the official ViT configuration, but the...
### 🐛 Describe the bug
```
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[2], line 1
----> 1 from colossalai.booster import Booster

File ~/.local/lib/python3.11/site-packages/colossalai/booster/__init__.py:2
      1 from .accelerator import Accelerator
---->...
```
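Since the ValueError is raised inside colossalai.booster's import chain rather than in user code, a usual first step is checking installed versions for a mismatch; a small stdlib-only sketch, where the package list is an assumption about the relevant dependencies:

```python
# Print installed versions of packages the import chain may touch.
import importlib.metadata as md

for pkg in ("colossalai", "torch", "transformers"):
    try:
        print(pkg, md.version(pkg))
    except md.PackageNotFoundError:
        print(pkg, "not installed")
```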
### 📚 The doc issue May I ask what dataset was used to train Colossal-Llama-2?
### 🐛 Describe the bug
```
File "/data/llmodel/miniconda3/envs/colossal/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
File "/data/llmodel/miniconda3/envs/colossal/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
File "/data/llmodel/huap/ColossalAI/applications/Colossal-LLaMA-2/colossal_llama2/utils/flash_attention_patch.py", line 133, in attention_forward
    cos,...
```
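The failing line sits where LLaMA-style attention patches typically apply rotary position embeddings to the query/key states; a generic PyTorch sketch of that step with assumed shapes (not the patch's actual code):

```python
import torch

def rotate_half(x: torch.Tensor) -> torch.Tensor:
    # Split the head dimension in half and rotate: (x1, x2) -> (-x2, x1).
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat((-x2, x1), dim=-1)

def apply_rotary(q, k, cos, sin):
    # cos/sin: (seq_len, head_dim), broadcast over batch and head dims.
    return q * cos + rotate_half(q) * sin, k * cos + rotate_half(k) * sin

q = torch.randn(1, 8, 128, 64)  # (batch, heads, seq_len, head_dim)
k = torch.randn(1, 8, 128, 64)
cos, sin = torch.randn(128, 64), torch.randn(128, 64)
q_rot, k_rot = apply_rotary(q, k, cos, sin)
```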
### Describe the feature We are excited to announce the addition of support for the Qwen2 model in the ColossalAI framework. The Qwen2 model is compatible with version 4.39.3 of...
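Loading the model itself goes through Hugging Face transformers (the excerpt cites compatibility with 4.39.3); a hedged sketch, where the checkpoint name "Qwen/Qwen2-7B" is an illustrative assumption:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Checkpoint name is assumed for illustration; any Qwen2 checkpoint works.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-7B")
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2-7B")
print(model.config.model_type)  # "qwen2"
```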
## 📌 Checklist before creating the PR
- [ ] I have created an issue for this PR for traceability
- [x] The title follows the standard format: `[doc/gemini/tensor/...]: A...
### 🐛 Describe the bug I noticed that the `from_torch_tensor` methods of the `ColoParameter` and `ColoTensor` classes were removed in PR #4479 ([`colossalai/tensor/colo_parameter.py`](https://github.com/hpcaitech/ColossalAI/pull/4479/files#diff-0d13ce3fae72d4ebe67bce9ef2441e4495a6aeee40c5532c30a985e79bc57cb6L66), [`colossalai/tensor/colo_tensor.py`](https://github.com/hpcaitech/ColossalAI/pull/4479/files#diff-0eee6bc157c59a4fb490823d53da0647d9793793bc4669f3e41146d3d99c7dd3L265)). But this method was still called under...
### 🐛 Describe the bug When using tensor parallelism, model parameters are sharded across GPUs to reduce memory consumption and enable parallel execution. However, the optimizer still holds unsharded model...
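A back-of-the-envelope sketch of why this matters: even if parameters are sharded N ways by tensor parallelism, a vanilla Adam keeps full-size `exp_avg` and `exp_avg_sq` buffers unless the optimizer is shard-aware. The layer size below is illustrative:

```python
import torch

model = torch.nn.Linear(4096, 4096)
opt = torch.optim.Adam(model.parameters())

loss = model(torch.randn(2, 4096)).sum()
loss.backward()
opt.step()  # materializes the optimizer states

# Sum the bytes held by all optimizer state tensors.
state_bytes = sum(
    t.numel() * t.element_size()
    for s in opt.state.values()
    for t in s.values()
    if torch.is_tensor(t)
)
print(f"optimizer state: {state_bytes / 2**20:.1f} MiB "
      f"(~2x the parameter memory for Adam)")
```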
## 📌 Checklist before creating the PR
- [ ] I have created an issue for this PR for traceability
- [ ] The title follows the standard format: `[doc/gemini/tensor/...]:...