ColossalAI
Making large AI models cheaper, faster and more accessible
### 🐛 Describe the bug When running the llama2 example's pretrain.py, training appears to hang like this. ### Environment CUDA Version: V11.1.105 Python Version: Python 3.8.18 PyTorch Version: 2.0.0+cu117
## 📌 Checklist before creating the PR - [ ] I have created an issue for this PR for traceability - [x] The title follows the standard format: `[doc/gemini/tensor/...]: A...
## 📌 Checklist before creating the PR - [x] I have created an issue for this PR for traceability - [ ] The title follows the standard format: `[doc/gemini/tensor/...]: A...
### Describe the feature Integration with Hugging Face Accelerate?
### Describe the feature A recent paper titled "GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection" (https://arxiv.org/pdf/2403.03507.pdf) demonstrates a remarkable memory-efficient approach during the training of large language models (LLMs)....
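The core idea in GaLore is to project the gradient of a weight matrix onto a low-rank subspace, keep the optimizer state in that smaller space, and project updates back. A minimal sketch of that projection step, using NumPy for illustration (this is a simplified assumption from the paper's description, not ColossalAI's or GaLore's actual implementation, and `low_rank_project` / `project_back` are hypothetical helper names):

```python
import numpy as np

def low_rank_project(grad, rank):
    # The top-r left singular vectors of the gradient form the
    # projection basis P (m x r); the projected gradient is r x n,
    # so optimizer state shrinks from m x n to r x n.
    U, _, _ = np.linalg.svd(grad, full_matrices=False)
    P = U[:, :rank]
    return P, P.T @ grad

def project_back(P, low_rank_grad):
    # Lift the low-rank update back to the full parameter shape m x n.
    return P @ low_rank_grad

rng = np.random.default_rng(0)
grad = rng.standard_normal((64, 32))          # stand-in for a weight gradient
P, g_lr = low_rank_project(grad, rank=4)
restored = project_back(P, g_lr)
print(g_lr.shape, restored.shape)
```

In the paper the projection basis is recomputed only every few hundred steps, which amortizes the SVD cost; the sketch above recomputes it on every call for clarity.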
fix typo in python code
### Proposal Generating an Inter-Op plan with ColossalAuto usually takes 1-2 minutes when running `examples/tutorial/auto_parallel/auto_parallel_with_resnet.py`. Profiling with cProfile reveals that a large portion of this time is consumed by...
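Measuring where the planning time goes, as the proposal describes, can be done with the standard library's cProfile. A minimal sketch (here `solve_inter_op_plan` is a hypothetical stand-in for the ColossalAuto planning entry point, not an actual API):

```python
import cProfile
import io
import pstats

def solve_inter_op_plan():
    # Placeholder workload standing in for the inter-op planning computation.
    total = 0
    for i in range(200_000):
        total += i * i
    return total

profiler = cProfile.Profile()
profiler.enable()
solve_inter_op_plan()
profiler.disable()

# Sort by cumulative time to see which call dominates the run.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
print(stream.getvalue())
```

Sorting by `cumulative` surfaces the outermost expensive call; sorting by `tottime` instead would highlight the individual functions doing the work.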
### Discussed in https://github.com/hpcaitech/ColossalAI/discussions/5381 Originally posted by **mackmake** February 13, 2024 Hi and thanks for your efficient library. I wanted to pretrain so I installed packages with CUDA_EXT=1. Then I...
### 🐛 Describe the bug Following https://github.com/hpcaitech/ColossalAI/tree/main/applications/Colossal-LLaMA-2 but getting an error:
> Flash-attention enabled successfully
> Model params: 6.28 B
> Booster init max device memory: 38593.54 MB
> Booster init...
## 📌 Checklist before creating the PR - [ ] I have created an issue for this PR for traceability - [ ] The title follows the standard format: `[doc/gemini/tensor/...]:...