ColossalAI
Making large AI models cheaper, faster and more accessible
### Is there an existing issue for this bug?

- [X] I have searched the existing issues

### 🐛 Describe the bug

I am trying to reproduce OPT-66B using 16xH100...
### Describe the feature

Currently, most models such as Llama do not support sequence parallelism (SP) together with pipeline parallelism (PP). Please add support for this.
### Is there an existing issue for this bug?

- [X] I have searched the existing issues

### 🐛 Describe the bug

I failed to run the ChatGLM model with ColossalAI...
### Describe the feature

Please add Ulysses Sequence Parallelism support for Command-R, Qwen2, and ChatGLM.
### Describe the feature

Hi, when training a big model such as Llama2-70B with LoRA, training runs out of memory because the base model is unsharded. It could help a lot if LoRA...
After calling `booster.backward(loss=loss, optimizer=optimizer)`, all gradients of `model.module` are `None`. Is there a way to access the gradients?
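For context, a minimal sketch of how one might inspect gradients, assuming a plugin that keeps gradients on the parameters (e.g. `TorchDDPPlugin`); `dump_gradients` is a hypothetical helper, not part of ColossalAI. Under sharded plugins such as Gemini or Low-Level ZeRO, gradients are managed inside the optimizer rather than stored on the module's parameters, which is one reason `.grad` can come back `None`:

```python
import torch

def dump_gradients(model: torch.nn.Module) -> None:
    # Hypothetical helper: iterate over the (unwrapped) module and report
    # which parameters received a gradient during the last backward pass.
    # Only meaningful for plugins that leave gradients on the parameters.
    for name, param in model.named_parameters():
        if param.grad is None:
            print(f"{name}: no gradient")
        else:
            print(f"{name}: grad norm = {param.grad.norm().item():.4e}")

# Usage after the booster-driven backward pass:
# booster.backward(loss=loss, optimizer=optimizer)
# dump_gradients(model.module)
```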
## 📝 What does this PR do?

- PyTorch 2.3.0 added a `group` argument to the `_object_to_tensor` function. Updated the related call sites in pipeline communication.
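Not the PR's actual code, but a minimal compatibility shim illustrating the idea; `object_to_tensor` is a hypothetical wrapper name, and `_object_to_tensor` is a private PyTorch API that gained a `group` parameter in 2.3.0:

```python
import inspect

from torch.distributed.distributed_c10d import _object_to_tensor

# PyTorch 2.3.0 added a `group` parameter to `_object_to_tensor`; detect it
# from the signature so both older and newer releases are handled.
_HAS_GROUP_ARG = "group" in inspect.signature(_object_to_tensor).parameters

def object_to_tensor(obj, device, group=None):
    # Hypothetical version-agnostic wrapper around torch's private helper;
    # it forwards to the correct signature for the installed PyTorch.
    if _HAS_GROUP_ARG:
        return _object_to_tensor(obj, device, group)
    return _object_to_tensor(obj, device)
```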
1. Optimize the data path: from `List -> CPU Tensor -> List -> rpc_param -> GPU Tensor` to `List -> rpc_param -> GPU Tensor`.
2. Wrap the async forward only once.
3. Only the rank-0 worker runs the sampler and returns the return... (a generic sketch of this pattern follows the list)
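Not the PR's implementation, but a generic sketch of the rank-0-only sampling idea using `torch.distributed`; `sample_on_rank0` is a hypothetical helper, and how the PR actually returns the result over its RPC path is elided above:

```python
import torch
import torch.distributed as dist

def sample_on_rank0(logits: torch.Tensor) -> torch.Tensor:
    # Only rank 0 runs the sampler; every other worker allocates an empty
    # buffer and receives the sampled token IDs via broadcast, so all ranks
    # stay in sync without duplicating the sampling work.
    if dist.get_rank() == 0:
        probs = torch.softmax(logits.float(), dim=-1)
        next_tokens = torch.multinomial(probs, num_samples=1)
    else:
        next_tokens = torch.empty(
            logits.size(0), 1, dtype=torch.long, device=logits.device
        )
    dist.broadcast(next_tokens, src=0)
    return next_tokens
```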