ColossalAI
Making large AI models cheaper, faster and more accessible
### 🐛 Describe the bug While boosting the model with the `torch_fsdp` plugin under `LazyInitContext`, a `RecursionError` occurred: `RecursionError: maximum recursion depth exceeded` script: ``` from modeling_phi import PhiDecoderLayer, PhiForCausalLM...
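The failure mode is easier to discuss against a minimal, self-contained version of that setup. This is only a sketch, assuming a recent ColossalAI release (the `launch_from_torch` signature has varied across versions); a toy `Linear` stands in for `PhiForCausalLM`:

```python
# Hedged repro sketch: a toy model stands in for PhiForCausalLM, and
# API details (e.g. launch_from_torch arguments) may vary by version.
import torch

import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import TorchFSDPPlugin
from colossalai.lazy import LazyInitContext

colossalai.launch_from_torch()

# Under LazyInitContext, parameters are created as lazy (meta) tensors
# and only materialized when the booster wraps the model in FSDP; the
# reported RecursionError surfaces during this boost step.
with LazyInitContext():
    model = torch.nn.Linear(1024, 1024)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
booster = Booster(plugin=TorchFSDPPlugin())
model, optimizer, *_ = booster.boost(model, optimizer)
```

Run with `torchrun --nproc_per_node=<N>`; if the recursion comes from `LazyInitContext`'s attribute interception interacting with FSDP's module wrapping, even this small model may trigger it.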
### Describe the feature [Llama-2](https://github.com/facebookresearch/llama) has made `fsdp` + `bf16` training its default training setting. The memory occupied by the copy of fp32 optimizer state and fp32 model parameters...
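For context, the requested behavior can already be sketched with PyTorch's native FSDP rather than a ColossalAI plugin: cast the model to bf16 before wrapping, so both the sharded flat parameters and the optimizer state live in bf16 with no fp32 master copies. A minimal sketch (the model and sizes are placeholders, not from the issue):

```python
# Pure-bf16 FSDP sketch using PyTorch's own API; run with torchrun.
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp import MixedPrecision

dist.init_process_group("nccl")
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

# Casting to bf16 first means FSDP shards bf16 flat parameters, and
# AdamW then allocates its exp_avg/exp_avg_sq state in bf16 as well.
model = torch.nn.Linear(4096, 4096).to(torch.bfloat16).cuda()
policy = MixedPrecision(
    param_dtype=torch.bfloat16,
    reduce_dtype=torch.bfloat16,
    buffer_dtype=torch.bfloat16,
)
model = FSDP(model, mixed_precision=policy)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
```

The trade-off is the usual one: dropping the fp32 copies roughly halves parameter-plus-optimizer memory, at some convergence risk that bf16's wide exponent range usually keeps tolerable.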
## 📌 Checklist before creating the PR - [ ] I have created an issue for this PR for traceability - [x] The title follows the standard format: `[doc/gemini/tensor/...]: A...
### 🐛 Describe the bug I am running the example code shown in https://github.com/hpcaitech/ColossalAI/tree/main/examples/language/gpt/experiments/auto_parallel with PyTorch 2.0 (because I need to deploy ColossalAI on H800 GPUs, which require CUDA 12.0 or later...
## 📌 Checklist before creating the PR - [ ] I have created an issue for this PR for traceability - [ ] The title follows the standard format: `[doc/gemini/tensor/...]:...
### 🐛 Describe the bug Hi, I am trying to run the llama2 7B model on the [yizhongw/self_instruct](https://huggingface.co/datasets/yizhongw/self_instruct) dataset. As the title suggests, training with the hybrid_parallel or 3d plugin gives a None loss, but...
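One plausible explanation, worth ruling out before treating this as a bug: when pipeline parallelism is enabled, the loss is only materialized on the last pipeline stage, so every other rank sees `None` by design. A hedged sketch of the expected usage (the tiny Llama config and sizes are placeholders; the API names follow ColossalAI's published examples as I understand them):

```python
# Sketch: HybridParallelPlugin with pp_size=2; run with torchrun on 2 ranks.
import torch
from transformers import LlamaConfig, LlamaForCausalLM

import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import HybridParallelPlugin

colossalai.launch_from_torch()

plugin = HybridParallelPlugin(tp_size=1, pp_size=2, microbatch_size=1)
booster = Booster(plugin=plugin)

config = LlamaConfig(hidden_size=128, num_hidden_layers=4,
                     num_attention_heads=4, intermediate_size=256)
model = LlamaForCausalLM(config)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

def criterion(outputs, inputs):
    return outputs.loss  # HF causal-LM loss, computed from `labels`

model, optimizer, criterion, *_ = booster.boost(model, optimizer, criterion=criterion)

ids = torch.randint(0, config.vocab_size, (2, 16))
batch = {"input_ids": ids, "attention_mask": torch.ones_like(ids), "labels": ids}
outputs = booster.execute_pipeline(iter([batch]), model, criterion,
                                   optimizer, return_loss=True)

# Only the last pipeline stage holds the loss; other ranks get None.
if booster.plugin.stage_manager.is_last_stage():
    print("loss:", outputs["loss"].item())
optimizer.step()
```

If the loss is `None` even on the last stage, that would point at a genuine bug (e.g. `labels` not reaching the loss computation).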
Support parallel output function for shardformer models.
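"Parallel output" here means keeping the LM head's logits sharded along the vocabulary dimension and computing the cross-entropy directly on the shards, instead of all-gathering a full `[N, vocab]` logits tensor. Below is a standalone sketch of the underlying (Megatron-style) idea; none of these names are Shardformer's actual internals, and autograd handling is omitted:

```python
# Forward-only sketch of vocab-parallel cross-entropy; a real
# implementation wraps this in a custom autograd.Function.
import torch
import torch.distributed as dist

def parallel_cross_entropy(local_logits, targets, vocab_start, group=None):
    """local_logits: [N, V_local] vocab shard; targets: [N] global vocab ids."""
    # 1. Global max per row for numerical stability.
    m = local_logits.max(dim=-1).values
    dist.all_reduce(m, op=dist.ReduceOp.MAX, group=group)
    shifted = local_logits - m.unsqueeze(-1)

    # 2. Global partition function Z = sum over the full vocab of exp(x - m).
    z = shifted.exp().sum(dim=-1)
    dist.all_reduce(z, op=dist.ReduceOp.SUM, group=group)

    # 3. Target logit: only the rank owning each target id contributes.
    vocab_end = vocab_start + local_logits.size(-1)
    mask = (targets >= vocab_start) & (targets < vocab_end)
    local_idx = (targets - vocab_start).clamp(0, local_logits.size(-1) - 1)
    tgt = shifted.gather(-1, local_idx.unsqueeze(-1)).squeeze(-1) * mask
    dist.all_reduce(tgt, op=dist.ReduceOp.SUM, group=group)

    # loss_i = log Z_i - (x_target_i - m_i)
    return (z.log() - tgt).mean()
```

The payoff is memory: with tensor parallelism of degree `p`, each rank touches only `[N, V/p]` logits plus three small all-reduces, rather than materializing the full vocabulary on every rank.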