ColossalAI
Making large AI models cheaper, faster and more accessible
### 🐛 Describe the bug While boosting the model with the `torch_fsdp` plugin under `LazyInitContext`, a `RecursionError` occurred: `RecursionError: maximum recursion depth exceeded` script: ``` from modeling_phi import PhiDecoderLayer, PhiForCausalLM...
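The failure mode is easier to discuss against a minimal, self-contained version of that setup. This is only a sketch, assuming a recent ColossalAI release (the `launch_from_torch` signature has varied across versions); a toy `Linear` stands in for `PhiForCausalLM`:

```python
# Hedged repro sketch: a toy model stands in for PhiForCausalLM, and
# API details (e.g. launch_from_torch arguments) may vary by version.
import torch

import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import TorchFSDPPlugin
from colossalai.lazy import LazyInitContext

colossalai.launch_from_torch()

# Under LazyInitContext, parameters are created as lazy (meta) tensors
# and only materialized when the booster wraps the model in FSDP; the
# reported RecursionError surfaces during this boost step.
with LazyInitContext():
    model = torch.nn.Linear(1024, 1024)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
booster = Booster(plugin=TorchFSDPPlugin())
model, optimizer, *_ = booster.boost(model, optimizer)
```

Run with `torchrun --nproc_per_node=<N>`; if the recursion comes from `LazyInitContext`'s attribute interception interacting with FSDP's module wrapping, even this small model may trigger it.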
### Describe the feature [Llama-2](https://github.com/facebookresearch/llama) has made `fsdp` + `bf16` training its default training setting. The memory occupied by the copy of fp32 optimizer state and fp32 model parameters...
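For context, the requested behavior can already be sketched with PyTorch's native FSDP rather than a ColossalAI plugin: cast the model to bf16 before wrapping, so both the sharded flat parameters and the optimizer state live in bf16 with no fp32 master copies. A minimal sketch (the model and sizes are placeholders, not from the issue):

```python
# Pure-bf16 FSDP sketch using PyTorch's own API; run with torchrun.
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp import MixedPrecision

dist.init_process_group("nccl")
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

# Casting to bf16 first means FSDP shards bf16 flat parameters, and
# AdamW then allocates its exp_avg/exp_avg_sq state in bf16 as well.
model = torch.nn.Linear(4096, 4096).to(torch.bfloat16).cuda()
policy = MixedPrecision(
    param_dtype=torch.bfloat16,
    reduce_dtype=torch.bfloat16,
    buffer_dtype=torch.bfloat16,
)
model = FSDP(model, mixed_precision=policy)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
```

The trade-off is the usual one: dropping the fp32 copies roughly halves parameter-plus-optimizer memory, at some convergence risk that bf16's wide exponent range usually keeps tolerable.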
## 📌 Checklist before creating the PR - [ ] I have created an issue for this PR for traceability - [x] The title follows the standard format: `[doc/gemini/tensor/...]: A...
### 🐛 Describe the bug I am running the example code shown in https://github.com/hpcaitech/ColossalAI/tree/main/examples/language/gpt/experiments/auto_parallel with PyTorch 2.0 (because I need to deploy ColossalAI on H800 GPUs, which require CUDA 12.0 or later...
## 📌 Checklist before creating the PR - [ ] I have created an issue for this PR for traceability - [ ] The title follows the standard format: `[doc/gemini/tensor/...]:...
### 🐛 Describe the bug Hi, I am trying to run the llama2 7B model on the [yizhongw/self_instruct](https://huggingface.co/datasets/yizhongw/self_instruct) dataset. As the title suggests, training with the hybrid_parallel or 3d plugin gives a None loss, but...
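One plausible explanation, worth ruling out before treating this as a bug: when pipeline parallelism is enabled, the loss is only materialized on the last pipeline stage, so every other rank sees `None` by design. A hedged sketch of the expected usage (the tiny Llama config and sizes are placeholders; the API names follow ColossalAI's published examples as I understand them):

```python
# Sketch: HybridParallelPlugin with pp_size=2; run with torchrun on 2 ranks.
import torch
from transformers import LlamaConfig, LlamaForCausalLM

import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import HybridParallelPlugin

colossalai.launch_from_torch()

plugin = HybridParallelPlugin(tp_size=1, pp_size=2, microbatch_size=1)
booster = Booster(plugin=plugin)

config = LlamaConfig(hidden_size=128, num_hidden_layers=4,
                     num_attention_heads=4, intermediate_size=256)
model = LlamaForCausalLM(config)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

def criterion(outputs, inputs):
    return outputs.loss  # HF causal-LM loss, computed from `labels`

model, optimizer, criterion, *_ = booster.boost(model, optimizer, criterion=criterion)

ids = torch.randint(0, config.vocab_size, (2, 16))
batch = {"input_ids": ids, "attention_mask": torch.ones_like(ids), "labels": ids}
outputs = booster.execute_pipeline(iter([batch]), model, criterion,
                                   optimizer, return_loss=True)

# Only the last pipeline stage holds the loss; other ranks get None.
if booster.plugin.stage_manager.is_last_stage():
    print("loss:", outputs["loss"].item())
optimizer.step()
```

If the loss is `None` even on the last stage, that would point at a genuine bug (e.g. `labels` not reaching the loss computation).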
Support parallel output function for shardformer models.
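"Parallel output" here means keeping the LM head's logits sharded along the vocabulary dimension and computing the cross-entropy directly on the shards, instead of all-gathering a full `[N, vocab]` logits tensor. Below is a standalone sketch of the underlying (Megatron-style) idea; none of these names are Shardformer's actual internals, and autograd handling is omitted:

```python
# Forward-only sketch of vocab-parallel cross-entropy; a real
# implementation wraps this in a custom autograd.Function.
import torch
import torch.distributed as dist

def parallel_cross_entropy(local_logits, targets, vocab_start, group=None):
    """local_logits: [N, V_local] vocab shard; targets: [N] global vocab ids."""
    # 1. Global max per row for numerical stability.
    m = local_logits.max(dim=-1).values
    dist.all_reduce(m, op=dist.ReduceOp.MAX, group=group)
    shifted = local_logits - m.unsqueeze(-1)

    # 2. Global partition function Z = sum over the full vocab of exp(x - m).
    z = shifted.exp().sum(dim=-1)
    dist.all_reduce(z, op=dist.ReduceOp.SUM, group=group)

    # 3. Target logit: only the rank owning each target id contributes.
    vocab_end = vocab_start + local_logits.size(-1)
    mask = (targets >= vocab_start) & (targets < vocab_end)
    local_idx = (targets - vocab_start).clamp(0, local_logits.size(-1) - 1)
    tgt = shifted.gather(-1, local_idx.unsqueeze(-1)).squeeze(-1) * mask
    dist.all_reduce(tgt, op=dist.ReduceOp.SUM, group=group)

    # loss_i = log Z_i - (x_target_i - m_i)
    return (z.log() - tgt).mean()
```

The payoff is memory: with tensor parallelism of degree `p`, each rank touches only `[N, V/p]` logits plus three small all-reduces, rather than materializing the full vocabulary on every rank.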