ColossalAI
Making large AI models cheaper, faster and more accessible
### Describe the feature How do I enable activation checkpoint offload? Can anyone help me with this?
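For context on what the question above is asking, here is a minimal sketch of the underlying PyTorch mechanisms: activation checkpointing (recompute activations in the backward pass) and offloading saved activations to CPU. This is not ColossalAI's own plugin flag for the feature, which may be named differently; it only illustrates the technique.

```python
import torch
from torch.utils.checkpoint import checkpoint

def heavy_block(x):
    # Stand-in for an expensive layer whose activations we'd rather not keep.
    return torch.relu(x @ x.t())

x = torch.randn(8, 8, requires_grad=True)

# Activation checkpointing: don't store activations, recompute them in backward.
y = checkpoint(heavy_block, x, use_reentrant=False)

# Activation offload: keep activations, but park them in CPU memory
# until the backward pass needs them.
with torch.autograd.graph.save_on_cpu():
    z = heavy_block(x)

(y.sum() + z.sum()).backward()
```

Both paths trade GPU memory for time: checkpointing pays with recomputation, offloading pays with host-device transfers.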
I trained Llama2-7B-chat on the Alpaca dataset, and when I set the batch size to 2 or 4, "INFO: Found overflow. Skip step. " appeared at each step of the...
### 🐛 Describe the bug When I enable the optimization options inside the gemini_auto plugin, I encounter errors such as TypeError: GeminiPlugin.__init__() got an unexpected keyword argument 'enable_flash_attention'. ### Environment...
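A TypeError like the one above usually means the installed version of the library simply does not accept that keyword argument in its constructor. One quick way to check what a class actually accepts is `inspect.signature`. The `GeminiPlugin` class below is a hypothetical stand-in with illustrative parameters; with ColossalAI installed you would inspect the real class instead.

```python
import inspect

class GeminiPlugin:
    # Illustrative stub, not ColossalAI's actual signature.
    def __init__(self, precision="fp16", placement_policy="static"):
        self.precision = precision
        self.placement_policy = placement_policy

# List the keyword arguments this version of the class really takes.
accepted = set(inspect.signature(GeminiPlugin.__init__).parameters) - {"self"}
print(sorted(accepted))

# Passing anything outside this set raises exactly the reported TypeError.
assert "enable_flash_attention" not in accepted
```

Comparing this set against the arguments used in an example script quickly shows whether the script targets a newer library version.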
### 🐛 Describe the bug The current implementation of WarmupScheduler does not include the functionality to load the after_scheduler part of the saved state. This omission leads to a scenario where...
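The fix the report implies is for the warmup wrapper's `state_dict()`/`load_state_dict()` to round-trip the wrapped scheduler's state rather than silently dropping it. The sketch below uses illustrative class and attribute names, not ColossalAI's actual API.

```python
class WarmupWrapper:
    """Illustrative warmup scheduler that persists its inner scheduler."""

    def __init__(self, warmup_steps, after_scheduler):
        self.warmup_steps = warmup_steps
        self.last_step = 0
        self.after_scheduler = after_scheduler

    def state_dict(self):
        return {
            "warmup_steps": self.warmup_steps,
            "last_step": self.last_step,
            # Include the inner scheduler instead of dropping it:
            "after_scheduler": self.after_scheduler.state_dict(),
        }

    def load_state_dict(self, state):
        self.warmup_steps = state["warmup_steps"]
        self.last_step = state["last_step"]
        self.after_scheduler.load_state_dict(state["after_scheduler"])


class DummyScheduler:
    """Stands in for e.g. a cosine-annealing scheduler."""

    def __init__(self):
        self.t = 0

    def state_dict(self):
        return {"t": self.t}

    def load_state_dict(self, state):
        self.t = state["t"]


src = WarmupWrapper(100, DummyScheduler())
src.after_scheduler.t = 7
dst = WarmupWrapper(100, DummyScheduler())
dst.load_state_dict(src.state_dict())
assert dst.after_scheduler.t == 7  # inner state survives the round trip
```

Without the `after_scheduler` entry, resuming from a checkpoint resets the inner scheduler to step zero, which is the scenario the bug report describes.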
### 🐛 Describe the bug On a machine with 8× A100 80 GB GPUs, with batch_size=1 and the 7B LLaMA-2 model, neither train_sft.py nor train_reward_model.py will run. ### Environment You are using a model of type mistral to instantiate a model of type llama. This is not supported for all configurations...
## 📌 Checklist before creating the PR - [ ] I have created an issue for this PR for traceability - [ ] The title follows the standard format: `[doc/gemini/tensor/...]:...
### Describe the feature I appreciate your great work in releasing the [llama 2 model](https://github.com/hpcaitech/ColossalAI/tree/785802e809ccf26b3864ae811dc908ecdf601a70/applications/Colossal-LLaMA-2). When will the Data Processing Toolkit be released?
### 🐛 Describe the bug I was trying to reproduce the benchmark results on https://github.com/hpcaitech/ColossalAI/blob/main/applications/Chat/README.md which says: > DeepSpeedChat performance comes from its blog on 2023 April 12, ColossalChat performance...
## 📝 What does this PR do? Added support for BatchEncoding in the to_device method, based on Issue #4489. Fixes #4489
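The behavior this PR describes can be sketched as a `to_device` helper that handles dict-like batches, such as HuggingFace's `BatchEncoding`, in addition to plain tensors. The function name and structure below are illustrative, not the PR's actual diff; `FakeTensor` is a hypothetical stand-in so the sketch runs without PyTorch installed.

```python
from collections.abc import Mapping

def to_device(obj, device):
    """Recursively move tensors (and dict-like batches of tensors) to a device."""
    if hasattr(obj, "to"):
        # Tensors and BatchEncoding both expose .to(device).
        return obj.to(device)
    if isinstance(obj, Mapping):
        # Plain dicts of tensors: rebuild with moved values.
        return type(obj)((k, to_device(v, device)) for k, v in obj.items())
    if isinstance(obj, (list, tuple)):
        return type(obj)(to_device(v, device) for v in obj)
    return obj  # leave non-tensor leaves (ints, strings, None) untouched


class FakeTensor:
    """Minimal stand-in for torch.Tensor, tracking only its device."""

    def __init__(self, device="cpu"):
        self.device = device

    def to(self, device):
        return FakeTensor(device)


batch = {"input_ids": FakeTensor(), "labels": [FakeTensor()]}
moved = to_device(batch, "cuda:0")
```

Dispatching on `.to()` first means any object that already knows how to move itself, including `BatchEncoding`, is handled without a special case.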