ColossalAI
Making large AI models cheaper, faster and more accessible
### Describe the feature How do I enable activation checkpoint offload? Can anyone help me with this?
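For context on what the question above is asking, here is a minimal sketch of the underlying PyTorch mechanisms: activation checkpointing (recompute activations in the backward pass) and offloading saved activations to CPU. This is not ColossalAI's own plugin flag for the feature, which may be named differently; it only illustrates the technique.

```python
import torch
from torch.utils.checkpoint import checkpoint

def heavy_block(x):
    # Stand-in for an expensive layer whose activations we'd rather not keep.
    return torch.relu(x @ x.t())

x = torch.randn(8, 8, requires_grad=True)

# Activation checkpointing: don't store activations, recompute them in backward.
y = checkpoint(heavy_block, x, use_reentrant=False)

# Activation offload: keep activations, but park them in CPU memory
# until the backward pass needs them.
with torch.autograd.graph.save_on_cpu():
    z = heavy_block(x)

(y.sum() + z.sum()).backward()
```

Both paths trade GPU memory for time: checkpointing pays with recomputation, offloading pays with host-device transfers.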
I trained Llama2-7B-chat on the Alpaca dataset, and when I set the batch size to 2 or 4, "INFO: Found overflow. Skip step. " appeared at each step of the...
### 🐛 Describe the bug When I enable the optimization options inside the gemini_auto plugin, I encounter errors such as TypeError: GeminiPlugin.__init__() got an unexpected keyword argument 'enable_flash_attention'. ### Environment...
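A TypeError like the one above usually means the installed version of the library simply does not accept that keyword argument in its constructor. One quick way to check what a class actually accepts is `inspect.signature`. The `GeminiPlugin` class below is a hypothetical stand-in with illustrative parameters; with ColossalAI installed you would inspect the real class instead.

```python
import inspect

class GeminiPlugin:
    # Illustrative stub, not ColossalAI's actual signature.
    def __init__(self, precision="fp16", placement_policy="static"):
        self.precision = precision
        self.placement_policy = placement_policy

# List the keyword arguments this version of the class really takes.
accepted = set(inspect.signature(GeminiPlugin.__init__).parameters) - {"self"}
print(sorted(accepted))

# Passing anything outside this set raises exactly the reported TypeError.
assert "enable_flash_attention" not in accepted
```

Comparing this set against the arguments used in an example script quickly shows whether the script targets a newer library version.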
### 🐛 Describe the bug The current implementation of WarmupScheduler does not include the functionality to load the after_scheduler part of the saved state. This omission leads to a scenario where...
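The fix the report implies is for the warmup wrapper's `state_dict()`/`load_state_dict()` to round-trip the wrapped scheduler's state rather than silently dropping it. The sketch below uses illustrative class and attribute names, not ColossalAI's actual API.

```python
class WarmupWrapper:
    """Illustrative warmup scheduler that persists its inner scheduler."""

    def __init__(self, warmup_steps, after_scheduler):
        self.warmup_steps = warmup_steps
        self.last_step = 0
        self.after_scheduler = after_scheduler

    def state_dict(self):
        return {
            "warmup_steps": self.warmup_steps,
            "last_step": self.last_step,
            # Include the inner scheduler instead of dropping it:
            "after_scheduler": self.after_scheduler.state_dict(),
        }

    def load_state_dict(self, state):
        self.warmup_steps = state["warmup_steps"]
        self.last_step = state["last_step"]
        self.after_scheduler.load_state_dict(state["after_scheduler"])


class DummyScheduler:
    """Stands in for e.g. a cosine-annealing scheduler."""

    def __init__(self):
        self.t = 0

    def state_dict(self):
        return {"t": self.t}

    def load_state_dict(self, state):
        self.t = state["t"]


src = WarmupWrapper(100, DummyScheduler())
src.after_scheduler.t = 7
dst = WarmupWrapper(100, DummyScheduler())
dst.load_state_dict(src.state_dict())
assert dst.after_scheduler.t == 7  # inner state survives the round trip
```

Without the `after_scheduler` entry, resuming from a checkpoint resets the inner scheduler to step zero, which is the scenario the bug report describes.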
### 🐛 Describe the bug On a machine with 8× A100 80 GB GPUs, with batch_size=1 and the 7B LLaMA-2 model, neither train_sft.py nor train_reward_model.py will run. ### Environment You are using a model of type mistral to instantiate a model of type llama. This is not supported for all configurations...
## 📌 Checklist before creating the PR - [ ] I have created an issue for this PR for traceability - [ ] The title follows the standard format: `[doc/gemini/tensor/...]:...
### Describe the feature I appreciate your great work in releasing the [llama 2 model](https://github.com/hpcaitech/ColossalAI/tree/785802e809ccf26b3864ae811dc908ecdf601a70/applications/Colossal-LLaMA-2). When will the Data Processing Toolkit be released?
### 🐛 Describe the bug I was trying to reproduce the benchmark results on https://github.com/hpcaitech/ColossalAI/blob/main/applications/Chat/README.md which says: > DeepSpeedChat performance comes from its blog on 2023 April 12, ColossalChat performance...
## 📝 What does this PR do? Added support for BatchEncoding in the to_device method, based on Issue #4489. Fixes #4489
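The behavior this PR describes can be sketched as a `to_device` helper that handles dict-like batches, such as HuggingFace's `BatchEncoding`, in addition to plain tensors. The function name and structure below are illustrative, not the PR's actual diff; `FakeTensor` is a hypothetical stand-in so the sketch runs without PyTorch installed.

```python
from collections.abc import Mapping

def to_device(obj, device):
    """Recursively move tensors (and dict-like batches of tensors) to a device."""
    if hasattr(obj, "to"):
        # Tensors and BatchEncoding both expose .to(device).
        return obj.to(device)
    if isinstance(obj, Mapping):
        # Plain dicts of tensors: rebuild with moved values.
        return type(obj)((k, to_device(v, device)) for k, v in obj.items())
    if isinstance(obj, (list, tuple)):
        return type(obj)(to_device(v, device) for v in obj)
    return obj  # leave non-tensor leaves (ints, strings, None) untouched


class FakeTensor:
    """Minimal stand-in for torch.Tensor, tracking only its device."""

    def __init__(self, device="cpu"):
        self.device = device

    def to(self, device):
        return FakeTensor(device)


batch = {"input_ids": FakeTensor(), "labels": [FakeTensor()]}
moved = to_device(batch, "cuda:0")
```

Dispatching on `.to()` first means any object that already knows how to move itself, including `BatchEncoding`, is handled without a special case.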