ColossalAI issues

[BUG]: ImportError: cannot import name 'register_meta' from 'torch._meta_registrations'

2

### 🐛 Describe the bug
When I run https://github.com/hpcaitech/ColossalAI/blob/main/applications/Chat/examples/train_rm.sh, I encounter this import error.
File "/XXX/ColossalAI/colossalai/fx/_meta_regist_13.py", line 2, in <module>
from torch._meta_registrations import register_meta
ImportError: cannot import name 'register_meta' from 'torch._meta_registrations'...

ZhangMaoTai

bug
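The failing import above targets a private PyTorch module (`torch._meta_registrations`), whose contents are not guaranteed to stay the same across releases. A minimal sketch of a guarded import follows. `optional_import` is a hypothetical helper, not part of ColossalAI or PyTorch, and whether the calling code can proceed without `register_meta` is an assumption.

```python
import importlib


def optional_import(module_name, attr_name):
    """Return `module_name.attr_name`, or None if either is unavailable.

    Hypothetical helper for illustration; not a ColossalAI or PyTorch API.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # The module is missing entirely (ModuleNotFoundError is a subclass).
        return None
    # The module exists, but the name may have been moved or removed.
    return getattr(module, attr_name, None)


# The name from the issue; None means this environment does not expose it.
register_meta = optional_import("torch._meta_registrations", "register_meta")
if register_meta is None:
    print("register_meta unavailable; skipping meta registrations")
```

This only turns a hard crash at import time into a detectable condition. Matching the installed PyTorch version to the one the library expects is still the more direct fix.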

[BUG]: The training code of reward model may be wrong

10

### 🐛 Describe the bug
I'm trying to train a reward model with the [example](https://github.com/hpcaitech/ColossalAI/blob/main/applications/Chat/examples/train_rm.sh), but after ten epochs of training the eval result is still `dist=nan, acc=0`. Is there anything wrong...

Luoyang144

bug

Set tokenizer in PPO

2

PPO training in ColossalChat needs two models, an actor and a critic. Can these two models be different? For example, the critic uses the BERT model, and the actor...

guijuzhejiang

[BUG]: setup.py excludes op_builder

### 🐛 Describe the bug
The setup.py in main branch just excludes op_builders.
```
setup(name=package_name,
      version=version,
      packages=find_packages(exclude=(
          'op_builder',
          'benchmark',
          'docker',
          'tests',
          'docs',
          'examples',
          'tests',
          'scripts',
          'requirements',
          '*.egg-info',
      )),
```
I'm...

yynil

bug

[FEATURE]: serving multiple models

4

### Describe the feature
Currently, does Colossal-AI have support or ongoing work for deploying multiple models concurrently, possibly using the Ray framework? For context, I’m doing a course/research project related...

dlzou

enhancement

[BUG]: AttributeError: 'LlamaForCausalLM' object has no attribute 'module'

1

### 🐛 Describe the bug
When I save the model, I get an error:
```
Traceback (most recent call last):
  File "train_sft.py", line 190, in <module>
    train(args)
  File "train_sft.py", line 160, in train
    trainer.save_model(path=args.save_path,...
```

mynewstart

bug
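The traceback above is truncated, so the failing line is not visible. An error of this shape typically appears when saving code reads `model.module` on a model that was never wrapped. Wrappers such as `torch.nn.parallel.DistributedDataParallel` expose the original model as `.module`, while a bare model has no such attribute. A minimal sketch of a defensive unwrap, assuming that is the cause here; `unwrap_model` and the stand-in classes are hypothetical and do not come from ColossalAI:

```python
class PlainModel:
    """Stand-in for an unwrapped model such as LlamaForCausalLM."""


class Wrapper:
    """Stand-in for a parallel wrapper that stores the model as `.module`."""

    def __init__(self, module):
        self.module = module


def unwrap_model(model):
    """Peel off `.module` wrappers; return the model itself if unwrapped."""
    while hasattr(model, "module"):
        model = model.module
    return model


plain = PlainModel()
wrapped = Wrapper(Wrapper(plain))
print(unwrap_model(plain) is plain)    # True
print(unwrap_model(wrapped) is plain)  # True
```

One caveat: a model that legitimately names one of its own submodules `module` would be unwrapped too far, so a real fix may prefer an explicit `isinstance` check against the wrapper types in use.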

[FEATURE]: Is hybrid parallelism supported in GPT demo?

2

### Describe the feature
I found that only the DP and ZeRO strategies are supported in `ColossalAI/applications/Chat/examples`. Is hybrid parallelism (like PP / Megatron) supported?

nrailg

enhancement

[BUG]: Does Stable Diffusion model training support tensor or pipeline parallelism?

1

### 🐛 Describe the bug
When I read the ColossalAI parallelism docs, they say we need to change torch.nn.Linear to col_nn.Linear. But in the Stable Diffusion model code, I found the model still uses torch.nn.Linear...

qiuyang163

bug

[BUG]: ImportError: cannot import name 'ColoInitContext' from 'colossalai.zero'

10

### 🐛 Describe the bug
When I run examples/single_node/train_sft.sh, I hit this bug. I have tried various methods, but the bug still exists.
colossalai: 0.2.8
### Environment
_No response_

sharejing

bug

[devops] fix chat ci

## 📌 Checklist before creating the PR
- [ ] I have created an issue for this PR for traceability
- [x] The title follows the standard format: `[doc/gemini/tensor/...]: A...

ver217

bug

DevOps

chatgpt
