ColossalAI

Making large AI models cheaper, faster and more accessible

Results 1072 ColossalAI issues

### 🐛 Describe the bug When I run https://github.com/hpcaitech/ColossalAI/blob/main/applications/Chat/examples/train_rm.sh, I encounter this import error. `File "/XXX/ColossalAI/colossalai/fx/_meta_regist_13.py", line 2, in <module>` `from torch._meta_registrations import register_meta` `ImportError: cannot import name 'register_meta' from 'torch._meta_registrations'`...

bug
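
An `ImportError` of the form "cannot import name X from Y" generally means module Y was found but does not export X in the installed version. A minimal diagnostic sketch, assuming nothing about ColossalAI or PyTorch internals beyond the names quoted in the report; `has_symbol` is a hypothetical helper, not part of either library:

```python
import importlib


def has_symbol(module_name: str, symbol: str) -> bool:
    """Return True if `module_name` imports cleanly and exposes `symbol`.

    Hypothetical helper for illustration only; it reports whether a name
    is available and does not repair a version mismatch.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # The module itself is missing (or fails to import).
        return False
    return hasattr(module, symbol)
```

Calling `has_symbol("torch._meta_registrations", "register_meta")` in the failing environment would show whether the installed PyTorch build exposes the name that the traceback says is missing.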

### 🐛 Describe the bug I'm trying to train a reward model with the [example](https://github.com/hpcaitech/ColossalAI/blob/main/applications/Chat/examples/train_rm.sh), but after ten epochs of training the eval result is still `dist=nan, acc=0`. Is there any wrong...

bug

When training PPO in ColossalChat, two models are needed: an actor and a critic. Can these two models be different? For example, the critic uses the BERT model, and the actor...

### 🐛 Describe the bug The setup.py in the main branch just excludes op_builders.
```
setup(name=package_name,
      version=version,
      packages=find_packages(exclude=(
          'op_builder',
          'benchmark',
          'docker',
          'tests',
          'docs',
          'examples',
          'tests',
          'scripts',
          'requirements',
          '*.egg-info',
      )),
```
I'm...

bug
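
For context on the `exclude` tuple quoted above: the setuptools documentation describes exclude patterns as shell-style wildcards matched against dotted package names, so a bare name such as `'op_builder'` matches only that top-level package and not its subpackages. The sketch below models that matching with the standard library only; it does not call setuptools, and the package names and the `is_excluded` helper are hypothetical.

```python
from fnmatch import fnmatchcase


def is_excluded(package: str, patterns: tuple[str, ...]) -> bool:
    """Return True if the dotted package name matches any exclude pattern."""
    return any(fnmatchcase(package, pattern) for pattern in patterns)


# Hypothetical package names, for illustration only.
packages = ["colossalai", "op_builder", "op_builder.kernels", "tests", "tests.unit"]

# Bare names match only the top-level packages.
bare = ("op_builder", "tests")
# Adding "name.*" also matches every subpackage.
with_subpackages = ("op_builder", "op_builder.*", "tests", "tests.*")

kept_bare = [p for p in packages if not is_excluded(p, bare)]
kept_full = [p for p in packages if not is_excluded(p, with_subpackages)]

print(kept_bare)
print(kept_full)
```

Under this model, subpackages survive the bare patterns and are dropped only when the `name.*` form is listed as well. Whether that is the behaviour the report is about cannot be told from the truncated text.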

### Describe the feature Currently, does Colossal-AI have support or ongoing work for deploying multiple models concurrently, possibly using the Ray framework? For context, I'm doing a course/research project related...

enhancement

### 🐛 Describe the bug When I save the model, I get an error: ``` Traceback (most recent call last): File "train_sft.py", line 190, in <module> train(args) File "train_sft.py", line 160, in train trainer.save_model(path=args.save_path,...

bug

### Describe the feature I found that only the DP and ZeRO strategies are supported in `ColossalAI/applications/Chat/examples`. Is hybrid parallelism (like PP / Megatron) supported?

enhancement

### 🐛 Describe the bug When I read the ColossalAI parallelism doc, it says we need to change torch.nn.Linear to col_nn.Linear. But in the Stable Diffusion model code, I found the model uses torch.nn.Linear now...

bug

### 🐛 Describe the bug When I run examples/single_node/train_sft.sh, I hit this bug. I have tried various methods, but the bug still exists. colossalai: 0.2.8 ### Environment _No response_

bug

## 📌 Checklist before creating the PR - [ ] I have created an issue for this PR for traceability - [x] The title follows the standard format: `[doc/gemini/tensor/...]: A...

bug
DevOps
chatgpt