ColossalAI
Making large AI models cheaper, faster and more accessible
### 🐛 Describe the bug When I run https://github.com/hpcaitech/ColossalAI/blob/main/applications/Chat/examples/train_rm.sh, I encounter this import error. File "/XXX/ColossalAI/colossalai/fx/_meta_regist_13.py", line 2, in <module> from torch._meta_registrations import register_meta ImportError: cannot import name 'register_meta' from 'torch._meta_registrations'...
### 🐛 Describe the bug I'm trying to train a reward model with the [example](https://github.com/hpcaitech/ColossalAI/blob/main/applications/Chat/examples/train_rm.sh), but after ten epochs of training its eval result is still `dist=nan, acc=0`. Is there any wrong...
PPO training in ColossalChat needs two models, an actor and a critic. Can these two models be different? For example, the critic uses a BERT model, and the actor...
### 🐛 Describe the bug The setup.py in main branch just excludes op_builders. ``` setup(name=package_name, version=version, packages=find_packages(exclude=( 'op_builder', 'benchmark', 'docker', 'tests', 'docs', 'examples', 'tests', 'scripts', 'requirements', '*.egg-info', )), ``` I'm...
### Describe the feature Currently, does Colossal-AI have support or ongoing work for deploying multiple models concurrently, possibly using the Ray framework? For context, I'm doing a course/research project related...
### 🐛 Describe the bug When I save the model, I get an error: ``` Traceback (most recent call last): File "train_sft.py", line 190, in <module> train(args) File "train_sft.py", line 160, in train trainer.save_model(path=args.save_path,...
### Describe the feature I found that only the DP and ZeRO strategies are supported in `ColossalAI/applications/Chat/examples`. Is hybrid parallelism (like PP / Megatron) supported?
### 🐛 Describe the bug When I read the ColossalAI parallelism docs, they say we need to change torch.nn.Linear to col_nn.Linear. But in the Stable Diffusion model code, I found the model uses torch.nn.Linear now...
### 🐛 Describe the bug When I run examples/single_node/train_sft.sh, I hit this bug. I have tried various methods, but the bug still exists. colossalai: 0.2.8 ### Environment _No response_
## 📌 Checklist before creating the PR - [ ] I have created an issue for this PR for traceability - [x] The title follows the standard format: `[doc/gemini/tensor/...]: A...