ColossalAI
                                
Making large AI models cheaper, faster and more accessible
### Describe the bug - Run `sh examples/train_sft.sh` - The error message is as follows: [04/19/23 15:25:30] INFO colossalai - colossalai - INFO: /home/jovyan/work/projects/Example/ColossalAI/venv/lib/python3.8/site-packages/colossalai/context/parallel_context.py:522 set_device INFO colossalai - colossalai - INFO: process rank 0...
### Describe the bug I was trying to run `torchrun --standalone --nproc_per_node=2 train_dummy.py --strategy colossalai_zero2` under applications/Chat/examples, and got this error. I tried possible solutions mentioned in other previous...
### Describe the bug Traceback (most recent call last): File "train_sft.py", line 175, in train(args) File "train_sft.py", line 146, in train train(args) File "train_sft.py", line 146, in train trainer.fit(logger=logger,...
### Describe the bug After the Llama model is trained with the LoRA training method, the model can be saved normally. However, the LoRA parameters were not included in the...
## Checklist before creating the PR - [ ] I have created an issue for this PR for traceability - [x] The title follows the standard format: `[doc/gemini/tensor/...]: A...
### Describe the bug Hi ColossalAI team, I am trying to use ColossalAI to fine-tune Stable Diffusion. In the code, the optimizer is defined as `GeminiAdamOptimizer`. I used the following code...
### Describe the bug `CUDA_VISIBLE_DEVICES=6 python train.py` Traceback (most recent call last): File "train.py", line 13, in from colossalai.utils.model.colo_init_context import ColoInitContext ModuleNotFoundError: No module named 'colossalai.utils.model.colo_init_context' ### Environment absl-py...
Gemini plugin: support sharded checkpoints to avoid large checkpoint files.
### Describe the bug **Describe the bug** docker.io/hpcaitech/colossalai:0.2.x (x > 0) report Colossal AI version 0.2.0 and contain non-release tagged code from >0.2.0 and
### Discussed in https://github.com/hpcaitech/ColossalAI/discussions/3606 Originally posted by **cryoco** April 19, 2023 I've seen two Ray implementations of PPO in this repo, [#3195](https://github.com/hpcaitech/ColossalAI/pull/3195) and [#3309](https://github.com/hpcaitech/ColossalAI/pull/3309). The former makes the...