[BUG]: On an eight-GPU A100 node, running examples/language/llama2 with the gemini_auto plugin results in an out-of-memory error
🐛 Describe the bug
Here is my script. It runs with the hybrid_parallel plugin, but every other plugin fails with the same "out of memory" error:
torchrun --standalone --nproc_per_node 8 finetune.py \
    --plugin "gemini_auto" \
    --dataset "self_instruct" \
    --model_path "Llama2-Chinese-7b-Chat" \
    --task_name "finetuning" \
    --batch_size 2 \
    --save_dir "output_test"
Environment
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 2.00 GiB (GPU 4; 79.21 GiB total capacity; 75.40 GiB already allocated; 1.74 GiB free; 76.12 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
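The last line of the traceback suggests tuning the caching allocator via PYTORCH_CUDA_ALLOC_CONF. A minimal sketch of how that could be applied (the 128 MiB split size is an illustrative value, not a tested one, and it may not help much here since reserved and allocated memory are close):

import os

# The caching allocator reads PYTORCH_CUDA_ALLOC_CONF at CUDA init;
# max_split_size_mb limits block splitting to reduce fragmentation.
# Set it before the first CUDA allocation (safest: before importing torch).
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch  # CUDA initialized after this point sees the setting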
Hi, how about trying to set offload_optim_frac and offload_param_frac to 1.0?
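For reference, a minimal sketch of a Gemini plugin configured that way. The offload_optim_frac and offload_param_frac arguments exist on GeminiPlugin; using placement_policy="static" here is an assumption, since those fractions are documented for the static placement policy rather than "auto" (worth verifying against your ColossalAI version):

from colossalai.booster import Booster
from colossalai.booster.plugin import GeminiPlugin

# Keep all optimizer states and parameters on CPU to free GPU memory.
plugin = GeminiPlugin(
    placement_policy="static",
    offload_optim_frac=1.0,  # fraction of optimizer states offloaded to CPU
    offload_param_frac=1.0,  # fraction of parameters offloaded to CPU
)
booster = Booster(plugin=plugin)

Full CPU offload trades GPU memory for host-device transfer time, so training will be slower but should fit within the 80 GiB per card.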