OOM when fully fine-tuning LLaMA-7B using the DeepSpeed strategy
I tried to fully fine-tune LLaMA-7B with DeepSpeed, based on the code at https://github.com/Lightning-AI/lit-llama/blob/main/finetune/full.py.

I replaced the FSDPStrategy with DeepSpeedStrategy(offload_optimizer=True, offload_parameters=False, pin_memory=True, offload_optimizer_device='cpu') and set micro_batch_size=1, but I always hit this OOM error:

"torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 6.28 GiB (GPU 0; 39.39 GiB total capacity; 31.38 GiB already allocated; 5.90 GiB free; 31.39 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF"

I also tried setting offload_parameters=True, but in vain.
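For reference, the swap looks roughly like this. This is a minimal sketch, not my exact code: the DeepSpeedStrategy arguments are the ones I listed above, while the surrounding Fabric setup (devices, launch) is an assumption mirroring full.py.

```python
# Minimal sketch of the change in finetune/full.py (surrounding Fabric
# setup is assumed; the DeepSpeedStrategy arguments are the ones I used).
import lightning as L
from lightning.fabric.strategies import DeepSpeedStrategy

strategy = DeepSpeedStrategy(
    offload_optimizer=True,          # offload optimizer states to CPU
    offload_parameters=False,        # also tried True, same OOM
    pin_memory=True,                 # use pinned CPU memory for the offload
    offload_optimizer_device="cpu",
)

fabric = L.Fabric(accelerator="cuda", devices=4, strategy=strategy)
fabric.launch()
```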
My machine: 4× A100 GPUs (40 GB each), running Ubuntu.
The "full.py" code could work on FSDPStrategy.
Can anyone help me fix this issue? Thanks!