OOM when fully fine-tuning LLaMA-7B using the DeepSpeed strategy
I tried to fully fine-tune LLaMA-7B with DeepSpeed, based on the code at https://github.com/Lightning-AI/lit-llama/blob/main/finetune/full.py.

I replaced the FSDPStrategy with DeepSpeedStrategy(offload_optimizer=True, offload_parameters=False, pin_memory=True, offload_optimizer_device='cpu') and set micro_batch_size=1, but I always hit this OOM error:

"torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 6.28 GiB (GPU 0; 39.39 GiB total capacity; 31.38 GiB already allocated; 5.90 GiB free; 31.39 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF"

I also tried setting offload_parameters=True, but in vain.
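For reference, the swap looks roughly like this. This is a minimal sketch, not my exact code: the DeepSpeedStrategy arguments are the ones I listed above, while the surrounding Fabric setup (devices, launch) is an assumption mirroring full.py.

```python
# Minimal sketch of the change in finetune/full.py (surrounding Fabric
# setup is assumed; the DeepSpeedStrategy arguments are the ones I used).
import lightning as L
from lightning.fabric.strategies import DeepSpeedStrategy

strategy = DeepSpeedStrategy(
    offload_optimizer=True,          # offload optimizer states to CPU
    offload_parameters=False,        # also tried True, same OOM
    pin_memory=True,                 # use pinned CPU memory for the offload
    offload_optimizer_device="cpu",
)

fabric = L.Fabric(accelerator="cuda", devices=4, strategy=strategy)
fabric.launch()
```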
My machine: 4× A100 GPUs (40 GB each), running Ubuntu.
The "full.py" code could work on FSDPStrategy.
Can anyone help me fix this issue? Thanks!