OOM when finetuning unsloth/llama-3-8b-bnb-4bit on a Colab T4 with an 18,000-token context length
I'm using the Unsloth Colab notebook to finetune the unsloth/llama-3-8b-bnb-4bit model on data with a max context length of 18,000 tokens. Whenever I kick off training, it runs out of memory. That doesn't seem to happen with the yahma/alpaca example. Here's the error:
```
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 102 | Num Epochs = 5
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 60
 "-____-"     Number of trainable parameters = 41,943,040
---------------------------------------------------------------------------
OutOfMemoryError                          Traceback (most recent call last)
<ipython-input-7-3d62c575fcfd> in <cell line: 1>()
----> 1 trainer_stats = trainer.train()

13 frames
/usr/local/lib/python3.10/dist-packages/accelerate/utils/operations.py in _convert_to_fp32(tensor)
    779
    780 def _convert_to_fp32(tensor):
--> 781     return tensor.float()
    782
    783 def _is_fp16_bf16_tensor(tensor):

OutOfMemoryError: CUDA out of memory. Tried to allocate 9.47 GiB. GPU 0 has a total capacity of 14.75 GiB of which 3.78 GiB is free. Process 2116 has 10.95 GiB memory in use. Of the allocated memory 10.79 GiB is allocated by PyTorch, and 23.53 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
```
Is the longer context length the reason it runs out of memory? What's the recommended way to make this fine-tuning job possible?
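For reference, the training cell follows the stock notebook pattern, roughly like this (the model name, 18,000-token limit, batch size, accumulation steps, and step count are from the post and log above; the remaining arguments are the usual notebook defaults, `dataset` is a placeholder for my formatted data, and the exact `SFTTrainer` signature depends on the trl version):

```python
from unsloth import FastLanguageModel
from trl import SFTTrainer
from transformers import TrainingArguments

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3-8b-bnb-4bit",
    max_seq_length=18000,   # data goes up to ~18K tokens
    load_in_4bit=True,
)

# LoRA adapters on all attention/MLP projections (~41.9M trainable params, as in the log).
model = FastLanguageModel.get_peft_model(
    model, r=16, lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    use_gradient_checkpointing="unsloth",
)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,        # placeholder: formatted training set
    dataset_text_field="text",
    max_seq_length=18000,
    args=TrainingArguments(
        per_device_train_batch_size=2,   # matches the log above
        gradient_accumulation_steps=4,   # matches the log above
        max_steps=60,                    # matches the log above
        fp16=True,
        output_dir="outputs",
    ),
)

trainer_stats = trainer.train()   # OOMs here
```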
Yes, contexts that long will cause OOMs. According to our blog post (https://unsloth.ai/blog/llama3), the max context length on a Tesla T4 (16 GB) is roughly 10K tokens.
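A minimal sketch of the adjustment, assuming the standard notebook API: cap max_seq_length at roughly 10K, keep 4-bit loading plus Unsloth gradient checkpointing, and optionally set the allocator flag the OOM message itself suggests. Note that samples longer than the cap get truncated, so data that must keep its full 18K context needs a GPU with more memory.

```python
import os

# Optional: reduce fragmentation, as suggested in the OOM message above.
# Set this before torch/unsloth initialize CUDA.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

from unsloth import FastLanguageModel

max_seq_length = 10240   # ~10K tokens, the rough T4 limit from the blog post

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3-8b-bnb-4bit",
    max_seq_length=max_seq_length,
    load_in_4bit=True,            # 4-bit base weights (QLoRA-style)
)

model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    use_gradient_checkpointing="unsloth",   # offloads activations to save VRAM
)
```

Dropping per_device_train_batch_size to 1 (and doubling gradient_accumulation_steps to keep the same effective batch size) also helps at long sequence lengths.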