alpaca-lora
12G VRAM issue
I'm trying to just finetune the 7B model with the example dataset.
The issue is that when I started the finetune it was fine (I closed all the background processes), and everything went well until batch 200/1164; when it hit 200, at about 0.5 epoch, the VRAM usage shot up from 9.5G to 12G to OOM.
I'm wondering if there are any parameters I can change so it won't OOM, or if 12G of VRAM is simply not enough to tune the 7B model?
You can set PYTORCH_CUDA_ALLOC_CONF like this; it works for me:
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:64
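If you'd rather not rely on the shell environment, the same setting can also be applied from Python before torch touches CUDA. This is just a minimal sketch (the value 64 is the one suggested above and may need tuning):

```python
import os

# Must be set before the CUDA caching allocator is initialized,
# i.e. before the first CUDA allocation in this process.
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "max_split_size_mb:64")

import torch

# Quick sanity check that the GPU is visible and how much memory it has.
print(torch.cuda.get_device_name(0))
print(torch.cuda.get_device_properties(0).total_memory // 2**20, "MiB total")
```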
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 16.00 MiB (GPU 0; 11.99 GiB total capacity; 10.65 GiB already allocated; 0 bytes free; 11.18 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
It does reduce the allocation (last time it was trying to allocate 44.00 MiB, not 16 MiB), but it still failed at:
{'eval_loss': 0.903963565826416, 'eval_runtime': 147.6573, 'eval_samples_per_second': 13.545, 'eval_steps_per_second': 1.693, 'epoch': 0.51}
17%|██████████████████ | 200/1164 [2:18:45<10:50:40, 40.50s/it]
Traceback (most recent call last):
File "/home/siegfried/alpaca-lora/finetune.py", line 283, in <module>
fire.Fire(train)
try to reduce it further
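Before lowering it again, it may also help to log allocated vs. reserved memory around the step that OOMs, since the error message points at fragmentation when reserved is much larger than allocated. A small sketch (the log_vram helper is just illustrative, not part of finetune.py):

```python
import torch

def log_vram(tag: str) -> None:
    # A large gap between reserved and allocated memory points to
    # fragmentation, which max_split_size_mb is meant to mitigate.
    allocated = torch.cuda.memory_allocated() / 2**30
    reserved = torch.cuda.memory_reserved() / 2**30
    print(f"[{tag}] allocated={allocated:.2f} GiB, reserved={reserved:.2f} GiB")

# Example: call this right before and after the evaluation + checkpoint
# save around step 200 to see where the spike comes from.
log_vram("before eval")
```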
Hmm, I doubt it can even be done with 12G of VRAM.
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.105.17 Driver Version: 525.105.17 CUDA Version: 12.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... On | 00000000:03:00.0 Off | N/A |
| 62% 72C P2 318W / 350W | 23038MiB / 24576MiB | 64% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
+-----------------------------------------------------------------------------+
I rented a 3090 and it used about 23G of VRAM... which let me get past the 200/1164 check mark.
For me as well, at some point the VRAM usage increases, so better to do it on a 24GB card.
I too am experiencing the same error, and in my research, I could not find anyone who trained with 12 GB vRAM. It seems that more powerful GPUs are required. I am considering renting a computer from Amazon to be able to use it, but I am not certain due to the unpredictability of the duration of the training process.
Additionally, may I ask where you rent graphics cards such as RTX 3090 directly?
runpod.io is $0.29 per hour for an RTX 3090, and vast.ai is even cheaper. Amazon is pretty expensive.
Do you happen to know if there is any difference compared to Amazon?
What do you mean by difference? An RTX 3090 is always the same card. If you want to pay more for the same performance, then rent from Amazon.
runpod and vast use Jupyter notebooks just like Google Colab, but you get better cards, so it is very easy to use. Google Colab gets me a T4 with 16 GB of VRAM but only 12 GB of RAM, so it is effectively a 12 GB card. Anyway, you probably can't train with a 16GB card either.
Oh, I understand now thank you.
I rented from vast.ai and trained my model for 10 hours; the pod was restarted and I lost all the data, so I'm training again. It has been 11 hours and it's still running.
So it might be a your-mileage-may-vary thing. I would suggest buying a used 3090 for training; at least I'm thinking of doing that.
For that money you can get about 2000 hours of training on runpod.io, so it does not make sense to buy a GPU just for 10-20 hours of training. For constant inference, it makes sense.
Vast.ai sometimes does that, but it is very rare. It has happened to me just a few times, and I usually run training for only 30 minutes.
Runpod.io is more stable; this has never happened with them, so I almost always use them.
So I found that 200/1164 is actually a checkpoint: at that point peft tries to save the LoRA weights to a checkpoint directory, and somehow this causes the VRAM to shoot up. In the end it failed to save. Otherwise, I guess people could just save every checkpoint and resume from the last one; maybe in that case 12G of VRAM would work.
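For what it's worth, if the step-200 checkpoint had been written successfully, resuming in a fresh process might look roughly like the sketch below. This is an assumption-heavy illustration: the base model name matches the repo's README, the checkpoint path "lora-alpaca/checkpoint-200" is hypothetical, and whether the saved directory actually contains loadable adapter weights depends on your peft/transformers versions.

```python
import torch
from transformers import LlamaForCausalLM
from peft import PeftModel

# Base model as used in the alpaca-lora README; 8-bit to keep VRAM low.
base = LlamaForCausalLM.from_pretrained(
    "decapoda-research/llama-7b-hf",
    load_in_8bit=True,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Hypothetical checkpoint directory written by the Trainer at step 200.
model = PeftModel.from_pretrained(
    base,
    "lora-alpaca/checkpoint-200",
    is_trainable=True,  # keep the LoRA weights trainable so training can continue
)
```

I believe newer revisions of finetune.py also accept a resume_from_checkpoint argument that does essentially this inside the training script, but check your copy.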