alpaca-lora
12G VRAM issue
I'm trying to just finetune the 7B model with the example dataset.
The issue is that when I started the finetune it was fine (I closed all the background processes), and everything went well until batch 200/1164; when it hit 200, at about 0.5 epoch, the VRAM usage shot up from 9.5G to 12G to OOM.
I'm wondering if there are any parameters I can change so it won't OOM, or if 12G of VRAM is simply not enough to tune the 7B model?
You can set PYTORCH_CUDA_ALLOC_CONF like this; it works for me:
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:64
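If you'd rather not rely on the shell environment, the same setting can also be applied from Python before torch touches CUDA. This is just a minimal sketch (the value 64 is the one suggested above and may need tuning):

```python
import os

# Must be set before the CUDA caching allocator is initialized,
# i.e. before the first CUDA allocation in this process.
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "max_split_size_mb:64")

import torch

# Quick sanity check that the GPU is visible and how much memory it has.
print(torch.cuda.get_device_name(0))
print(torch.cuda.get_device_properties(0).total_memory // 2**20, "MiB total")
```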
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 16.00 MiB (GPU 0; 11.99 GiB total capacity; 10.65 GiB already allocated; 0 bytes free; 11.18 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
It does reduce the allocation (last time it was trying to allocate 44.00 MiB, not 16 MiB), but it still failed at:
{'eval_loss': 0.903963565826416, 'eval_runtime': 147.6573, 'eval_samples_per_second': 13.545, 'eval_steps_per_second': 1.693, 'epoch': 0.51}
17%|██████████████████ | 200/1164 [2:18:45<10:50:40, 40.50s/it]
Traceback (most recent call last):
File "/home/siegfried/alpaca-lora/finetune.py", line 283, in <module>
fire.Fire(train)
try to reduce it further
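Before lowering it again, it may also help to log allocated vs. reserved memory around the step that OOMs, since the error message points at fragmentation when reserved is much larger than allocated. A small sketch (the log_vram helper is just illustrative, not part of finetune.py):

```python
import torch

def log_vram(tag: str) -> None:
    # A large gap between reserved and allocated memory points to
    # fragmentation, which max_split_size_mb is meant to mitigate.
    allocated = torch.cuda.memory_allocated() / 2**30
    reserved = torch.cuda.memory_reserved() / 2**30
    print(f"[{tag}] allocated={allocated:.2f} GiB, reserved={reserved:.2f} GiB")

# Example: call this right before and after the evaluation + checkpoint
# save around step 200 to see where the spike comes from.
log_vram("before eval")
```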
Hmm, I doubt it can even be done with 12G of VRAM.
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.105.17 Driver Version: 525.105.17 CUDA Version: 12.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... On | 00000000:03:00.0 Off | N/A |
| 62% 72C P2 318W / 350W | 23038MiB / 24576MiB | 64% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
+-----------------------------------------------------------------------------+
I rented a 3090 and it used about 23G of VRAM... which let me get past the 200/1164 check mark.
For me as well, at some point the VRAM usage increases, so better to do it on a 24GB card.
I too am experiencing the same error, and in my research, I could not find anyone who trained with 12 GB vRAM. It seems that more powerful GPUs are required. I am considering renting a computer from Amazon to be able to use it, but I am not certain due to the unpredictability of the duration of the training process.
Additionally, may I ask where you rent graphics cards such as RTX 3090 directly?
runpod.io is $0.29 per hour for an RTX 3090, and vast.ai is even cheaper. Amazon is pretty expensive.
Do you happen to know if there is any difference compared to Amazon?
What do you mean by difference? An RTX 3090 is always the same card. If you want to pay more for the same performance, then rent from Amazon.
runpod and vast use Jupyter notebooks just like Google Colab, but you get better cards, so it is very easy to use. Google Colab gets me a T4 with 16 GB of VRAM but only 12 GB of RAM, so it is effectively a 12 GB card. Anyway, you probably can't train with a 16GB card either.
Oh, I understand now thank you.
I rented from vast.ai and trained my model for 10 hours; the pod was restarted and I lost all the data, so I'm training again. It has been 11 hours and it's still running.
So it might be a your-mileage-may-vary thing. I would suggest buying a used 3090 for training; at least I'm thinking of doing that.
For that money you can get about 2000 hours of training on runpod.io, so it does not make sense to buy a GPU just for 10-20 hours of training. For constant inference, it makes sense.
Vast.ai sometimes does that, but it is very rare. It has happened to me just a few times, and I usually run training for only 30 minutes.
Runpod.io is more stable; this has never happened with them, so I almost always use them.
So I found that 200/1164 is actually a checkpoint: at that point peft tries to save the LoRA weights to a checkpoint directory, and somehow this causes the VRAM to shoot up. In the end it failed to save. Otherwise, I guess people could just save every checkpoint and resume from the last one; maybe in that case 12G of VRAM would work.
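For what it's worth, if the step-200 checkpoint had been written successfully, resuming in a fresh process might look roughly like the sketch below. This is an assumption-heavy illustration: the base model name matches the repo's README, the checkpoint path "lora-alpaca/checkpoint-200" is hypothetical, and whether the saved directory actually contains loadable adapter weights depends on your peft/transformers versions.

```python
import torch
from transformers import LlamaForCausalLM
from peft import PeftModel

# Base model as used in the alpaca-lora README; 8-bit to keep VRAM low.
base = LlamaForCausalLM.from_pretrained(
    "decapoda-research/llama-7b-hf",
    load_in_8bit=True,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Hypothetical checkpoint directory written by the Trainer at step 200.
model = PeftModel.from_pretrained(
    base,
    "lora-alpaca/checkpoint-200",
    is_trainable=True,  # keep the LoRA weights trainable so training can continue
)
```

I believe newer revisions of finetune.py also accept a resume_from_checkpoint argument that does essentially this inside the training script, but check your copy.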