alpaca-lora
Report hardware specs and parameters with which training works
It might be a good idea to share the hardware spec and parameters with which you got fine-tuning to work, to give a sense of the hardware requirements.
RTX 4080 16GB
System
- RTX 4080 16GB
- Intel i7 13700
- 64GB RAM
- Ubuntu 22.04.2 LTS
Parameters
Model size: 7B
MICRO_BATCH_SIZE = 3
EPOCHS = 2
Peak VRAM usage is about 15.8GB, which is nearly out of memory. It may be more comfortable to set MICRO_BATCH_SIZE to 2.
The training completed in 5.5 hours, and the resulting model behaves similarly to the published model.
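If you want to check how much headroom you have on your own card, here is a minimal sketch for reading peak VRAM with torch.cuda after a run (the helper name is just illustrative, it is not part of finetune.py):

import torch

def report_peak_vram(device: int = 0) -> None:
    # peak statistics since the last reset_peak_memory_stats() call
    peak_alloc = torch.cuda.max_memory_allocated(device) / 1024**3
    peak_reserved = torch.cuda.max_memory_reserved(device) / 1024**3
    total = torch.cuda.get_device_properties(device).total_memory / 1024**3
    print(f"peak allocated: {peak_alloc:.1f} GiB")
    print(f"peak reserved:  {peak_reserved:.1f} GiB")
    print(f"device total:   {total:.1f} GiB")

# call torch.cuda.reset_peak_memory_stats() before training starts,
# then report_peak_vram() once training finishes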
Hm, no, that's not normal. My progress bar is moving. It should also show you some additional training information periodically.
I am thinking of getting a GeForce RTX 3060. Does anybody know, or can anyone guess, whether it can be used for fine-tuning the model? Can the training time be estimated?
GeForce RTX 3060: likely not enough memory. 3060s come with 8 or 12 GB.
What are the GPU requirements for fine-tuning the smallest model?
RTX 4070 Ti
Training for GPUs w/ < 16GB VRAM
System
- RTX 4070Ti 12GB
- Ryzen 9 7900x
- 32 GB RAM
- Linux Mint 21.1
Parameters
MICRO_BATCH_SIZE = 1
BATCH_SIZE = 32
GRADIENT_ACCUMULATION_STEPS = BATCH_SIZE // MICRO_BATCH_SIZE
EPOCHS = 2
LEARNING_RATE = 3e-4
CUTOFF_LEN = 256
LORA_R = 8
LORA_ALPHA = 16
LORA_DROPOUT = 0.05
VAL_SET_SIZE = 2000
You may be able to set MICRO_BATCH_SIZE = 2, but I have not tested that yet.
Total training time was nearly 9.5 hours (not ideal).
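For anyone wondering how those values interact: the effective batch size is MICRO_BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS, so lowering MICRO_BATCH_SIZE only changes how much fits in VRAM at once, not the optimization itself. A minimal sketch of how they map onto the standard Hugging Face TrainingArguments (the exact wiring in your copy of finetune.py may differ):

from transformers import TrainingArguments

MICRO_BATCH_SIZE = 1                   # what fits in 12 GB of VRAM per step
BATCH_SIZE = 32                        # effective batch size after accumulation
GRADIENT_ACCUMULATION_STEPS = BATCH_SIZE // MICRO_BATCH_SIZE  # 32 micro-batches per optimizer step

training_args = TrainingArguments(
    per_device_train_batch_size=MICRO_BATCH_SIZE,
    gradient_accumulation_steps=GRADIENT_ACCUMULATION_STEPS,
    num_train_epochs=2,
    learning_rate=3e-4,
    fp16=True,
    output_dir="./lora-alpaca",
)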
2 * RTX 3090 Ti takes about 4.5 hours for 3 epochs with the default parameters. VRAM usage is about 20 GB.
@UranusSeven, I am also running 2 * 3090. How did you force-distribute memory across both cards for fine-tuning? I tried setting max-memory and updated to the new finetune.py from git, but no luck... it still maxes out on card 0. Thanks.
Here's what I did:
WORLD_SIZE=2 CUDA_VISIBLE_DEVICES=0,1 torchrun \
--nproc_per_node=2 \
--master_port=1234 \
finetune.py \
--base_model 'decapoda-research/llama-7b-hf' \
--data_path '/path/to/alpaca_data.json' \
--output_dir './lora-alpaca'
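For anyone else hitting the same thing: with torchrun, each process loads the model onto its own GPU rather than letting a single copy be spread across both cards. A rough sketch of the usual DDP handling (paraphrased, so check your copy of finetune.py for the exact code):

import os

# Under torchrun, WORLD_SIZE > 1 and each process gets its own LOCAL_RANK,
# so the whole model is placed on that rank's GPU rather than device_map="auto".
world_size = int(os.environ.get("WORLD_SIZE", 1))
ddp = world_size != 1
device_map = {"": int(os.environ.get("LOCAL_RANK", 0))} if ddp else "auto"

# device_map is then passed to from_pretrained(...); gradient accumulation is
# typically divided by world_size so the effective batch size stays the same.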
@pyamin1878 I use the same hardware as you; however, it is hard for me to train even when using your parameters. Would you please share your full parameters or give me some advice?
outputs = torch.empty_like(tensor)  # note: not using .index_copy because it was slower on cuda
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 44.00 MiB (GPU 0; 11.72 GiB total capacity; 9.35 GiB already allocated; 38.69 MiB free; 10.71 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
6%|███▉ | 200/3110 [35:11<8:32:00, 10.56s/it]
Can you show me your parameters? It looks like you got an OOM error during training, is that correct?
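The traceback also suggests trying max_split_size_mb. If you still hit the OOM after lowering the micro batch size, here is a minimal sketch of setting it via PYTORCH_CUDA_ALLOC_CONF (128 is just an example value; it has to be set before torch touches CUDA):

import os

# must be set before the first CUDA allocation, e.g. at the top of finetune.py
# or exported in the shell before launching
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch  # imported after setting the env var on purpose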
@EveningLin Also, you may need to make sure you are using bitsandbytes==0.37.2.
check the issue here: https://github.com/TimDettmers/bitsandbytes/issues/324#issue-1670782993
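If you are not sure which version you ended up with after pip installing, a quick check (assumes bitsandbytes is installed in the same environment):

from importlib.metadata import version

# the OOM-on-save issue linked above was worked around by pinning 0.37.2
print("bitsandbytes", version("bitsandbytes"))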
@pyamin1878 Thank you very much. I had been puzzled for several hours trying to figure out why I was getting an OOM error when saving the model. I downgraded to 0.37.2 and it worked with no problem. For reference, I am using an RTX 3060 12 GB and the decapoda-research/llama-7b-hf model.