
Report the hardware specs and parameters with which training works

kcchu opened this issue 1 year ago • 12 comments

It might be a good idea to share the hardware specs and parameters with which you got fine-tuning to work, to give a sense of the hardware requirements.

kcchu avatar Mar 17 '23 14:03 kcchu

RTX 4080 16GB

System

  • RTX 4080 16GB
  • Intel i7-13700
  • 64 GB RAM
  • Ubuntu 22.04.2 LTS

Parameters

Model size: 7B
MICRO_BATCH_SIZE = 3
EPOCHS = 2

Peak VRAM usage is about 15.8 GB, which is nearly out of memory. It may be more comfortable to set MICRO_BATCH_SIZE to 2.

Training completed in 5.5 hours, and the resulting model behaves similarly to the published model.
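For anyone who wants to compare their own numbers, here is a minimal sketch for checking peak VRAM after a run; it uses standard PyTorch memory counters and is not specific to this repo:

import torch

# Peak memory actually allocated to tensors on the current GPU since process start (bytes).
peak_alloc = torch.cuda.max_memory_allocated()
# The caching allocator usually reserves somewhat more than it allocates.
peak_reserved = torch.cuda.max_memory_reserved()

print(f"peak allocated: {peak_alloc / 1024**3:.1f} GiB")
print(f"peak reserved:  {peak_reserved / 1024**3:.1f} GiB")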

kcchu avatar Mar 17 '23 14:03 kcchu

Hm, no, that's not normal. My progress bar is moving. It should also show you some additional training information periodically. [screenshot of training progress]

HideLord avatar Mar 17 '23 15:03 HideLord

I am thinking of getting a GeForce RTX 3060. Does anybody know, or can guess, whether it can be used for fine-tuning the model? Can anyone estimate how long it would take?

georgihacker avatar Mar 17 '23 16:03 georgihacker

GeForce RTX 3060: likely not enough memory. 3060s come with 8 GB or 12 GB.

apkuhar avatar Mar 17 '23 16:03 apkuhar

What are the GPU requirements to fine-tune the smallest model?

georgihacker avatar Mar 17 '23 18:03 georgihacker

RTX 4070 Ti

Training for GPUs w/ < 16GB VRAM

System

  • RTX 4070Ti 12GB
  • Ryzen 9 7900x
  • 32 GB RAM
  • Linux Mint 21.1

Parameters

MICRO_BATCH_SIZE = 1
BATCH_SIZE = 32
GRADIENT_ACCUMULATION_STEPS = BATCH_SIZE // MICRO_BATCH_SIZE
EPOCHS = 2
LEARNING_RATE = 3e-4
CUTOFF_LEN = 256
LORA_R = 8
LORA_ALPHA = 16
LORA_DROPOUT = 0.05
VAL_SET_SIZE = 2000

You may be able to set MICRO_BATCH_SIZE = 2, but I have not tested that yet.

Total training time was nearly 9.5 hours (not ideal). [screenshot: training_data]
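For readers mapping these values onto code, here is a minimal sketch of how hyperparameters like these typically plug into a PEFT LoraConfig and transformers TrainingArguments. The target_modules choice and the fp16 flag are assumptions of mine, not taken from the post; the repo's finetune.py is the authoritative script.

import torch
from peft import LoraConfig
from transformers import TrainingArguments

MICRO_BATCH_SIZE = 1
BATCH_SIZE = 32
GRADIENT_ACCUMULATION_STEPS = BATCH_SIZE // MICRO_BATCH_SIZE  # 32 micro-batches per optimizer step

lora_config = LoraConfig(
    r=8,                                   # LORA_R
    lora_alpha=16,                         # LORA_ALPHA
    lora_dropout=0.05,                     # LORA_DROPOUT
    target_modules=["q_proj", "v_proj"],   # assumption: the usual LLaMA attention projections
    bias="none",
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    output_dir="./lora-alpaca",
    per_device_train_batch_size=MICRO_BATCH_SIZE,
    gradient_accumulation_steps=GRADIENT_ACCUMULATION_STEPS,
    num_train_epochs=2,                    # EPOCHS
    learning_rate=3e-4,                    # LEARNING_RATE
    fp16=torch.cuda.is_available(),        # mixed precision only when a GPU is present
)

# The trainable model would then come from peft.get_peft_model(base_model, lora_config).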

pyamin1878 avatar Mar 20 '23 00:03 pyamin1878

2 × RTX 3090 Ti takes about 4.5 hours for 3 epochs with the default parameters. VRAM usage is about 20 GB.

UranusSeven avatar Mar 28 '23 09:03 UranusSeven

@UranusSeven, I am also running 2 × 3090. How did you force-distribute memory across both cards for fine-tuning? I tried setting max-memory and updated to the new finetune.py from git, but no luck... it still maxes out on card 0. Thanks.

alxfoster avatar Mar 29 '23 20:03 alxfoster

Here's what I did:

WORLD_SIZE=2 CUDA_VISIBLE_DEVICES=0,1 torchrun \
--nproc_per_node=2 \
--master_port=1234 \
finetune.py \
--base_model 'decapoda-research/llama-7b-hf' \
--data_path '/path/to/alpaca_data.json' \
--output_dir './lora-alpaca'
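Note that this is plain data parallelism: torchrun launches one process per GPU, and each process holds its own full copy of the (8-bit) model, so memory ends up balanced across the cards rather than sharded between them. A minimal sketch of the pattern (variable names are illustrative, not quoted from finetune.py):

import os
import torch

world_size = int(os.environ.get("WORLD_SIZE", 1))   # torchrun sets this to 2 for the command above
local_rank = int(os.environ.get("LOCAL_RANK", 0))   # 0 or 1, one value per process

if world_size > 1:
    # DDP: pin this process's model replica to its own GPU.
    torch.cuda.set_device(local_rank)
    device_map = {"": local_rank}
else:
    # Single process: let accelerate spread layers across whatever is visible.
    device_map = "auto"

# model = LlamaForCausalLM.from_pretrained(base_model, load_in_8bit=True, device_map=device_map)

So two cards mainly buy training speed; each one still needs enough VRAM for a full replica of the model plus activations.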

UranusSeven avatar Apr 01 '23 06:04 UranusSeven

@pyamin1878 I use the same hardware as you, but it is hard for me to train even with your parameters. Could you please share your full parameters or give me some advice?

outputs = torch.empty_like(tensor)  # note: not using .index_copy because it was slower on cuda
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 44.00 MiB (GPU 0; 11.72 GiB total capacity; 9.35 GiB already allocated; 38.69 MiB free; 10.71 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
6%|███▉ | 200/3110 [35:11<8:32:00, 10.56s/it]

EveningLin avatar Apr 17 '23 10:04 EveningLin


Can you show me your parameters? It looks like you got an OOM error during training, is that correct?
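As an aside for anyone else hitting that allocator message: the error text itself suggests max_split_size_mb. A minimal sketch of setting it from Python before CUDA is first used (the 128 MB value is just illustrative, not a recommendation from this thread):

import os

# Must be set before the first CUDA allocation, so do it before importing anything that touches the GPU.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch  # imported after the env var on purpose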

pyamin1878 avatar Apr 18 '23 06:04 pyamin1878

@EveningLin Also you may need to make sure you are using bitsandbytes==0.37.2

check the issue here: https://github.com/TimDettmers/bitsandbytes/issues/324#issue-1670782993

pyamin1878 avatar Apr 18 '23 06:04 pyamin1878

@pyamin1878 Thank you very much. I had been puzzled for several hours trying to figure out why I was getting an OOM error when saving the model. I downgraded to 0.37.2 and it worked with no problem. For reference, I am using an RTX 3060 12 GB and the decapoda-research/llama-7b-hf model.

Nan-Do avatar May 21 '23 09:05 Nan-Do