
Cannot finetune, OOM on V100 with batch size 6

Open lucasjinreal opened this issue 2 years ago • 6 comments

I changed the batch size settings to this:

# batch_size = 128
batch_size = 6
micro_batch_size = 2
gradient_accumulation_steps = batch_size // micro_batch_size
max_iters = 50000 * 3 // micro_batch_size

I still get OOM on a V100 32GB. Why?

lucasjinreal avatar May 18 '23 02:05 lucasjinreal

Changing batch size will not change the memory requirements, since we are using gradient accumulation, but changing micro_batch_size will.

What happens is that the forward/backward passes are computed with micro_batch_size samples as input, and the gradients are accumulated until the micro batches add up to batch_size. At that point we call the optimizer step.
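
In case it helps, here is a minimal, self-contained sketch of that loop (toy model and data, not the actual lit-llama training code), showing that memory is bounded by micro_batch_size while batch_size only sets how many micro batches are accumulated before each optimizer step:

import torch
from torch import nn

# Toy stand-ins; the real model and data loader come from the finetune script.
model = nn.Linear(16, 4)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

batch_size = 6
micro_batch_size = 2
gradient_accumulation_steps = batch_size // micro_batch_size  # 3

data = [(torch.randn(micro_batch_size, 16), torch.randint(0, 4, (micro_batch_size,)))
        for _ in range(12)]

for step, (inputs, targets) in enumerate(data):
    # Forward/backward only ever see micro_batch_size samples; this is what bounds memory.
    loss = loss_fn(model(inputs), targets) / gradient_accumulation_steps
    loss.backward()  # gradients accumulate in .grad across micro batches

    if (step + 1) % gradient_accumulation_steps == 0:
        optimizer.step()       # one update per effective batch of batch_size samples
        optimizer.zero_grad()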

I'm assuming you're already using 16-mixed precision, right? Do you have more than one V100 so you can shard your model?
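
If you do have more than one V100, a sketch with Lightning Fabric (illustrative flags and values, not the exact lit-llama launch code) for mixed precision plus FSDP sharding would look roughly like:

from lightning.fabric import Fabric

# 16-mixed precision and FSDP sharding across 2 GPUs (example values).
fabric = Fabric(devices=2, precision="16-mixed", strategy="fsdp")
fabric.launch()

# model, optimizer = fabric.setup(model, optimizer)  # wraps and shards the model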

lantiga avatar May 18 '23 07:05 lantiga

Thanks, it seems to work again now. I'm not sure whether it's because I changed this: checkpoint = torch.load(pretrained_path, map_location='cpu')

btw, will lit-llama consider adding multiple-lan support as well?

lucasjinreal avatar May 18 '23 07:05 lucasjinreal

Sorry, what do you mean by "multiple-lan support"?

lantiga avatar May 18 '23 07:05 lantiga

Multi-language support.

lucasjinreal avatar May 18 '23 07:05 lucasjinreal

Hi, I have 32 GB of RAM but it gives me this error:

"torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 128.00 MiB. GPU 0 has a total capacty of 7.82 GiB of which 119.06 MiB is free. Process 5405 has 517.03 MiB memory in use. Including non-PyTorch memory, this process has 6.20 GiB memory in use. Of the allocated memory 5.90 GiB is allocated by PyTorch, and 1.64 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF"

ghezalahmad avatar Jun 05 '23 10:06 ghezalahmad

I solved it with: CUDA_VISIBLE_DEVICES=1,2 python finetune/adapter.py --data_dir data/mydata/ --checkpoint_dir checkpoints/stabilityai/stablelm-base-alpha-3b --out_dir data/mydata-finetuned
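
CUDA_VISIBLE_DEVICES=1,2 hides GPU 0, the ~8 GiB card from the trace above, so the process only sees the other GPUs. A quick sketch to check which devices PyTorch will actually use:

import torch

# Lists the devices visible to this process after CUDA_VISIBLE_DEVICES filtering.
for i in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(i)
    print(f"cuda:{i} -> {props.name}, {props.total_memory / 1024**3:.1f} GiB")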

ghezalahmad avatar Jun 05 '23 10:06 ghezalahmad