lit-llama
Cannot finetune, OOM on V100 with batch size 6
I changed the batch size like this:
# batch_size = 128
batch_size = 6
micro_batch_size = 2
gradient_accumulation_steps = batch_size // micro_batch_size
max_iters = 50000 * 3 // micro_batch_size
I still get OOM on a V100 32GB. Why?
Changing batch_size will not change the memory requirements, since we are using gradient accumulation, but changing micro_batch_size will.
What happens is that the forward/backward pass is computed with micro_batch_size samples as input, and the gradients are accumulated until the total across micro batches reaches batch_size. At that point we call the optimizer step.
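A minimal sketch of that accumulation loop (the linear model and optimizer here are just toy stand-ins for the LLaMA model in the finetune scripts, so it runs on its own):

import torch

# Toy stand-ins; in the finetune scripts these would be the LLaMA model,
# its optimizer, and the instruction-tuning batches.
model = torch.nn.Linear(16, 16)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

batch_size = 6
micro_batch_size = 2
gradient_accumulation_steps = batch_size // micro_batch_size

for step in range(3 * gradient_accumulation_steps):
    # Forward/backward only ever see micro_batch_size samples at a time,
    # which is what determines the activation memory.
    inputs = torch.randn(micro_batch_size, 16)
    loss = model(inputs).pow(2).mean()
    (loss / gradient_accumulation_steps).backward()  # gradients accumulate in .grad

    # The weights are only updated once the accumulated micro batches
    # add up to batch_size samples.
    if (step + 1) % gradient_accumulation_steps == 0:
        optimizer.step()
        optimizer.zero_grad()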
I'm assuming you're already using 16-mixed precision, right? Do you have more than one V100 so you can shard your model?
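If you do have a second GPU, a rough sketch of what sharding with Lightning Fabric could look like (assuming a Lightning version that accepts the "fsdp" strategy and "16-mixed" precision; the linear model is again just a stand-in for the real one):

import lightning as L
import torch

# Shard the (toy) model across two GPUs with FSDP and run in 16-bit mixed precision.
fabric = L.Fabric(accelerator="cuda", devices=2, strategy="fsdp", precision="16-mixed")
fabric.launch()

model = fabric.setup_module(torch.nn.Linear(16, 16))
optimizer = fabric.setup_optimizers(torch.optim.AdamW(model.parameters(), lr=1e-4))

inputs = torch.randn(2, 16, device=fabric.device)
loss = model(inputs).pow(2).mean()
fabric.backward(loss)  # Fabric handles gradient scaling for mixed precision
optimizer.step()
optimizer.zero_grad()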
Thanks, it seems to work again now. I'm not sure whether it's because I changed this: checkpoint = torch.load(pretrained_path, map_location='cpu')
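(For context, a minimal sketch of why loading onto the CPU first can reduce GPU memory pressure; the model and path here are toy placeholders, not the real lit-llama checkpoint:)

import torch

# Toy checkpoint standing in for the real weights (hypothetical path).
model = torch.nn.Linear(16, 16)
pretrained_path = "toy_checkpoint.pth"
torch.save(model.state_dict(), pretrained_path)

# map_location="cpu" keeps the loaded tensors in host RAM, so the GPU only
# pays for the weights once they are copied into a model that already lives
# on the device, instead of also holding the raw checkpoint during torch.load.
checkpoint = torch.load(pretrained_path, map_location="cpu")
model.load_state_dict(checkpoint)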
BTW, will lit-llama consider "multiple-lan support" as well?
Sorry, what do you mean by "multiple-lan support"?
Multi-language support.
Hi, I have 32 GB of RAM, but it gives me this error:
"torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 128.00 MiB. GPU 0 has a total capacty of 7.82 GiB of which 119.06 MiB is free. Process 5405 has 517.03 MiB memory in use. Including non-PyTorch memory, this process has 6.20 GiB memory in use. Of the allocated memory 5.90 GiB is allocated by PyTorch, and 1.64 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF"
I solved it with: CUDA_VISIBLE_DEVICES=1,2 python finetune/adapter.py --data_dir data/mydata/ --checkpoint_dir checkpoints/stabilityai/stablelm-base-alpha-3b --out_dir data/mydata-finetuned