alpaca-lora
A100 80G fine-tune of llama-65b-hf got CUDA out of memory
When I start training it works fine. The parameters are as follows:
The GPU usage is as follows:
But after completing 17% it exits. The logs are here:
What's your bitsandbytes version?
Name: bitsandbytes Version: 0.38.1
Uninstall it, then try bitsandbytes==0.37.2?
I've rerun the fine-tune with the default parameters. I'll see whether it fails; if it does, I'll try your solution. Thanks.
Unfortunately, it hit CUDA OOM again.
If bitsandbytes==0.37.2 failed, use 2 x 80G.
I only have one A100 80G card.
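If it helps to narrow down where the memory goes before the 17% mark, here is a minimal, hypothetical sketch of a callback that logs GPU memory during training. It assumes the transformers TrainerCallback API and that you attach it to the Trainer built in finetune.py; it is not part of the repo.

```python
# Hypothetical helper (not from this thread): log peak GPU memory every N steps
# so you can see whether usage creeps up gradually or spikes on a particular batch.
import torch
from transformers import TrainerCallback


class GpuMemoryLogger(TrainerCallback):
    def __init__(self, every_n_steps=50):
        self.every_n_steps = every_n_steps

    def on_step_end(self, args, state, control, **kwargs):
        if torch.cuda.is_available() and state.global_step % self.every_n_steps == 0:
            allocated = torch.cuda.memory_allocated() / 2**30
            reserved = torch.cuda.memory_reserved() / 2**30
            peak = torch.cuda.max_memory_allocated() / 2**30
            print(f"step {state.global_step}: allocated={allocated:.1f} GiB, "
                  f"reserved={reserved:.1f} GiB, peak={peak:.1f} GiB")


# Hypothetical edit inside finetune.py, after the Trainer is constructed:
# trainer.add_callback(GpuMemoryLogger(every_n_steps=50))
```

If the peak grows steadily it points at accumulation; if it jumps at one step, it points at an unusually long batch.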
I'm having the same issue with the 7B, also on an A100 80GB (or whichever GPU). The stack trace is below:
Loading cached split indices for dataset at /root/.cache/huggingface/datasets/victor123___json/victor123--evol_instruct_70k-de37dd5750ecc166/0.0.0/fe5dd6ea2639a6df622901539cb550cf8797e5a6b2dd7af1cf934bed8e233e6e/cache-f60fe2ec6b4cb6b4.arrow and /root/.cache/huggingface/datasets/victor123___json/victor123--evol_instruct_70k-de37dd5750ecc166/0.0.0/fe5dd6ea2639a6df622901539cb550cf8797e5a6b2dd7af1cf934bed8e233e6e/cache-b5791538b539bc0d.arrow
0%| | 0/3186 [00:00<?, ?it/s]Traceback (most recent call last):
File "/root/alpaca-lora/finetune.py", line 283, in <module>
fire.Fire(train)
File "/usr/local/lib/python3.10/dist-packages/fire/core.py", line 141, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/usr/local/lib/python3.10/dist-packages/fire/core.py", line 475, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/usr/local/lib/python3.10/dist-packages/fire/core.py", line 691, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/root/alpaca-lora/finetune.py", line 273, in train
trainer.train(resume_from_checkpoint=resume_from_checkpoint)
File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 1662, in train
return inner_training_loop(
File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 1929, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs)
File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 2709, in training_step
self.scaler.scale(loss).backward()
File "/usr/local/lib/python3.10/dist-packages/torch/_tensor.py", line 487, in backward
torch.autograd.backward(
File "/usr/local/lib/python3.10/dist-packages/torch/autograd/__init__.py", line 200, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
File "/usr/local/lib/python3.10/dist-packages/torch/autograd/function.py", line 274, in apply
return user_fn(self, *args)
File "/usr/local/lib/python3.10/dist-packages/torch/utils/checkpoint.py", line 141, in backward
outputs = ctx.run_function(*detached_inputs)
File "/usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py", line 565, in custom_forward
return module(*inputs, output_attentions, None)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py", line 305, in forward
hidden_states = self.mlp(hidden_states)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py", line 157, in forward
return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/bitsandbytes/nn/modules.py", line 320, in forward
out = bnb.matmul(x, self.weight, bias=self.bias, state=self.state)
File "/usr/local/lib/python3.10/dist-packages/bitsandbytes/autograd/_functions.py", line 500, in matmul
return MatMul8bitLt.apply(A, B, out, bias, state)
File "/usr/local/lib/python3.10/dist-packages/torch/autograd/function.py", line 506, in apply
return super().apply(*args, **kwargs) # type: ignore[misc]
File "/usr/local/lib/python3.10/dist-packages/bitsandbytes/autograd/_functions.py", line 417, in forward
output += torch.matmul(subA, state.subB)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 512.00 MiB (GPU 0; 79.21 GiB total capacity; 76.04 GiB already allocated; 101.56 MiB free; 77.92 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
0%| | 0/3186 [00:11<?, ?it/s]
I've tried both bitsandbytes 0.37.2 and the latest version.
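The OOM message above also hints at allocator fragmentation. Below is a minimal sketch of setting PYTORCH_CUDA_ALLOC_CONF before the script starts; the 128 MiB value and the launcher pattern are only examples, and exporting the variable in the shell before running finetune.py is equivalent.

```python
# Hypothetical launcher (not from this thread): set the allocator option the OOM
# message suggests before torch initializes CUDA, then run the existing script.
import os
import runpy

# Cap the size of splittable cached blocks to reduce fragmentation.
# 128 MiB is an example value, not a tuned recommendation.
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "max_split_size_mb:128")

# Command-line flags still reach finetune.py via sys.argv.
runpy.run_path("finetune.py", run_name="__main__")
```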
Upgrading transformers to the dev version (pip install git+https://github.com/huggingface/transformers) seems to have resolved it for me.
Thanks, this helped.
Thanks, bitsandbytes==0.37.2 seems to be OK; it's now well past 17% ("75%|███████▌ | 875/1164 [23:44:24<7:44:41") and hasn't failed.
The transformers dev upgrade (pip install git+https://github.com/huggingface/transformers) seems to have resolved it; bitsandbytes==0.37.2 also works.
I tried both suggestions (bitsandbytes 0.37.2 and the latest git transformers) but still ran into the issue.
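If neither version change helps, the remaining lever is memory itself. Here is a rough, hypothetical sketch of calling the repo's train() with smaller memory knobs, assuming finetune.py exposes micro_batch_size and cutoff_len parameters as described in the README; the model id, data path, and values are examples, not recommendations.

```python
# Hypothetical invocation (run from the repo root): trade throughput for memory
# by shrinking the per-step batch and the maximum tokenized sequence length.
from finetune import train

train(
    base_model="decapoda-research/llama-7b-hf",  # placeholder model id
    data_path="victor123/evol_instruct_70k",     # dataset from the stack trace above
    output_dir="./lora-out",
    micro_batch_size=1,   # fewer samples per forward/backward pass
    cutoff_len=256,       # shorter maximum sequence length
)
```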
Hi @elven2016, have you encountered a phenomenon where the loss starts trending upward after one epoch (out of 3 total)? Like:
transformers==4.29.2, bitsandbytes==0.37.2, peft==0.3.0
Fine-tuning llama-65b-hf with multiple GPUs still runs into CUDA out of memory.
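For reference, a rough sketch of loading a large checkpoint across two cards with explicit per-GPU memory caps, so that some headroom is left for activations and gradients. The model id and limits are placeholders, and this assumes the naive device_map model-parallel path (one process) rather than DDP.

```python
# Hypothetical sketch: shard the 8-bit model across two 80G GPUs with headroom.
import torch
from transformers import LlamaForCausalLM

model = LlamaForCausalLM.from_pretrained(
    "decapoda-research/llama-65b-hf",        # placeholder model id
    load_in_8bit=True,
    torch_dtype=torch.float16,
    device_map="auto",
    max_memory={0: "70GiB", 1: "70GiB"},     # leave ~10 GiB per card for activations
)
```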
Hi, the mid-way stop could be due to the sequence length of the data. It worked for the first 16% because the sequence lengths were below the maximum, but maybe at 17% there is a sequence at the maximum length that pushes memory above the limit. It happened to me in the past; you can inspect the data to see whether this is true.
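A quick, hypothetical way to check that theory is to tokenize the raw samples and look at the length distribution; the tokenizer name, data file, and the alpaca-style instruction/input/output fields below are assumptions, so adapt them to your dataset.

```python
# Hypothetical check: measure tokenized prompt lengths to spot outliers that
# could spike activation memory at a particular training step.
from datasets import load_dataset
from transformers import LlamaTokenizer

tokenizer = LlamaTokenizer.from_pretrained("decapoda-research/llama-7b-hf")  # placeholder
data = load_dataset("json", data_files="alpaca_data.json")["train"]          # placeholder file

lengths = sorted(
    len(tokenizer(s["instruction"] + s.get("input", "") + s["output"]).input_ids)
    for s in data
)
print("median:", lengths[len(lengths) // 2],
      "p99:", lengths[int(0.99 * len(lengths))],
      "max:", lengths[-1])
```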
@PeiqinSun Hello, I have encountered the same issue. The loss dramatically decreases after each epoch and then gradually increases. I have also observed this phenomenon in a paper. After conducting preliminary tests, I found that this stair-like descent does not significantly affect the model's performance. Additionally, I speculate that this may be related to the data. Do you have any latest findings or progress? I am very curious about this phenomenon but don't have any debugging clues. Many thanks!
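For anyone who wants to look at the stair-step pattern directly: the Trainer writes its logged loss history into trainer_state.json inside each checkpoint, which can be plotted. A minimal sketch; the checkpoint path is a placeholder.

```python
# Hypothetical sketch: plot the logged training loss from a Trainer checkpoint
# to visualize the per-epoch drop and the gradual rise afterwards.
import json
import matplotlib.pyplot as plt

with open("lora-alpaca/checkpoint-1000/trainer_state.json") as f:  # placeholder path
    state = json.load(f)

entries = [e for e in state["log_history"] if "loss" in e]
plt.plot([e["step"] for e in entries], [e["loss"] for e in entries])
plt.xlabel("step")
plt.ylabel("training loss")
plt.savefig("loss_curve.png")
```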