alpaca-lora

CUDA out of memory: I am using a Colab T4 GPU

Open · anshumansinha16 opened this issue 1 year ago · 2 comments

!python generate.py

You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. This is expected, and simply means that the legacy (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set legacy=False. This should only be set if you understand what it means, and thouroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565

Loading checkpoint shards: 100% 33/33 [01:13<00:00, 2.22s/it]
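For reference, this first warning is informational only. A minimal sketch of how the new behaviour mentioned in the warning could be opted into; the model path below is a placeholder for whatever base model generate.py actually loads, not something taken from this log:

```python
from transformers import LlamaTokenizer

# Placeholder path: substitute the same base model that generate.py loads.
base_model = "decapoda-research/llama-7b-hf"

# legacy=False opts into the fixed tokenization behaviour described in
# https://github.com/huggingface/transformers/pull/24565; omitting it keeps
# the (harmless) default reported in the warning above.
tokenizer = LlamaTokenizer.from_pretrained(base_model, legacy=False)
```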

Instruction: Tell me about alpacas.

/usr/local/lib/python3.10/dist-packages/transformers/generation/configuration_utils.py:381: UserWarning: do_sample is set to False. However, temperature is set to 0.1 -- this flag is only used in sample-based generation modes. You should set do_sample=True or unset temperature. This was detected when initializing the generation config instance, which means the corresponding file may hold incorrect parameterization and should be fixed.
  warnings.warn(
/usr/local/lib/python3.10/dist-packages/transformers/generation/configuration_utils.py:386: UserWarning: do_sample is set to False. However, top_p is set to 0.75 -- this flag is only used in sample-based generation modes. You should set do_sample=True or unset top_p. This was detected when initializing the generation config instance, which means the corresponding file may hold incorrect parameterization and should be fixed.
  warnings.warn(
/usr/local/lib/python3.10/dist-packages/transformers/generation/configuration_utils.py:396: UserWarning: do_sample is set to False. However, top_k is set to 40 -- this flag is only used in sample-based generation modes. You should set do_sample=True or unset top_k. This was detected when initializing the generation config instance, which means the corresponding file may hold incorrect parameterization and should be fixed.
  warnings.warn(

(The same three warnings are emitted a second time, without the trailing note, once generation starts.)
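These warnings just mean that the script sets temperature, top_p, and top_k while leaving do_sample off, so beam search ignores them. A minimal sketch of two consistent alternatives; the values are the ones from the log, and turning on do_sample is an assumption about what was intended, not something the script requires:

```python
from transformers import GenerationConfig

# Either enable sampling so temperature/top_p/top_k actually take effect...
sampling_config = GenerationConfig(
    do_sample=True,
    temperature=0.1,
    top_p=0.75,
    top_k=40,
)

# ...or drop the sampling knobs entirely and keep deterministic decoding.
greedy_config = GenerationConfig(do_sample=False, num_beams=1)
```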
Traceback (most recent call last):
  File "/content/generate.py", line 150, in <module>
    fire.Fire(main)
  File "/usr/local/lib/python3.10/dist-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/usr/local/lib/python3.10/dist-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/usr/local/lib/python3.10/dist-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/content/generate.py", line 144, in main
    print("Response:", evaluate(instruction))
  File "/content/generate.py", line 119, in evaluate
    generation_output = model.generate(
  File "/usr/local/lib/python3.10/dist-packages/peft/peft_model.py", line 1034, in generate
    outputs = self.base_model.generate(**kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 1752, in generate
    return self.beam_search(
  File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 3091, in beam_search
    outputs = self(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py", line 1034, in forward
    outputs = self.model(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py", line 922, in forward
    layer_outputs = decoder_layer(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py", line 672, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py", line 382, in forward
    key_states = torch.cat([past_key_value[0], key_states], dim=2)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB. GPU 0 has a total capacity of 14.75 GiB of which 6.81 MiB is free. Process 1298046 has 14.74 GiB memory in use. Of the allocated memory 13.56 GiB is allocated by PyTorch, and 309.97 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
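The traceback shows the failure inside beam_search while the key/value cache is being extended, i.e. the 14.75 GiB T4 is essentially full before generation finishes. A hedged sketch of the usual mitigations (8-bit weights, a single beam, fewer new tokens, and the allocator hint the error message itself suggests); the model name is a placeholder, and none of this is confirmed as the fix for this particular run:

```python
import os

# Allocator hint from the error message; it must be set before the first
# CUDA allocation, e.g. at the very top of generate.py.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch
from transformers import LlamaForCausalLM, GenerationConfig

base_model = "decapoda-research/llama-7b-hf"  # placeholder: same model generate.py uses

# 8-bit weights (requires bitsandbytes) roughly halve the fp16 footprint and
# usually leave enough headroom for generation on a 16 GiB T4.
model = LlamaForCausalLM.from_pretrained(
    base_model,
    load_in_8bit=True,
    torch_dtype=torch.float16,
    device_map="auto",
)

# The OOM happened inside beam_search: each extra beam keeps its own copy of
# the growing key/value cache, so a single beam and a shorter output help most.
generation_config = GenerationConfig(num_beams=1, max_new_tokens=128)
```

If generate.py already exposes an 8-bit loading switch, running it with that flag enabled may be enough on its own; otherwise lowering num_beams in the script's generation settings is the smallest change that targets the line that failed.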

anshumansinha16 · Nov 05 '23 04:11

I came across this error too. Do you know how to solve it? ^o^

bingxinjiang03 · Nov 05 '23 11:11

No, I have not yet solved this issue.

anshumansinha16 · Nov 05 '23 17:11