text-generation-webui
Crash with llava extension + --no-cache
Describe the bug
I'm trying to use the llava extension with my 12 GB RTX 3060. It works reasonably well, and VRAM usage idles at about 9.1 GB. While a response is being generated it quickly climbs to the 12 GB limit, and on some prompts generation aborts with an out-of-memory error.
I tried --no-cache to work around this, but with that flag it crashes as soon as it starts generating the response.
Is there an existing issue for this?
- [X] I have searched the existing issues
Reproduction
Launch with --extensions llava --no-cache, load a picture, and enter a prompt.
Screenshot
No response
Logs
Traceback (most recent call last):
File "/root/text-generation-webui/modules/callbacks.py", line 66, in gentask
ret = self.mfunc(callback=_callback, **self.kwargs)
File "/root/text-generation-webui/modules/text_generation.py", line 290, in generate_with_callback
shared.model.generate(**kwargs)
File "/root/miniconda3/envs/textgen/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/root/miniconda3/envs/textgen/lib/python3.10/site-packages/transformers/generation/utils.py", line 1485, in generate
return self.sample(
File "/root/miniconda3/envs/textgen/lib/python3.10/site-packages/transformers/generation/utils.py", line 2524, in sample
outputs = self(
File "/root/miniconda3/envs/textgen/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/root/miniconda3/envs/textgen/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/root/miniconda3/envs/textgen/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 687, in forward
outputs = self.model(
File "/root/miniconda3/envs/textgen/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/root/miniconda3/envs/textgen/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/root/miniconda3/envs/textgen/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 527, in forward
position_ids = position_ids.view(-1, seq_length).long()
RuntimeError: shape '[-1, 380]' is invalid for input of size 381
Output generated in 4.74 seconds (0.21 tokens/s, 1 tokens, context 380, seed 608416594)
System Info
Ubuntu 22.04
RTX 3060
Model: wojtab_llava-13b-v0-4bit-128g
Not sure why this happens; it looks like a bug in transformers, since there is no special handling for no_cache anywhere in the webui, the flag is just passed through to transformers as an argument.
It happens for me with all llama models whenever they receive input embeddings instead of token ids.
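For reference, here is a minimal standalone sketch of the failing reshape, using the numbers from the traceback above (380 and 381 come from the log; the tensors below are only an illustration, not the actual transformers code path): the position_ids prepared for the step are one element longer than the sequence length inferred from inputs_embeds, so the view cannot succeed.

```python
import torch

# Illustration only: reproduce the exact reshape failure seen in the log.
# seq_length (380) is inferred from inputs_embeds, while position_ids was
# built for 381 positions (the context plus the newly generated token).
seq_length = 380
position_ids = torch.arange(seq_length + 1).unsqueeze(0)  # shape (1, 381)

position_ids.view(-1, seq_length).long()
# RuntimeError: shape '[-1, 380]' is invalid for input of size 381
```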
This issue has been closed due to inactivity for 30 days. If you believe it is still relevant, please leave a comment below.