text-generation-webui
Crash with llava extension + --no-cache
Describe the bug
I'm trying to use the llava extension with my 12 GB RTX 3060. It works reasonably well, and VRAM usage idles at about 9.1 GB. While a response is being generated it quickly climbs to the 12 GB limit, and on some prompts generation aborts with an out-of-memory error.
I tried --no-cache to work around this, but with that flag it crashes as soon as it starts generating the response.
Is there an existing issue for this?
- [X] I have searched the existing issues
Reproduction
Launch with --extensions llava --no-cache, load a picture, and enter a prompt.
Screenshot
No response
Logs
Traceback (most recent call last):
File "/root/text-generation-webui/modules/callbacks.py", line 66, in gentask
ret = self.mfunc(callback=_callback, **self.kwargs)
File "/root/text-generation-webui/modules/text_generation.py", line 290, in generate_with_callback
shared.model.generate(**kwargs)
File "/root/miniconda3/envs/textgen/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/root/miniconda3/envs/textgen/lib/python3.10/site-packages/transformers/generation/utils.py", line 1485, in generate
return self.sample(
File "/root/miniconda3/envs/textgen/lib/python3.10/site-packages/transformers/generation/utils.py", line 2524, in sample
outputs = self(
File "/root/miniconda3/envs/textgen/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/root/miniconda3/envs/textgen/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/root/miniconda3/envs/textgen/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 687, in forward
outputs = self.model(
File "/root/miniconda3/envs/textgen/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/root/miniconda3/envs/textgen/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/root/miniconda3/envs/textgen/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 527, in forward
position_ids = position_ids.view(-1, seq_length).long()
RuntimeError: shape '[-1, 380]' is invalid for input of size 381
Output generated in 4.74 seconds (0.21 tokens/s, 1 tokens, context 380, seed 608416594)
System Info
Ubuntu 22.04
RTX 3060
Model: wojtab_llava-13b-v0-4bit-128g
Not sure why this happens; it looks like a bug in transformers, since there is no special handling for no_cache anywhere in the webui, the flag is just passed through to transformers as an argument.
It happens for me with all llama models whenever they receive input embeddings instead of token ids.
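For reference, here is a minimal standalone sketch of the failing reshape, using the numbers from the traceback above (380 and 381 come from the log; the tensors below are only an illustration, not the actual transformers code path): the position_ids prepared for the step are one element longer than the sequence length inferred from inputs_embeds, so the view cannot succeed.

```python
import torch

# Illustration only: reproduce the exact reshape failure seen in the log.
# seq_length (380) is inferred from inputs_embeds, while position_ids was
# built for 381 positions (the context plus the newly generated token).
seq_length = 380
position_ids = torch.arange(seq_length + 1).unsqueeze(0)  # shape (1, 381)

position_ids.view(-1, seq_length).long()
# RuntimeError: shape '[-1, 380]' is invalid for input of size 381
```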
This issue has been closed due to inactivity for 30 days. If you believe it is still relevant, please leave a comment below.