
Exception: LlamaRotaryEmbedding.forward() missing 1 required positional argument: 'position_ids'

d8ahazard opened this issue 10 months ago · 5 comments

Describe the bug

Using

Is there an existing issue for this?

  • [X] I have searched the existing issues

Reproduction

Using "TheBloke_WizardLM-13B-V1-1-SuperHOT-8K-GPTQ" with the following startup params:

start_linux.sh --listen-port 10870 --model-dir /opt/rd/apps/Oobabooga/models --api --listen --bf16 --character Assistant --load-in-8bit --triton --use_flash_attention_2 --auto-devices --tensorcores --model TheBloke_WizardLM-13B-V1-1-SuperHOT-8K-GPTQ

As soon as I try talking to the model, it shows that it's typing, but I never get a response, and the error below appears in the log.

Screenshot

No response

Logs

19:43:21-447807 INFO     Loading settings from "settings.yaml"
19:43:21-470637 INFO     Loading "TheBloke_WizardLM-13B-V1-1-SuperHOT-8K-GPTQ"
19:43:21-604655 WARNING  Auto-assiging --gpu-memory 15 for your GPU to try to prevent out-of-memory errors. You can manually set other values.
19:43:21-608832 INFO     The AutoGPTQ params are: {'model_basename': 'wizardlm-13b-v1.1-superhot-8k-GPTQ-4bit-128g.no-act.order', 'device':
                         'cuda:0', 'use_triton': True, 'inject_fused_attention': True, 'inject_fused_mlp': True, 'use_safetensors': True,
                         'trust_remote_code': False, 'max_memory': {0: '15GiB', 'cpu': '99GiB'}, 'quantize_config': None, 'use_cuda_fp16': True,
                         'disable_exllama': False, 'disable_exllamav2': False}
/opt/rd/apps/Oobabooga/installer_files/env/lib/python3.11/site-packages/transformers/modeling_utils.py:4193: FutureWarning: `_is_quantized_training_enabled` is going to be deprecated in transformers 4.39.0. Please use `model.hf_quantizer.is_trainable` instead
  warnings.warn(
19:44:56-310994 INFO     LOADER: "AutoGPTQ"
19:44:56-312798 INFO     TRUNCATION LENGTH: 8192
19:44:56-313978 INFO     INSTRUCTION TEMPLATE: "Vicuna-v1.1"
19:44:56-315047 INFO     Loaded the model in 94.84 seconds.
19:44:56-316158 INFO     Loading the extension "openai"
19:44:56-453866 INFO     OpenAI-compatible API URL:

                         http://0.0.0.0:5000

19:44:56-454532 INFO     Loading the extension "gallery"

Running on local URL:  http://0.0.0.0:10870

Traceback (most recent call last):
  File "/opt/rd/apps/Oobabooga/modules/callbacks.py", line 61, in gentask
    ret = self.mfunc(callback=_callback, *args, **self.kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/rd/apps/Oobabooga/modules/text_generation.py", line 397, in generate_with_callback
    shared.model.generate(**kwargs)
  File "/opt/rd/apps/Oobabooga/installer_files/env/lib/python3.11/site-packages/auto_gptq/modeling/_base.py", line 447, in generate
    return self.model.generate(**kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/rd/apps/Oobabooga/installer_files/env/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/rd/apps/Oobabooga/installer_files/env/lib/python3.11/site-packages/transformers/generation/utils.py", line 1592, in generate
    return self.sample(
           ^^^^^^^^^^^^
  File "/opt/rd/apps/Oobabooga/installer_files/env/lib/python3.11/site-packages/transformers/generation/utils.py", line 2696, in sample
    outputs = self(
              ^^^^^
  File "/opt/rd/apps/Oobabooga/installer_files/env/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/rd/apps/Oobabooga/installer_files/env/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/rd/apps/Oobabooga/installer_files/env/lib/python3.11/site-packages/transformers/models/llama/modeling_llama.py", line 1176, in forward
    outputs = self.model(
              ^^^^^^^^^^^
  File "/opt/rd/apps/Oobabooga/installer_files/env/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/rd/apps/Oobabooga/installer_files/env/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/rd/apps/Oobabooga/installer_files/env/lib/python3.11/site-packages/transformers/models/llama/modeling_llama.py", line 1019, in forward
    layer_outputs = decoder_layer(
                    ^^^^^^^^^^^^^^
  File "/opt/rd/apps/Oobabooga/installer_files/env/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/rd/apps/Oobabooga/installer_files/env/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/rd/apps/Oobabooga/installer_files/env/lib/python3.11/site-packages/transformers/models/llama/modeling_llama.py", line 740, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
                                                          ^^^^^^^^^^^^^^^
  File "/opt/rd/apps/Oobabooga/installer_files/env/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/rd/apps/Oobabooga/installer_files/env/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/rd/apps/Oobabooga/installer_files/env/lib/python3.11/site-packages/auto_gptq/nn_modules/fused_llama_attn.py", line 72, in forward
    cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/rd/apps/Oobabooga/installer_files/env/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/rd/apps/Oobabooga/installer_files/env/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/rd/apps/Oobabooga/installer_files/env/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
TypeError: LlamaRotaryEmbedding.forward() missing 1 required positional argument: 'position_ids'
Output generated in 7.78 seconds (0.00 tokens/s, 0 tokens, context 186, seed 1719846425)

System Info

Ubuntu 22, Quadro RTX 5000 (16 GB VRAM), 24 GB system RAM

 NVIDIA-SMI 550.54.14              Driver Version: 550.54.14      CUDA Version: 12.4

d8ahazard · Mar 26 '24

I receive the same error with the model: TheBloke_deepseek-coder-6.7B-instruct-GPTQ_gptq-4bit-32g-actorder_True

DennisVanDijk · Mar 27 '24

Also, for what it's worth, I was able to use this model fine up until a few weeks ago. Guessing maybe the transformers version was bumped?

d8ahazard · Mar 27 '24

OK, so the issue still persists, but I was able to get it working by switching the loader to ExLlamav2_HF. Unsure why loading from the command line selects the llama.cpp loader, but manually specifying --loader ExLlamav2_HF gets the model working again.

d8ahazard · Mar 28 '24
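For reference, a startup command along these lines is what the workaround amounts to. This is a sketch adapted from the reproduction command above with the AutoGPTQ-specific flags dropped; the exact flag set will depend on your setup:

start_linux.sh --listen-port 10870 --model-dir /opt/rd/apps/Oobabooga/models --api --listen --character Assistant --model TheBloke_WizardLM-13B-V1-1-SuperHOT-8K-GPTQ --loader ExLlamav2_HF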

Same issue here with TheBloke_deepseek-coder-6.7B-instruct-GPTQ_gptq-8bit-32g-actorder_True using AutoGPTQ

kedom1337 · Mar 29 '24

I think this is an issue with AutoGPTQ, which uses outdated Torch requirements. It seems @oobabooga forked the repo here to update it: https://github.com/oobabooga/AutoGPTQ, but it doesn't seem to be referenced yet and there are no releases yet. It would be interesting to know what the priority is here.

Matti-Koopa · Apr 04 '24
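To make the mismatch concrete, the toy sketch below (stand-in classes, not the real transformers or AutoGPTQ code) mimics the rotary-embedding signature change: older transformers exposed forward(x, seq_len=None), while versions from around 4.38 require position_ids as a positional argument, so the old-style call AutoGPTQ still makes in fused_llama_attn.py fails with exactly the error in the log.

# Toy sketch of the interface change behind the TypeError above.
# These are stand-in classes, NOT the real transformers/AutoGPTQ code.

class OldRotaryEmbedding:
    # interface AutoGPTQ's fused attention was written against (older transformers)
    def forward(self, x, seq_len=None):
        return "cos", "sin"  # the real code returns cached cos/sin tensors

class NewRotaryEmbedding:
    # interface in newer transformers: position_ids is a required positional argument
    def forward(self, x, position_ids, seq_len=None):
        return "cos", "sin"  # the real code computes cos/sin for the given positions

value_states, kv_seq_len = object(), 186

# The call style seen in fused_llama_attn.py (line 72 of the traceback):
OldRotaryEmbedding().forward(value_states, seq_len=kv_seq_len)      # works

try:
    NewRotaryEmbedding().forward(value_states, seq_len=kv_seq_len)  # same old-style call
except TypeError as err:
    print(err)  # forward() missing 1 required positional argument: 'position_ids'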

This issue has been closed due to inactivity for 2 months. If you believe it is still relevant, please leave a comment below. You can tag a developer in your comment.

github-actions[bot] · Jun 03 '24

I encountered this issue while trying to implement a paraphraser application (following https://www.kaggle.com/code/abdullahusmani86/llama-2-based-paraphraser-langchain/notebook) and was able to resolve it by downgrading transformers to version 4.37.2, as suggested here: https://github.com/IEIT-Yuan/Yuan-2.0/issues/123

ssgmath · Jun 13 '24
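For anyone on the one-click installer who wants to try the same downgrade, a minimal sketch (assuming the bundled cmd_linux.sh helper, which opens a shell inside the installer_files/env environment seen in the traceback) is:

./cmd_linux.sh
pip install "transformers==4.37.2"

Note that other parts of the webui may expect the newer transformers, so this pin is a workaround rather than a fix.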