text-generation-webui
Exception: LlamaRotaryEmbedding.forward() missing 1 required positional argument: 'position_ids'
Describe the bug
Using the AutoGPTQ loader with a GPTQ model, generation fails with TypeError: LlamaRotaryEmbedding.forward() missing 1 required positional argument: 'position_ids' (full traceback under Logs below).
Is there an existing issue for this?
- [X] I have searched the existing issues
Reproduction
Using "TheBloke_WizardLM-13B-V1-1-SuperHOT-8K-GPTQ" with the following startup params:
start_linux.sh --listen-port 10870 --model-dir /opt/rd/apps/Oobabooga/models --api --listen --bf16 --character Assistant --load-in-8bit --triton --use_flash_attention_2 --auto-devices --tensorcores --model TheBloke_WizardLM-13B-V1-1-SuperHOT-8K-GPTQ
As soon as I try talking with the model, it says it's typing, but I never get any response, and the error below shows up in the log.
Screenshot
No response
Logs
19:43:21-447807 INFO Loading settings from "settings.yaml"
19:43:21-470637 INFO Loading "TheBloke_WizardLM-13B-V1-1-SuperHOT-8K-GPTQ"
19:43:21-604655 WARNING Auto-assiging --gpu-memory 15 for your GPU to try to prevent out-of-memory errors. You can manually set other values.
19:43:21-608832 INFO The AutoGPTQ params are: {'model_basename': 'wizardlm-13b-v1.1-superhot-8k-GPTQ-4bit-128g.no-act.order', 'device':
'cuda:0', 'use_triton': True, 'inject_fused_attention': True, 'inject_fused_mlp': True, 'use_safetensors': True,
'trust_remote_code': False, 'max_memory': {0: '15GiB', 'cpu': '99GiB'}, 'quantize_config': None, 'use_cuda_fp16': True,
'disable_exllama': False, 'disable_exllamav2': False}
/opt/rd/apps/Oobabooga/installer_files/env/lib/python3.11/site-packages/transformers/modeling_utils.py:4193: FutureWarning: `_is_quantized_training_enabled` is going to be deprecated in transformers 4.39.0. Please use `model.hf_quantizer.is_trainable` instead
warnings.warn(
19:44:56-310994 INFO LOADER: "AutoGPTQ"
19:44:56-312798 INFO TRUNCATION LENGTH: 8192
19:44:56-313978 INFO INSTRUCTION TEMPLATE: "Vicuna-v1.1"
19:44:56-315047 INFO Loaded the model in 94.84 seconds.
19:44:56-316158 INFO Loading the extension "openai"
19:44:56-453866 INFO OpenAI-compatible API URL:
http://0.0.0.0:5000
19:44:56-454532 INFO Loading the extension "gallery"
Running on local URL: http://0.0.0.0:10870
Traceback (most recent call last):
File "/opt/rd/apps/Oobabooga/modules/callbacks.py", line 61, in gentask
ret = self.mfunc(callback=_callback, *args, **self.kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/rd/apps/Oobabooga/modules/text_generation.py", line 397, in generate_with_callback
shared.model.generate(**kwargs)
File "/opt/rd/apps/Oobabooga/installer_files/env/lib/python3.11/site-packages/auto_gptq/modeling/_base.py", line 447, in generate
return self.model.generate(**kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/rd/apps/Oobabooga/installer_files/env/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/opt/rd/apps/Oobabooga/installer_files/env/lib/python3.11/site-packages/transformers/generation/utils.py", line 1592, in generate
return self.sample(
^^^^^^^^^^^^
File "/opt/rd/apps/Oobabooga/installer_files/env/lib/python3.11/site-packages/transformers/generation/utils.py", line 2696, in sample
outputs = self(
^^^^^
File "/opt/rd/apps/Oobabooga/installer_files/env/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/rd/apps/Oobabooga/installer_files/env/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/rd/apps/Oobabooga/installer_files/env/lib/python3.11/site-packages/transformers/models/llama/modeling_llama.py", line 1176, in forward
outputs = self.model(
^^^^^^^^^^^
File "/opt/rd/apps/Oobabooga/installer_files/env/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/rd/apps/Oobabooga/installer_files/env/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/rd/apps/Oobabooga/installer_files/env/lib/python3.11/site-packages/transformers/models/llama/modeling_llama.py", line 1019, in forward
layer_outputs = decoder_layer(
^^^^^^^^^^^^^^
File "/opt/rd/apps/Oobabooga/installer_files/env/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/rd/apps/Oobabooga/installer_files/env/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/rd/apps/Oobabooga/installer_files/env/lib/python3.11/site-packages/transformers/models/llama/modeling_llama.py", line 740, in forward
hidden_states, self_attn_weights, present_key_value = self.self_attn(
^^^^^^^^^^^^^^^
File "/opt/rd/apps/Oobabooga/installer_files/env/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/rd/apps/Oobabooga/installer_files/env/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/rd/apps/Oobabooga/installer_files/env/lib/python3.11/site-packages/auto_gptq/nn_modules/fused_llama_attn.py", line 72, in forward
cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/rd/apps/Oobabooga/installer_files/env/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/rd/apps/Oobabooga/installer_files/env/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/rd/apps/Oobabooga/installer_files/env/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
TypeError: LlamaRotaryEmbedding.forward() missing 1 required positional argument: 'position_ids'
Output generated in 7.78 seconds (0.00 tokens/s, 0 tokens, context 186, seed 1719846425)
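For context, the traceback shows AutoGPTQ's fused attention (fused_llama_attn.py) calling the rotary embedding with the old seq_len-only convention, while the installed transformers (4.38.x, judging by the 4.39 deprecation warning in the log above) requires position_ids. A minimal sketch of the mismatch, using stand-in classes with assumed, abridged signatures rather than the real library code:

# Stand-ins for the two APIs; the signatures are assumptions based on the traceback.
class OldRotaryEmbedding:
    # transformers <= 4.37.x: only the sequence length is needed
    def forward(self, x, seq_len=None):
        return ("cos", "sin")

class NewRotaryEmbedding:
    # transformers 4.38+: position_ids is a required positional argument
    def forward(self, x, position_ids, seq_len=None):
        return ("cos", "sin")

value_states, kv_seq_len = object(), 186

# AutoGPTQ's fused attention still uses the old calling convention:
OldRotaryEmbedding().forward(value_states, seq_len=kv_seq_len)  # works
NewRotaryEmbedding().forward(value_states, seq_len=kv_seq_len)
# TypeError: forward() missing 1 required positional argument: 'position_ids'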
System Info
Ubuntu 22, Quadro RTX 5000 (16 GB VRAM), 24 GB system RAM
NVIDIA-SMI 550.54.14 Driver Version: 550.54.14 CUDA Version: 12.4
I receive the same error with the model: TheBloke_deepseek-coder-6.7B-instruct-GPTQ_gptq-4bit-32g-actorder_True
Also, for what it's worth, I was able to use this model fine up until a few weeks ago. Guessing maybe the transformers version was bumped?
OK, so the issue still persists, but I was able to get it working by switching the loader to ExLlamav2_HF. I'm unsure why loading from the command line selects the llama.cpp loader, but manually specifying --loader ExLlamav2_HF gets the model working again.
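For reference, the workaround applied to the original startup command would look something like the line below. This is a sketch: I've dropped the AutoGPTQ/transformers-specific flags (--triton, --load-in-8bit, etc.) on the assumption they don't apply to the ExLlamav2_HF loader.

start_linux.sh --listen-port 10870 --model-dir /opt/rd/apps/Oobabooga/models --api --listen --loader ExLlamav2_HF --model TheBloke_WizardLM-13B-V1-1-SuperHOT-8K-GPTQ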
Same issue here with TheBloke_deepseek-coder-6.7B-instruct-GPTQ_gptq-8bit-32g-actorder_True using AutoGPTQ.
I think this is an issue with AutoGPTQ, which pins outdated Torch requirements. It seems @oobabooga forked the repo to update it: https://github.com/oobabooga/AutoGPTQ. But it doesn't seem to be referenced anywhere yet, and there are no releases either. It would be interesting to know what the priority is here.
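If anyone wants to try the fork before a release exists, installing straight from the Git repo should be possible; this is untested and assumes the fork builds in your environment:

pip install git+https://github.com/oobabooga/AutoGPTQ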
This issue has been closed due to inactivity for 2 months. If you believe it is still relevant, please leave a comment below. You can tag a developer in your comment.
I encountered this issue while trying to implement a paraphraser application (following https://www.kaggle.com/code/abdullahusmani86/llama-2-based-paraphraser-langchain/notebook) and was able to resolve it by downgrading transformers to version 4.37.2 (as suggested here: https://github.com/IEIT-Yuan/Yuan-2.0/issues/123).
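For anyone wanting to try the same fix in text-generation-webui, the downgrade is a one-liner; run it inside the webui's bundled Python environment (e.g. via cmd_linux.sh), and note it may conflict with other pinned requirements:

pip install transformers==4.37.2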