
[Bug]: AttributeError: '_OpNamespace' '_C' object has no attribute 'rotary_embedding' / gemma-2-9b with vllm=0.5.2

Open choco9966 opened this issue 1 year ago • 16 comments

Your current environment

Versions of relevant libraries:
[pip3] flashinfer==0.0.9+cu121torch2.3
[pip3] numpy==1.26.4
[pip3] nvidia-nccl-cu12==2.20.5
[pip3] sentence-transformers==3.0.1
[pip3] torch==2.3.1
[pip3] torchvision==0.18.1
[pip3] transformers==4.42.4
[pip3] triton==2.3.1
[conda] flashinfer                0.0.9+cu121torch2.3          pypi_0    pypi
[conda] numpy                     1.26.4                   pypi_0    pypi
[conda] nvidia-nccl-cu12          2.20.5                   pypi_0    pypi
[conda] sentence-transformers     3.0.1                    pypi_0    pypi
[conda] torch                     2.3.1                    pypi_0    pypi
[conda] torchvision               0.18.1                   pypi_0    pypi
[conda] transformers              4.42.4                   pypi_0    pypi
[conda] triton                    2.3.1                    pypi_0    pypi
ROCM Version: Could not collect
Neuron SDK Version: N/A
vLLM Version: 0.5.2

🐛 Describe the bug

I encountered the following error when running Gemma-2-9b. Even after deleting and recreating the virtual environment, the same error occurs.

INFO 07-17 00:14:06 selector.py:79] Using Flashinfer backend.
INFO 07-17 00:14:07 selector.py:79] Using Flashinfer backend.
INFO 07-17 00:14:10 model_runner.py:266] Loading model weights took 17.3781 GB
ERROR 07-17 00:14:10 _custom_ops.py:42] Error in calling custom op rotary_embedding: '_OpNamespace' '_C' object has no attribute 'rotary_embedding'
ERROR 07-17 00:14:10 _custom_ops.py:42] Possibly you have built or installed an obsolete version of vllm.
ERROR 07-17 00:14:10 _custom_ops.py:42] Please try a clean build and install of vllm,or remove old built files such as vllm/*cpython*.so and build/ .
[rank0]: Traceback (most recent call last):
[rank0]:     llm = LLM(model=args.model_path, 
[rank0]:   File "/home/choco_9966/miniconda3/envs/gemma/lib/python3.10/site-packages/vllm/entrypoints/llm.py", line 150, in __init__
[rank0]:     self.llm_engine = LLMEngine.from_engine_args(
[rank0]:   File "/home/choco_9966/miniconda3/envs/gemma/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 421, in from_engine_args
[rank0]:     engine = cls(
[rank0]:   File "/home/choco_9966/miniconda3/envs/gemma/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 263, in __init__
[rank0]:     self._initialize_kv_caches()
[rank0]:   File "/home/choco_9966/miniconda3/envs/gemma/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 362, in _initialize_kv_caches
[rank0]:     self.model_executor.determine_num_available_blocks())
[rank0]:   File "/home/choco_9966/miniconda3/envs/gemma/lib/python3.10/site-packages/vllm/executor/gpu_executor.py", line 78, in determine_num_available_blocks
[rank0]:     return self.driver_worker.determine_num_available_blocks()
[rank0]:   File "/home/choco_9966/miniconda3/envs/gemma/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
[rank0]:     return func(*args, **kwargs)
[rank0]:   File "/home/choco_9966/miniconda3/envs/gemma/lib/python3.10/site-packages/vllm/worker/worker.py", line 179, in determine_num_available_blocks
[rank0]:     self.model_runner.profile_run()
[rank0]:   File "/home/choco_9966/miniconda3/envs/gemma/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
[rank0]:     return func(*args, **kwargs)
[rank0]:   File "/home/choco_9966/miniconda3/envs/gemma/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 923, in profile_run
[rank0]:     self.execute_model(model_input, kv_caches, intermediate_tensors)
[rank0]:   File "/home/choco_9966/miniconda3/envs/gemma/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
[rank0]:     return func(*args, **kwargs)
[rank0]:   File "/home/choco_9966/miniconda3/envs/gemma/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 1341, in execute_model
[rank0]:     hidden_or_intermediate_states = model_executable(
[rank0]:   File "/home/choco_9966/miniconda3/envs/gemma/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:   File "/home/choco_9966/miniconda3/envs/gemma/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:   File "/home/choco_9966/miniconda3/envs/gemma/lib/python3.10/site-packages/vllm/model_executor/models/gemma2.py", line 336, in forward
[rank0]:     hidden_states = self.model(input_ids, positions, kv_caches,
[rank0]:   File "/home/choco_9966/miniconda3/envs/gemma/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:   File "/home/choco_9966/miniconda3/envs/gemma/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:   File "/home/choco_9966/miniconda3/envs/gemma/lib/python3.10/site-packages/vllm/model_executor/models/gemma2.py", line 277, in forward
[rank0]:     hidden_states, residual = layer(
[rank0]:   File "/home/choco_9966/miniconda3/envs/gemma/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:   File "/home/choco_9966/miniconda3/envs/gemma/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:   File "/home/choco_9966/miniconda3/envs/gemma/lib/python3.10/site-packages/vllm/model_executor/models/gemma2.py", line 221, in forward
[rank0]:     hidden_states = self.self_attn(
[rank0]:   File "/home/choco_9966/miniconda3/envs/gemma/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:   File "/home/choco_9966/miniconda3/envs/gemma/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:   File "/home/choco_9966/miniconda3/envs/gemma/lib/python3.10/site-packages/vllm/model_executor/models/gemma2.py", line 161, in forward
[rank0]:     q, k = self.rotary_emb(positions, q, k)
[rank0]:   File "/home/choco_9966/miniconda3/envs/gemma/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:   File "/home/choco_9966/miniconda3/envs/gemma/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:   File "/home/choco_9966/miniconda3/envs/gemma/lib/python3.10/site-packages/vllm/model_executor/custom_op.py", line 13, in forward
[rank0]:     return self._forward_method(*args, **kwargs)
[rank0]:   File "/home/choco_9966/miniconda3/envs/gemma/lib/python3.10/site-packages/vllm/model_executor/layers/rotary_embedding.py", line 220, in forward_cuda
[rank0]:     ops.rotary_embedding(positions, query, key, self.head_size,
[rank0]:   File "/home/choco_9966/miniconda3/envs/gemma/lib/python3.10/site-packages/vllm/_custom_ops.py", line 43, in wrapper
[rank0]:     raise e
[rank0]:   File "/home/choco_9966/miniconda3/envs/gemma/lib/python3.10/site-packages/vllm/_custom_ops.py", line 34, in wrapper
[rank0]:     return fn(*args, **kwargs)
[rank0]:   File "/home/choco_9966/miniconda3/envs/gemma/lib/python3.10/site-packages/vllm/_custom_ops.py", line 141, in rotary_embedding
[rank0]:     torch.ops._C.rotary_embedding(positions, query, key, head_size,
[rank0]:   File "/home/choco_9966/miniconda3/envs/gemma/lib/python3.10/site-packages/torch/_ops.py", line 921, in __getattr__
[rank0]:     raise AttributeError(
[rank0]: AttributeError: '_OpNamespace' '_C' object has no attribute 'rotary_embedding'

choco9966 avatar Jul 16 '24 15:07 choco9966
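A quick way to confirm the symptom is to check whether the compiled `_C` extension actually registered the op. This is only a sketch: it is guarded so it degrades gracefully when vllm is not installed, and it relies on the fact (visible in the traceback above) that importing `vllm._custom_ops` loads the extension.

```python
import importlib.util

if importlib.util.find_spec("vllm") is not None:
    import torch
    import vllm._custom_ops  # noqa: F401  # importing this loads the compiled _C extension
    # hasattr swallows the AttributeError raised by torch.ops._C.__getattr__
    has_op = hasattr(torch.ops._C, "rotary_embedding")
    print("rotary_embedding registered:", has_op)
else:
    has_op = None
    print("vllm not installed")
```

If this prints `False`, the wheel's native extension failed to load (e.g. the GLIBC import error reported below), so every custom op is missing, not just `rotary_embedding`.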

Running into the same issue with vllm 0.5.2, torch 2.3.1 and flashinfer https://github.com/flashinfer-ai/flashinfer/releases/download/v0.0.9/flashinfer-0.0.9+cu121torch2.3-cp311-cp311-linux_x86_64.whl

dsingal0 avatar Jul 17 '24 04:07 dsingal0

Ran into the same issue with a T4 GPU and vllm==0.5.2, model==google/gemma-2b. In fact this is not just with Gemma; I see this with every supported vLLM model now.

pavanjava avatar Jul 17 '24 08:07 pavanjava

What OSes are you all on?

tlrmchlsmth avatar Jul 17 '24 17:07 tlrmchlsmth

Also @choco9966 is there more output that you could share? Ideally copy and paste everything

tlrmchlsmth avatar Jul 17 '24 17:07 tlrmchlsmth

Linux

thegallier avatar Jul 18 '24 08:07 thegallier

Hey, having the same issue:

from vllm import LLM
LLM("vwxyzjn/rloo_tldr")
WARNING 07-18 15:12:42 _custom_ops.py:14] Failed to import from vllm._C with ImportError("/lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.32' not found (required by /fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/vllm/_C.abi3.so)")
INFO 07-18 15:13:05 llm_engine.py:174] Initializing an LLM engine (v0.5.2) with config: model='vwxyzjn/rloo_tldr', speculative_config=None, tokenizer='vwxyzjn/rloo_tldr', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, rope_scaling=None, rope_theta=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=2048, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), observability_config=ObservabilityConfig(otlp_traces_endpoint=None), seed=0, served_model_name=vwxyzjn/rloo_tldr, use_v2_block_manager=False, enable_prefix_caching=False)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
INFO 07-18 15:13:05 weight_utils.py:218] Using model weights format ['*.safetensors']
INFO 07-18 15:13:05 weight_utils.py:261] No model.safetensors.index.json found in remote.
INFO 07-18 15:13:06 model_runner.py:266] Loading model weights took 1.8848 GB
ERROR 07-18 15:13:06 _custom_ops.py:42] Error in calling custom op rotary_embedding: '_OpNamespace' '_C' object has no attribute 'rotary_embedding'
ERROR 07-18 15:13:06 _custom_ops.py:42] Possibly you have built or installed an obsolete version of vllm.
ERROR 07-18 15:13:06 _custom_ops.py:42] Please try a clean build and install of vllm,or remove old built files such as vllm/*cpython*.so and build/ .
[rank0]: Traceback (most recent call last):
[rank0]:   File "<stdin>", line 1, in <module>
[rank0]:   File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/vllm/entrypoints/llm.py", line 150, in __init__
[rank0]:     self.llm_engine = LLMEngine.from_engine_args(
[rank0]:   File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 421, in from_engine_args
[rank0]:     engine = cls(
[rank0]:   File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 263, in __init__
[rank0]:     self._initialize_kv_caches()
[rank0]:   File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 362, in _initialize_kv_caches
[rank0]:     self.model_executor.determine_num_available_blocks())
[rank0]:   File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/vllm/executor/gpu_executor.py", line 78, in determine_num_available_blocks
[rank0]:     return self.driver_worker.determine_num_available_blocks()
[rank0]:   File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
[rank0]:     return func(*args, **kwargs)
[rank0]:   File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/vllm/worker/worker.py", line 179, in determine_num_available_blocks
[rank0]:     self.model_runner.profile_run()
[rank0]:   File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
[rank0]:     return func(*args, **kwargs)
[rank0]:   File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 923, in profile_run
[rank0]:     self.execute_model(model_input, kv_caches, intermediate_tensors)
[rank0]:   File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
[rank0]:     return func(*args, **kwargs)
[rank0]:   File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 1341, in execute_model
[rank0]:     hidden_or_intermediate_states = model_executable(
[rank0]:   File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:   File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:   File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/vllm/model_executor/models/gpt_neox.py", line 257, in forward
[rank0]:     hidden_states = self.gpt_neox(input_ids, positions, kv_caches,
[rank0]:   File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:   File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:   File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/vllm/model_executor/models/gpt_neox.py", line 219, in forward
[rank0]:     hidden_states = layer(
[rank0]:   File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:   File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:   File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/vllm/model_executor/models/gpt_neox.py", line 163, in forward
[rank0]:     attn_output = self.attention(
[rank0]:   File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:   File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:   File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/vllm/model_executor/models/gpt_neox.py", line 104, in forward
[rank0]:     q, k = self.rotary_emb(position_ids, q, k)
[rank0]:   File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:   File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:   File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/vllm/model_executor/custom_op.py", line 13, in forward
[rank0]:     return self._forward_method(*args, **kwargs)
[rank0]:   File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/vllm/model_executor/layers/rotary_embedding.py", line 220, in forward_cuda
[rank0]:     ops.rotary_embedding(positions, query, key, self.head_size,
[rank0]:   File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/vllm/_custom_ops.py", line 43, in wrapper
[rank0]:     raise e
[rank0]:   File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/vllm/_custom_ops.py", line 34, in wrapper
[rank0]:     return fn(*args, **kwargs)
[rank0]:   File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/vllm/_custom_ops.py", line 141, in rotary_embedding
[rank0]:     torch.ops._C.rotary_embedding(positions, query, key, head_size,
[rank0]:   File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/torch/_ops.py", line 921, in __getattr__
[rank0]:     raise AttributeError(
[rank0]: AttributeError: '_OpNamespace' '_C' object has no attribute 'rotary_embedding'

System info

  • OS: Ubuntu 20.04.6 LTS
  • Python 3.10.14
  • vllm 0.5.2
  • vllm-flash-attn 2.5.9.post1

qgallouedec avatar Jul 18 '24 15:07 qgallouedec
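The `GLIBC_2.32' not found` warning in the log above points at the root cause: the `_C.abi3.so` extension was built against a newer glibc than the system provides, so it never loads and none of the custom ops get registered. A quick way to compare the two is sketched below; the `.so` path is a placeholder, so locate it under your own `site-packages/vllm/` directory.

```shell
# System glibc version
ldd --version | head -n1

# GLIBC symbol versions the extension requires (highest one shown last).
# Path is illustrative; adjust to your environment.
SO="/path/to/site-packages/vllm/_C.abi3.so"
if [ -f "$SO" ]; then
  objdump -T "$SO" | grep -o 'GLIBC_[0-9.]*' | sort -Vu | tail -n1
fi
```

If the highest `GLIBC_*` version the extension requires exceeds the system's glibc version, the import fails exactly as shown in the warning.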

Downgrading to 0.5.1 solved the issue

pip install vllm==0.5.1

qgallouedec avatar Jul 18 '24 15:07 qgallouedec

Does the same happen with other versions? I think I tried that version, but it may have been others.

thegallier avatar Jul 18 '24 15:07 thegallier

@qgallouedec that's the same problem as https://github.com/vllm-project/vllm/issues/6462. I think people are generally having glibc versioning issues with 0.5.2.

Working on it here: https://github.com/vllm-project/vllm/pull/6517

tlrmchlsmth avatar Jul 18 '24 15:07 tlrmchlsmth

I have this same problem too.

Traceback (most recent call last):
  File "/lfs/skampere1/0/rschaef/miniconda3/envs/reward_modeling_20240708/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/lfs/skampere1/0/rschaef/miniconda3/envs/reward_modeling_20240708/lib/python3.11/site-packages/vllm/worker/worker.py", line 179, in determine_num_available_blocks
    self.model_runner.profile_run()
  File "/lfs/skampere1/0/rschaef/miniconda3/envs/reward_modeling_20240708/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/lfs/skampere1/0/rschaef/miniconda3/envs/reward_modeling_20240708/lib/python3.11/site-packages/vllm/worker/model_runner.py", line 923, in profile_run
    self.execute_model(model_input, kv_caches, intermediate_tensors)
  File "/lfs/skampere1/0/rschaef/miniconda3/envs/reward_modeling_20240708/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/lfs/skampere1/0/rschaef/miniconda3/envs/reward_modeling_20240708/lib/python3.11/site-packages/vllm/worker/model_runner.py", line 1341, in execute_model
    hidden_or_intermediate_states = model_executable(
                                    ^^^^^^^^^^^^^^^^^
  File "/lfs/skampere1/0/rschaef/miniconda3/envs/reward_modeling_20240708/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/lfs/skampere1/0/rschaef/miniconda3/envs/reward_modeling_20240708/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/lfs/skampere1/0/rschaef/miniconda3/envs/reward_modeling_20240708/lib/python3.11/site-packages/vllm/model_executor/models/gpt_neox.py", line 257, in forward
    hidden_states = self.gpt_neox(input_ids, positions, kv_caches,
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/lfs/skampere1/0/rschaef/miniconda3/envs/reward_modeling_20240708/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/lfs/skampere1/0/rschaef/miniconda3/envs/reward_modeling_20240708/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/lfs/skampere1/0/rschaef/miniconda3/envs/reward_modeling_20240708/lib/python3.11/site-packages/vllm/model_executor/models/gpt_neox.py", line 219, in forward
    hidden_states = layer(
                    ^^^^^^
  File "/lfs/skampere1/0/rschaef/miniconda3/envs/reward_modeling_20240708/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/lfs/skampere1/0/rschaef/miniconda3/envs/reward_modeling_20240708/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/lfs/skampere1/0/rschaef/miniconda3/envs/reward_modeling_20240708/lib/python3.11/site-packages/vllm/model_executor/models/gpt_neox.py", line 163, in forward
    attn_output = self.attention(
                  ^^^^^^^^^^^^^^^
  File "/lfs/skampere1/0/rschaef/miniconda3/envs/reward_modeling_20240708/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/lfs/skampere1/0/rschaef/miniconda3/envs/reward_modeling_20240708/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/lfs/skampere1/0/rschaef/miniconda3/envs/reward_modeling_20240708/lib/python3.11/site-packages/vllm/model_executor/models/gpt_neox.py", line 104, in forward
    q, k = self.rotary_emb(position_ids, q, k)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/lfs/skampere1/0/rschaef/miniconda3/envs/reward_modeling_20240708/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/lfs/skampere1/0/rschaef/miniconda3/envs/reward_modeling_20240708/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/lfs/skampere1/0/rschaef/miniconda3/envs/reward_modeling_20240708/lib/python3.11/site-packages/vllm/model_executor/custom_op.py", line 13, in forward
    return self._forward_method(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/lfs/skampere1/0/rschaef/miniconda3/envs/reward_modeling_20240708/lib/python3.11/site-packages/vllm/model_executor/layers/rotary_embedding.py", line 220, in forward_cuda
    ops.rotary_embedding(positions, query, key, self.head_size,
  File "/lfs/skampere1/0/rschaef/miniconda3/envs/reward_modeling_20240708/lib/python3.11/site-packages/vllm/_custom_ops.py", line 43, in wrapper
    raise e
  File "/lfs/skampere1/0/rschaef/miniconda3/envs/reward_modeling_20240708/lib/python3.11/site-packages/vllm/_custom_ops.py", line 34, in wrapper
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/lfs/skampere1/0/rschaef/miniconda3/envs/reward_modeling_20240708/lib/python3.11/site-packages/vllm/_custom_ops.py", line 141, in rotary_embedding
    torch.ops._C.rotary_embedding(positions, query, key, head_size,
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/lfs/skampere1/0/rschaef/miniconda3/envs/reward_modeling_20240708/lib/python3.11/site-packages/torch/_ops.py", line 921, in __getattr__
    raise AttributeError(
AttributeError: '_OpNamespace' '_C' object has no attribute 'rotary_embedding'

RylanSchaeffer avatar Jul 19 '24 18:07 RylanSchaeffer

I moved the inner vllm subdirectory (inside the vllm source tree) out of the way, then installed 0.4.2, and that solved my issues with both the rms_norm and rotary_embedding ops.

thegallier avatar Jul 19 '24 19:07 thegallier

Having the same issue with 0.5.2... is there a plan to fix it? It's probably due to the torch 2.3.1 requirement.

yuchenlin avatar Jul 19 '24 23:07 yuchenlin

Assuming that most people here are hitting glibc versioning problems, this should be fixed for most of you in 0.5.3 and later, now that we are building on Ubuntu 20.04. I think we can go ahead and close this one.

tlrmchlsmth avatar Jul 23 '24 21:07 tlrmchlsmth

For me this issue persists on Ubuntu 22.04 with vllm 0.6.0 (also with v0.4.2).

mahenning avatar Sep 10 '24 11:09 mahenning

Hi, I'm having the same issue on Ubuntu 22.04 with the latest release (0.6.4). It causes tests/lora/test_layers.py::test_rotary_embedding_long_context to fail.

Akshat-Tripathi avatar Nov 19 '24 13:11 Akshat-Tripathi

This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!

github-actions[bot] avatar Feb 18 '25 01:02 github-actions[bot]

This issue has been automatically closed due to inactivity. Please feel free to reopen if you feel it is still relevant. Thank you!

github-actions[bot] avatar Mar 20 '25 02:03 github-actions[bot]