[Bug]: AttributeError: '_OpNamespace' '_C' object has no attribute 'rotary_embedding' / gemma-2-9b with vllm=0.5.2
Your current environment
Versions of relevant libraries:
[pip3] flashinfer==0.0.9+cu121torch2.3
[pip3] numpy==1.26.4
[pip3] nvidia-nccl-cu12==2.20.5
[pip3] sentence-transformers==3.0.1
[pip3] torch==2.3.1
[pip3] torchvision==0.18.1
[pip3] transformers==4.42.4
[pip3] triton==2.3.1
[conda] flashinfer 0.0.9+cu121torch2.3 pypi_0 pypi
[conda] numpy 1.26.4 pypi_0 pypi
[conda] nvidia-nccl-cu12 2.20.5 pypi_0 pypi
[conda] sentence-transformers 3.0.1 pypi_0 pypi
[conda] torch 2.3.1 pypi_0 pypi
[conda] torchvision 0.18.1 pypi_0 pypi
[conda] transformers 4.42.4 pypi_0 pypi
[conda] triton 2.3.1 pypi_0 pypi
ROCM Version: Could not collect
Neuron SDK Version: N/A
vLLM Version: 0.5.2
🐛 Describe the bug
I encountered the following error when running Gemma-2-9b. Even after deleting and recreating the virtual environment, the same error recurs.
INFO 07-17 00:14:06 selector.py:79] Using Flashinfer backend.
INFO 07-17 00:14:07 selector.py:79] Using Flashinfer backend.
INFO 07-17 00:14:10 model_runner.py:266] Loading model weights took 17.3781 GB
ERROR 07-17 00:14:10 _custom_ops.py:42] Error in calling custom op rotary_embedding: '_OpNamespace' '_C' object has no attribute 'rotary_embedding'
ERROR 07-17 00:14:10 _custom_ops.py:42] Possibly you have built or installed an obsolete version of vllm.
ERROR 07-17 00:14:10 _custom_ops.py:42] Please try a clean build and install of vllm,or remove old built files such as vllm/*cpython*.so and build/ .
[rank0]: Traceback (most recent call last):
[rank0]: llm = LLM(model=args.model_path,
[rank0]: File "/home/choco_9966/miniconda3/envs/gemma/lib/python3.10/site-packages/vllm/entrypoints/llm.py", line 150, in __init__
[rank0]: self.llm_engine = LLMEngine.from_engine_args(
[rank0]: File "/home/choco_9966/miniconda3/envs/gemma/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 421, in from_engine_args
[rank0]: engine = cls(
[rank0]: File "/home/choco_9966/miniconda3/envs/gemma/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 263, in __init__
[rank0]: self._initialize_kv_caches()
[rank0]: File "/home/choco_9966/miniconda3/envs/gemma/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 362, in _initialize_kv_caches
[rank0]: self.model_executor.determine_num_available_blocks())
[rank0]: File "/home/choco_9966/miniconda3/envs/gemma/lib/python3.10/site-packages/vllm/executor/gpu_executor.py", line 78, in determine_num_available_blocks
[rank0]: return self.driver_worker.determine_num_available_blocks()
[rank0]: File "/home/choco_9966/miniconda3/envs/gemma/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
[rank0]: return func(*args, **kwargs)
[rank0]: File "/home/choco_9966/miniconda3/envs/gemma/lib/python3.10/site-packages/vllm/worker/worker.py", line 179, in determine_num_available_blocks
[rank0]: self.model_runner.profile_run()
[rank0]: File "/home/choco_9966/miniconda3/envs/gemma/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
[rank0]: return func(*args, **kwargs)
[rank0]: File "/home/choco_9966/miniconda3/envs/gemma/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 923, in profile_run
[rank0]: self.execute_model(model_input, kv_caches, intermediate_tensors)
[rank0]: File "/home/choco_9966/miniconda3/envs/gemma/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
[rank0]: return func(*args, **kwargs)
[rank0]: File "/home/choco_9966/miniconda3/envs/gemma/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 1341, in execute_model
[rank0]: hidden_or_intermediate_states = model_executable(
[rank0]: File "/home/choco_9966/miniconda3/envs/gemma/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/home/choco_9966/miniconda3/envs/gemma/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: File "/home/choco_9966/miniconda3/envs/gemma/lib/python3.10/site-packages/vllm/model_executor/models/gemma2.py", line 336, in forward
[rank0]: hidden_states = self.model(input_ids, positions, kv_caches,
[rank0]: File "/home/choco_9966/miniconda3/envs/gemma/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/home/choco_9966/miniconda3/envs/gemma/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: File "/home/choco_9966/miniconda3/envs/gemma/lib/python3.10/site-packages/vllm/model_executor/models/gemma2.py", line 277, in forward
[rank0]: hidden_states, residual = layer(
[rank0]: File "/home/choco_9966/miniconda3/envs/gemma/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/home/choco_9966/miniconda3/envs/gemma/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: File "/home/choco_9966/miniconda3/envs/gemma/lib/python3.10/site-packages/vllm/model_executor/models/gemma2.py", line 221, in forward
[rank0]: hidden_states = self.self_attn(
[rank0]: File "/home/choco_9966/miniconda3/envs/gemma/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/home/choco_9966/miniconda3/envs/gemma/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: File "/home/choco_9966/miniconda3/envs/gemma/lib/python3.10/site-packages/vllm/model_executor/models/gemma2.py", line 161, in forward
[rank0]: q, k = self.rotary_emb(positions, q, k)
[rank0]: File "/home/choco_9966/miniconda3/envs/gemma/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/home/choco_9966/miniconda3/envs/gemma/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: File "/home/choco_9966/miniconda3/envs/gemma/lib/python3.10/site-packages/vllm/model_executor/custom_op.py", line 13, in forward
[rank0]: return self._forward_method(*args, **kwargs)
[rank0]: File "/home/choco_9966/miniconda3/envs/gemma/lib/python3.10/site-packages/vllm/model_executor/layers/rotary_embedding.py", line 220, in forward_cuda
[rank0]: ops.rotary_embedding(positions, query, key, self.head_size,
[rank0]: File "/home/choco_9966/miniconda3/envs/gemma/lib/python3.10/site-packages/vllm/_custom_ops.py", line 43, in wrapper
[rank0]: raise e
[rank0]: File "/home/choco_9966/miniconda3/envs/gemma/lib/python3.10/site-packages/vllm/_custom_ops.py", line 34, in wrapper
[rank0]: return fn(*args, **kwargs)
[rank0]: File "/home/choco_9966/miniconda3/envs/gemma/lib/python3.10/site-packages/vllm/_custom_ops.py", line 141, in rotary_embedding
[rank0]: torch.ops._C.rotary_embedding(positions, query, key, head_size,
[rank0]: File "/home/choco_9966/miniconda3/envs/gemma/lib/python3.10/site-packages/torch/_ops.py", line 921, in __getattr__
[rank0]: raise AttributeError(
[rank0]: AttributeError: '_OpNamespace' '_C' object has no attribute 'rotary_embedding'
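The AttributeError is usually a downstream symptom: if the compiled `vllm._C` extension fails to import, vllm only logs a warning, `torch.ops._C` stays empty, and the first custom-op call then fails with this message instead. Importing the extension directly surfaces the real error. A minimal diagnostic sketch, assuming nothing beyond the `vllm._C` module name that appears in the traceback:

```shell
# Import the compiled extension directly; on failure this prints the
# underlying ImportError (e.g. a GLIBC version mismatch) rather than the
# later, less informative '_OpNamespace' AttributeError.
python - <<'EOF'
try:
    import vllm._C  # noqa: F401
    print("vllm._C imported ok")
except ImportError as e:
    print(f"real import error: {e}")
EOF
```

If the printed error mentions a missing `GLIBC_x.y` version, the wheel was built against a newer glibc than your system provides.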
Running into the same issue with vllm 0.5.2, torch 2.3.1 and flashinfer https://github.com/flashinfer-ai/flashinfer/releases/download/v0.0.9/flashinfer-0.0.9+cu121torch2.3-cp311-cp311-linux_x86_64.whl
Ran into the same issue with a T4 GPU and vllm==0.5.2, model google/gemma-2b. In fact, this is not limited to Gemma: I now see it with every model vLLM supports.
What OSes are you all on?
Also @choco9966 is there more output that you could share? Ideally copy and paste everything
Linux
Hey, having the same issue:
from vllm import LLM
LLM("vwxyzjn/rloo_tldr")
WARNING 07-18 15:12:42 _custom_ops.py:14] Failed to import from vllm._C with ImportError("/lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.32' not found (required by /fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/vllm/_C.abi3.so)")
INFO 07-18 15:13:05 llm_engine.py:174] Initializing an LLM engine (v0.5.2) with config: model='vwxyzjn/rloo_tldr', speculative_config=None, tokenizer='vwxyzjn/rloo_tldr', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, rope_scaling=None, rope_theta=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=2048, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), observability_config=ObservabilityConfig(otlp_traces_endpoint=None), seed=0, served_model_name=vwxyzjn/rloo_tldr, use_v2_block_manager=False, enable_prefix_caching=False)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
INFO 07-18 15:13:05 weight_utils.py:218] Using model weights format ['*.safetensors']
INFO 07-18 15:13:05 weight_utils.py:261] No model.safetensors.index.json found in remote.
INFO 07-18 15:13:06 model_runner.py:266] Loading model weights took 1.8848 GB
ERROR 07-18 15:13:06 _custom_ops.py:42] Error in calling custom op rotary_embedding: '_OpNamespace' '_C' object has no attribute 'rotary_embedding'
ERROR 07-18 15:13:06 _custom_ops.py:42] Possibly you have built or installed an obsolete version of vllm.
ERROR 07-18 15:13:06 _custom_ops.py:42] Please try a clean build and install of vllm,or remove old built files such as vllm/*cpython*.so and build/ .
[rank0]: Traceback (most recent call last):
[rank0]: File "<stdin>", line 1, in <module>
[rank0]: File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/vllm/entrypoints/llm.py", line 150, in __init__
[rank0]: self.llm_engine = LLMEngine.from_engine_args(
[rank0]: File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 421, in from_engine_args
[rank0]: engine = cls(
[rank0]: File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 263, in __init__
[rank0]: self._initialize_kv_caches()
[rank0]: File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 362, in _initialize_kv_caches
[rank0]: self.model_executor.determine_num_available_blocks())
[rank0]: File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/vllm/executor/gpu_executor.py", line 78, in determine_num_available_blocks
[rank0]: return self.driver_worker.determine_num_available_blocks()
[rank0]: File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
[rank0]: return func(*args, **kwargs)
[rank0]: File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/vllm/worker/worker.py", line 179, in determine_num_available_blocks
[rank0]: self.model_runner.profile_run()
[rank0]: File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
[rank0]: return func(*args, **kwargs)
[rank0]: File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 923, in profile_run
[rank0]: self.execute_model(model_input, kv_caches, intermediate_tensors)
[rank0]: File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
[rank0]: return func(*args, **kwargs)
[rank0]: File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 1341, in execute_model
[rank0]: hidden_or_intermediate_states = model_executable(
[rank0]: File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/vllm/model_executor/models/gpt_neox.py", line 257, in forward
[rank0]: hidden_states = self.gpt_neox(input_ids, positions, kv_caches,
[rank0]: File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/vllm/model_executor/models/gpt_neox.py", line 219, in forward
[rank0]: hidden_states = layer(
[rank0]: File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/vllm/model_executor/models/gpt_neox.py", line 163, in forward
[rank0]: attn_output = self.attention(
[rank0]: File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/vllm/model_executor/models/gpt_neox.py", line 104, in forward
[rank0]: q, k = self.rotary_emb(position_ids, q, k)
[rank0]: File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/vllm/model_executor/custom_op.py", line 13, in forward
[rank0]: return self._forward_method(*args, **kwargs)
[rank0]: File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/vllm/model_executor/layers/rotary_embedding.py", line 220, in forward_cuda
[rank0]: ops.rotary_embedding(positions, query, key, self.head_size,
[rank0]: File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/vllm/_custom_ops.py", line 43, in wrapper
[rank0]: raise e
[rank0]: File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/vllm/_custom_ops.py", line 34, in wrapper
[rank0]: return fn(*args, **kwargs)
[rank0]: File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/vllm/_custom_ops.py", line 141, in rotary_embedding
[rank0]: torch.ops._C.rotary_embedding(positions, query, key, head_size,
[rank0]: File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.10/site-packages/torch/_ops.py", line 921, in __getattr__
[rank0]: raise AttributeError(
[rank0]: AttributeError: '_OpNamespace' '_C' object has no attribute 'rotary_embedding'
System info
- OS: Ubuntu 20.04.6 LTS
- Python 3.10.14
- vllm 0.5.2
- vllm-flash-attn 2.5.9.post1
Downgrading to 0.5.1 solved the issue
pip install vllm==0.5.1
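If downgrading is not an option, the error message's own advice amounts to a clean reinstall so that no stale compiled files shadow the new wheel. A sketch (the build-artifact cleanup line only applies if you previously built vllm from source in the current directory):

```shell
pip uninstall -y vllm
pip cache purge                        # drop any cached, possibly stale wheel
# Only if you previously built vllm from source in this tree:
# rm -rf build/ vllm/*cpython*.so
pip install --no-cache-dir vllm==0.5.2
```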
With the same versions of the other packages? I think I tried that vllm version, but possibly with different versions of the other dependencies.
@qgallouedec that's the same problem as https://github.com/vllm-project/vllm/issues/6462. I think people are generally having glibc versioning issues with 0.5.2.
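If you suspect the same glibc mismatch, you can compare the symbol versions your system provides against what the wheel's `_C.abi3.so` requires. A sketch: the `SO` path below points at the system libc only as a stand-in so the snippet runs anywhere; substitute the path to `vllm/_C.abi3.so` in your site-packages, as shown in the warning above.

```shell
# 1) System glibc version; the warning above shows the 0.5.2 wheel
#    requiring GLIBC_2.32:
ldd --version | head -n 1

# 2) Highest GLIBC symbol versions referenced by a shared object.
#    Stand-in path; point SO at .../site-packages/vllm/_C.abi3.so instead.
SO=/lib/x86_64-linux-gnu/libc.so.6
grep -ao 'GLIBC_[0-9.]*' "$SO" | sort -Vu | tail -n 3
```

If step 2 reports a version higher than step 1, the extension cannot load on this system and you need either a newer distro or a wheel built against an older glibc.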
Working on it here: https://github.com/vllm-project/vllm/pull/6517
I have the same problem too.
Traceback (most recent call last):
File "/lfs/skampere1/0/rschaef/miniconda3/envs/reward_modeling_20240708/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/lfs/skampere1/0/rschaef/miniconda3/envs/reward_modeling_20240708/lib/python3.11/site-packages/vllm/worker/worker.py", line 179, in determine_num_available_blocks
self.model_runner.profile_run()
File "/lfs/skampere1/0/rschaef/miniconda3/envs/reward_modeling_20240708/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/lfs/skampere1/0/rschaef/miniconda3/envs/reward_modeling_20240708/lib/python3.11/site-packages/vllm/worker/model_runner.py", line 923, in profile_run
self.execute_model(model_input, kv_caches, intermediate_tensors)
File "/lfs/skampere1/0/rschaef/miniconda3/envs/reward_modeling_20240708/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/lfs/skampere1/0/rschaef/miniconda3/envs/reward_modeling_20240708/lib/python3.11/site-packages/vllm/worker/model_runner.py", line 1341, in execute_model
hidden_or_intermediate_states = model_executable(
^^^^^^^^^^^^^^^^^
File "/lfs/skampere1/0/rschaef/miniconda3/envs/reward_modeling_20240708/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/lfs/skampere1/0/rschaef/miniconda3/envs/reward_modeling_20240708/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/lfs/skampere1/0/rschaef/miniconda3/envs/reward_modeling_20240708/lib/python3.11/site-packages/vllm/model_executor/models/gpt_neox.py", line 257, in forward
hidden_states = self.gpt_neox(input_ids, positions, kv_caches,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/lfs/skampere1/0/rschaef/miniconda3/envs/reward_modeling_20240708/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/lfs/skampere1/0/rschaef/miniconda3/envs/reward_modeling_20240708/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/lfs/skampere1/0/rschaef/miniconda3/envs/reward_modeling_20240708/lib/python3.11/site-packages/vllm/model_executor/models/gpt_neox.py", line 219, in forward
hidden_states = layer(
^^^^^^
File "/lfs/skampere1/0/rschaef/miniconda3/envs/reward_modeling_20240708/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/lfs/skampere1/0/rschaef/miniconda3/envs/reward_modeling_20240708/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/lfs/skampere1/0/rschaef/miniconda3/envs/reward_modeling_20240708/lib/python3.11/site-packages/vllm/model_executor/models/gpt_neox.py", line 163, in forward
attn_output = self.attention(
^^^^^^^^^^^^^^^
File "/lfs/skampere1/0/rschaef/miniconda3/envs/reward_modeling_20240708/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/lfs/skampere1/0/rschaef/miniconda3/envs/reward_modeling_20240708/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/lfs/skampere1/0/rschaef/miniconda3/envs/reward_modeling_20240708/lib/python3.11/site-packages/vllm/model_executor/models/gpt_neox.py", line 104, in forward
q, k = self.rotary_emb(position_ids, q, k)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/lfs/skampere1/0/rschaef/miniconda3/envs/reward_modeling_20240708/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/lfs/skampere1/0/rschaef/miniconda3/envs/reward_modeling_20240708/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/lfs/skampere1/0/rschaef/miniconda3/envs/reward_modeling_20240708/lib/python3.11/site-packages/vllm/model_executor/custom_op.py", line 13, in forward
return self._forward_method(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/lfs/skampere1/0/rschaef/miniconda3/envs/reward_modeling_20240708/lib/python3.11/site-packages/vllm/model_executor/layers/rotary_embedding.py", line 220, in forward_cuda
ops.rotary_embedding(positions, query, key, self.head_size,
File "/lfs/skampere1/0/rschaef/miniconda3/envs/reward_modeling_20240708/lib/python3.11/site-packages/vllm/_custom_ops.py", line 43, in wrapper
raise e
File "/lfs/skampere1/0/rschaef/miniconda3/envs/reward_modeling_20240708/lib/python3.11/site-packages/vllm/_custom_ops.py", line 34, in wrapper
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/lfs/skampere1/0/rschaef/miniconda3/envs/reward_modeling_20240708/lib/python3.11/site-packages/vllm/_custom_ops.py", line 141, in rotary_embedding
torch.ops._C.rotary_embedding(positions, query, key, head_size,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/lfs/skampere1/0/rschaef/miniconda3/envs/reward_modeling_20240708/lib/python3.11/site-packages/torch/_ops.py", line 921, in __getattr__
raise AttributeError(
AttributeError: '_OpNamespace' '_C' object has no attribute 'rotary_embedding'
I moved the vllm subdirectory (the vllm/ directory inside the repo checkout) out of the way, then installed 0.4.2, and that resolved my issues with both the rms_norm and rotary_embedding ops.
Having the same issue with 0.5.2... is a fix planned? It is probably related to the torch 2.3.1 requirement.
Assuming that most people are having glibc versioning problems here, this issue should be fixed for most people in 0.5.3 and later, now that we are building on Ubuntu 20.04. I think we can go ahead and close this one.
For me this issue persists on Ubuntu 22.04 with vllm 0.6.0 (also with v0.4.2).
Hi, I'm having the same issue on Ubuntu 22.04 with the latest release (0.6.4). It causes tests/lora/test_layers.py::test_rotary_embedding_long_context to fail.
This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!
This issue has been automatically closed due to inactivity. Please feel free to reopen if you feel it is still relevant. Thank you!