
[Bug]: The size of tensor a (49472) must match the size of tensor b (49664) at non-singleton dimension 1

Open aldettinger opened this issue 7 months ago • 7 comments

Your current environment

I'm trying to use a LoRA adapter alongside its base model on my CPU-only machine, from a Docker container.

I first build a local CPU-based image, like below:

git checkout v0.8.5
docker build -f docker/Dockerfile.cpu -t vllm-0.8.5-cpu-env --shm-size=4g .

From there, I'm able to start the server with the model and its associated LoRA adapter:

docker run -it --mount type=bind,source=/home/agallice/dev/hugging-face-models/,target=/hf-models --rm --network=host vllm-0.8.5-cpu-env --model /hf-models/granite-3.2-8b-instruct --max-model-len 16384 --enable-lora --lora-modules uncertainty-lora=/hf-models/granite-uncertainty-3.2-8b-lora

However, it all breaks on the first request, even when simply requesting the base model as below:

[main_upstream @ uncertainty-quarkus-experiments]$     curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "/hf-models/granite-3.2-8b-instruct",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'

The served engine crashes with the stack trace shown in the bug description below.

I've experimented with different setups, command-line arguments, and so on, to no avail so far. The issue seems to lie in lora_ops.py; I suspect this is a bug in vLLM.

Any idea or guidance about this? I wonder whether this is a CPU-only kind of issue.
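To illustrate why that line in lora_ops.py can throw: the slice `output_tensor[:, :outputs.shape[1]]` silently clips to the destination's width when `outputs` is wider, so the in-place add then sees two mismatched widths. A minimal sketch with toy sizes (numpy stands in for PyTorch; the real widths are 49472 vs 49664):

```python
import numpy as np

# Toy stand-ins for the tensors in bgmv_expand.
output_tensor = np.zeros((4, 10))  # logits buffer, vocab dim = 10
outputs = np.ones((4, 12))         # LoRA result, padded wider: 12

try:
    # :outputs.shape[1] asks for width 12 but is clipped to 10, while the
    # right-hand side keeps width 12 -- the shapes no longer match.
    output_tensor[:, :outputs.shape[1]] += outputs[:4, :]
except ValueError as e:
    print(e)  # numpy's analogue of the PyTorch size-mismatch RuntimeError
```

So the error is a symptom: the LoRA logits buffer ends up wider than the model's logits tensor, and the kernel has no valid way to reconcile them.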

🐛 Describe the bug

The server stack trace on crash is below:

ERROR 04-29 14:55:14 [engine.py:160] RuntimeError('The size of tensor a (49472) must match the size of tensor b (49664) at non-singleton dimension 1')
ERROR 04-29 14:55:14 [engine.py:160] Traceback (most recent call last):
ERROR 04-29 14:55:14 [engine.py:160]   File "/opt/venv/lib/python3.12/site-packages/vllm/engine/multiprocessing/engine.py", line 158, in start
ERROR 04-29 14:55:14 [engine.py:160]     self.run_engine_loop()
ERROR 04-29 14:55:14 [engine.py:160]   File "/opt/venv/lib/python3.12/site-packages/vllm/engine/multiprocessing/engine.py", line 221, in run_engine_loop
ERROR 04-29 14:55:14 [engine.py:160]     request_outputs = self.engine_step()
ERROR 04-29 14:55:14 [engine.py:160]                       ^^^^^^^^^^^^^^^^^^
ERROR 04-29 14:55:14 [engine.py:160]   File "/opt/venv/lib/python3.12/site-packages/vllm/engine/multiprocessing/engine.py", line 247, in engine_step
ERROR 04-29 14:55:14 [engine.py:160]     raise e
ERROR 04-29 14:55:14 [engine.py:160]   File "/opt/venv/lib/python3.12/site-packages/vllm/engine/multiprocessing/engine.py", line 230, in engine_step
ERROR 04-29 14:55:14 [engine.py:160]     return self.engine.step()
ERROR 04-29 14:55:14 [engine.py:160]            ^^^^^^^^^^^^^^^^^^
ERROR 04-29 14:55:14 [engine.py:160]   File "/opt/venv/lib/python3.12/site-packages/vllm/engine/llm_engine.py", line 1431, in step
ERROR 04-29 14:55:14 [engine.py:160]     outputs = self.model_executor.execute_model(
ERROR 04-29 14:55:14 [engine.py:160]               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-29 14:55:14 [engine.py:160]   File "/opt/venv/lib/python3.12/site-packages/vllm/executor/executor_base.py", line 299, in execute_model
ERROR 04-29 14:55:14 [engine.py:160]     driver_outputs = self._driver_execute_model(execute_model_req)
ERROR 04-29 14:55:14 [engine.py:160]                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-29 14:55:14 [engine.py:160]   File "/opt/venv/lib/python3.12/site-packages/vllm/executor/mp_distributed_executor.py", line 144, in _driver_execute_model
ERROR 04-29 14:55:14 [engine.py:160]     return self.driver_worker.execute_model(execute_model_req)
ERROR 04-29 14:55:14 [engine.py:160]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-29 14:55:14 [engine.py:160]   File "/opt/venv/lib/python3.12/site-packages/vllm/worker/worker_base.py", line 420, in execute_model
ERROR 04-29 14:55:14 [engine.py:160]     output = self.model_runner.execute_model(
ERROR 04-29 14:55:14 [engine.py:160]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-29 14:55:14 [engine.py:160]   File "/opt/venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
ERROR 04-29 14:55:14 [engine.py:160]     return func(*args, **kwargs)
ERROR 04-29 14:55:14 [engine.py:160]            ^^^^^^^^^^^^^^^^^^^^^
ERROR 04-29 14:55:14 [engine.py:160]   File "/opt/venv/lib/python3.12/site-packages/vllm/worker/cpu_model_runner.py", line 664, in execute_model
ERROR 04-29 14:55:14 [engine.py:160]     logits = self.model.compute_logits(hidden_states,
ERROR 04-29 14:55:14 [engine.py:160]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-29 14:55:14 [engine.py:160]   File "/opt/venv/lib/python3.12/site-packages/vllm/model_executor/models/granite.py", line 463, in compute_logits
ERROR 04-29 14:55:14 [engine.py:160]     logits = self.logits_processor(self.lm_head, hidden_states,
ERROR 04-29 14:55:14 [engine.py:160]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-29 14:55:14 [engine.py:160]   File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
ERROR 04-29 14:55:14 [engine.py:160]     return self._call_impl(*args, **kwargs)
ERROR 04-29 14:55:14 [engine.py:160]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-29 14:55:14 [engine.py:160]   File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
ERROR 04-29 14:55:14 [engine.py:160]     return forward_call(*args, **kwargs)
ERROR 04-29 14:55:14 [engine.py:160]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-29 14:55:14 [engine.py:160]   File "/opt/venv/lib/python3.12/site-packages/vllm/lora/layers.py", line 1164, in forward
ERROR 04-29 14:55:14 [engine.py:160]     return type(self.base_layer).forward(self, *args, **kwargs)
ERROR 04-29 14:55:14 [engine.py:160]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-29 14:55:14 [engine.py:160]   File "/opt/venv/lib/python3.12/site-packages/vllm/model_executor/layers/logits_processor.py", line 70, in forward
ERROR 04-29 14:55:14 [engine.py:160]     logits = self._get_logits(hidden_states, lm_head, embedding_bias)
ERROR 04-29 14:55:14 [engine.py:160]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-29 14:55:14 [engine.py:160]   File "/opt/venv/lib/python3.12/site-packages/vllm/lora/layers.py", line 1155, in _get_logits
ERROR 04-29 14:55:14 [engine.py:160]     self.punica_wrapper.add_lora_logits(logits, hidden_states,
ERROR 04-29 14:55:14 [engine.py:160]   File "/opt/venv/lib/python3.12/site-packages/vllm/lora/punica_wrapper/punica_cpu.py", line 343, in add_lora_logits
ERROR 04-29 14:55:14 [engine.py:160]     bgmv_expand(buffer,
ERROR 04-29 14:55:14 [engine.py:160]   File "/opt/venv/lib/python3.12/site-packages/vllm/lora/ops/torch_ops/lora_ops.py", line 40, in bgmv_expand
ERROR 04-29 14:55:14 [engine.py:160]     output_tensor[:, :outputs.shape[1]] += outputs[:limit, :]
ERROR 04-29 14:55:14 [engine.py:160] RuntimeError: The size of tensor a (49472) must match the size of tensor b (49664) at non-singleton dimension 1
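For context, here is one plausible way the two sizes in the error can arise (my assumption from vLLM's default padding values, not confirmed in this thread): if granite-3.2-8b's vocabulary size is 49155, the logits buffer pads the vocab to a multiple of 64 and adds 256 extra LoRA vocab slots, while the LoRA path pads vocab plus the 256 extra slots up to a multiple of 256:

```python
import math

def pad_to(n: int, m: int) -> int:
    """Round n up to the next multiple of m."""
    return math.ceil(n / m) * m

VOCAB = 49155       # granite-3.2-8b vocabulary size (assumption)
LORA_EXTRA = 256    # extra LoRA vocab slots, vLLM default (assumption)
LORA_PAD = 256      # LoRA vocab padding granularity (assumption)

logits_width = pad_to(VOCAB, 64) + LORA_EXTRA       # 49216 + 256 = 49472
lora_width = pad_to(VOCAB + LORA_EXTRA, LORA_PAD)   # 194 * 256   = 49664

print(logits_width, lora_width)  # 49472 49664 -- the two sizes in the error
```

If that reading is right, the two code paths disagree on padding granularity, which would explain why the crash is specific to some vocab sizes and backends.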

Before submitting a new issue...

  • [x] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.

aldettinger avatar Apr 29 '25 15:04 aldettinger

Looks like a LoRA problem

DarkLight1337 avatar Apr 29 '25 16:04 DarkLight1337

Are you using this LoRA model?

jeejeelee avatar Apr 30 '25 03:04 jeejeelee

@jeejeelee Yes, this is the LoRA model I'm trying to use along with this base model.

I've downloaded both locally.

aldettinger avatar Apr 30 '25 06:04 aldettinger

Hello @jeejeelee, I wonder whether the issue can be reproduced on a machine other than mine?

aldettinger avatar May 09 '25 08:05 aldettinger

I can reproduce this issue and am working on a fix.

frreiss avatar May 21 '25 20:05 frreiss

Many thanks for stepping in @frreiss 👍

On my side, I was able to merge the LoRA adapter into the base model on another platform. From there, I'm now able to run the merged model on my CPU-only machine with vLLM in a Docker image.
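(For anyone else hitting this, the merge workaround is exact, not an approximation: a LoRA adapter can be folded into the base weights because W·x + (α/r)·B·A·x = (W + (α/r)·B·A)·x. A minimal numpy sketch of that identity, with toy sizes rather than the actual model:)

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, alpha = 8, 2, 4  # hidden size, LoRA rank, LoRA alpha (toy values)

W = rng.normal(size=(d, d))  # base weight
A = rng.normal(size=(r, d))  # LoRA down-projection
B = rng.normal(size=(d, r))  # LoRA up-projection

# Merging folds the adapter into the base weight once, offline.
W_merged = W + (alpha / r) * (B @ A)

x = rng.normal(size=(d,))
y_adapter = W @ x + (alpha / r) * (B @ (A @ x))  # base + adapter at runtime
y_merged = W_merged @ x                          # merged weight, no adapter

assert np.allclose(y_adapter, y_merged)  # identical outputs
```

So serving the merged model sidesteps the LoRA logits path entirely, which is consistent with the crash disappearing.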

Please let me know in case you need a beta tester for the investigation or fix.

aldettinger avatar May 26 '25 07:05 aldettinger

@aldettinger Can you test whether #18773 fixes your issue?

jeejeelee avatar May 28 '25 01:05 jeejeelee

@jeejeelee Sure, looks ok.

aldettinger avatar May 28 '25 14:05 aldettinger