[TPU] Enable gemma3-27b with TP>1 on multi-chips.
This PR enables gemma3-27b with TP>1 on multi-chip TPU hosts. Without this change, it fails with the error below.
Call stack:
Traceback (most recent call last):
File "/home/xiowei/vllm/vllm/v1/executor/multiproc_executor.py", line 465, in worker_busy_loop
output = func(*args, **kwargs)
File "/home/xiowei/vllm/vllm/v1/worker/tpu_worker.py", line 160, in determine_available_memory
self.model_runner.profile_run(self.model_runner.max_num_tokens)
File "/home/xiowei/vllm/vllm/v1/worker/tpu_model_runner.py", line 1166, in profile_run
dummy_encoder_outputs = self.model.get_multimodal_embeddings(
File "/home/xiowei/vllm/vllm/model_executor/models/gemma3_mm.py", line 588, in get_multimodal_embeddings
return self._process_image_input(image_input)
File "/home/xiowei/vllm/vllm/model_executor/models/gemma3_mm.py", line 569, in _process_image_input
image_features = self._image_pixels_to_features(
File "/home/xiowei/vllm/vllm/model_executor/models/gemma3_mm.py", line 557, in _image_pixels_to_features
image_features = vision_tower(pixel_values.to(dtype=target_dtype))
File "/home/xiowei/miniconda3/envs/vllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/xiowei/miniconda3/envs/vllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
return forward_call(*args, **kwargs)
File "/home/xiowei/vllm/vllm/model_executor/models/siglip.py", line 477, in forward
return self.vision_model(
File "/home/xiowei/miniconda3/envs/vllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/xiowei/miniconda3/envs/vllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
return forward_call(*args, **kwargs)
File "/home/xiowei/vllm/vllm/model_executor/models/siglip.py", line 419, in forward
hidden_states = self.embeddings(
File "/home/xiowei/miniconda3/envs/vllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/xiowei/miniconda3/envs/vllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
return forward_call(*args, **kwargs)
File "/home/xiowei/vllm/vllm/model_executor/models/siglip.py", line 135, in forward
embeddings = embeddings + self.position_embedding(
File "/home/xiowei/miniconda3/envs/vllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/xiowei/miniconda3/envs/vllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
return forward_call(*args, **kwargs)
File "/home/xiowei/vllm/vllm/model_executor/layers/vocab_parallel_embedding.py", line 406, in forward
masked_input, input_mask = get_masked_input_and_mask(
File "/home/xiowei/miniconda3/envs/vllm/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 671, in _fn
raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1
File "/home/xiowei/miniconda3/envs/vllm/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 768, in _compile_fx_inner
raise InductorError(e, currentframe()).with_traceback(
File "/home/xiowei/miniconda3/envs/vllm/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 753, in _compile_fx_inner
mb_compiled_graph = fx_codegen_and_compile(
File "/home/xiowei/miniconda3/envs/vllm/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1357, in fx_codegen_and_compile
return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
File "/home/xiowei/miniconda3/envs/vllm/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1246, in codegen_and_compile
compiled_module = graph.compile_to_module()
File "/home/xiowei/miniconda3/envs/vllm/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2201, in compile_to_module
return self._compile_to_module()
File "/home/xiowei/miniconda3/envs/vllm/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2209, in _compile_to_module
self.codegen_with_cpp_wrapper() if self.cpp_wrapper else self.codegen()
File "/home/xiowei/miniconda3/envs/vllm/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2140, in codegen
self.init_wrapper_code()
File "/home/xiowei/miniconda3/envs/vllm/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1898, in init_wrapper_code
self.device_ops = get_device_op_overrides(self.device_type)
File "/home/xiowei/miniconda3/envs/vllm/lib/python3.10/site-packages/torch/_inductor/codegen/common.py", line 490, in get_device_op_overrides
return device_op_overrides_dict[device]
torch._inductor.exc.InductorError: KeyError: 'xla'
Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
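Reading the traceback: the vocab-parallel embedding helper `get_masked_input_and_mask` is wrapped in `torch.compile`, Dynamo hands the graph to Inductor, and Inductor's device table has no entry for `'xla'`, hence the `KeyError: 'xla'` on TPU. The sketch below illustrates that general pattern and one way to guard against it (compile only where Inductor is supported, otherwise run eagerly). It is an illustration only, not the diff in this PR, and the platform check is a placeholder.

```python
# Illustration only -- not the change in this PR.
# A torch.compile-decorated helper defaults to the Inductor backend, and Inductor
# has no codegen registered for the 'xla' device, so tracing it on TPU raises
# KeyError: 'xla'. One generic guard: compile only where Inductor is supported.
import torch


def get_masked_input_and_mask_eager(input_: torch.Tensor,
                                     vocab_start: int,
                                     vocab_end: int):
    # Toy stand-in for the vocab-parallel masking helper: keep only the token ids
    # this rank owns and shift them into the local embedding-table range.
    mask = (input_ >= vocab_start) & (input_ < vocab_end)
    masked_input = (input_ - vocab_start) * mask
    return masked_input, mask


def inductor_is_supported() -> bool:
    # Placeholder platform check (assumption); real code would consult the
    # platform abstraction rather than "CUDA or bust".
    return torch.cuda.is_available()


get_masked_input_and_mask = (
    torch.compile(get_masked_input_and_mask_eager, dynamic=True)
    if inductor_is_supported()
    else get_masked_input_and_mask_eager
)
```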
Test plan: pytest -s -vv tests/v1/tpu/test_basic.py -k test_gemma3_with_mm_on_multichip 2>&1 | tee ~/out.txt
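For anyone reproducing this outside the test, something along these lines should hit the same path; the checkpoint name and TP size are assumptions, and the actual coverage is the pytest invocation above. Note that the crash happens during engine initialization (the memory-profiling run), so constructing the engine is enough to exercise it.

```python
# Hypothetical reproduction sketch (the real test is tests/v1/tpu/test_basic.py).
# The original failure happens in profile_run() during engine startup, so simply
# building the engine with TP>1 on a multi-chip TPU host exercises the fix.
from vllm import LLM, SamplingParams

llm = LLM(
    model="google/gemma-3-27b-it",  # assumed checkpoint name
    tensor_parallel_size=4,         # TP>1: one rank per TPU chip
    max_model_len=4096,
)

out = llm.generate(["Describe the picture in one sentence."],
                   SamplingParams(max_tokens=32))
print(out[0].outputs[0].text)
```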
👋 Hi! Thank you for contributing to the vLLM project.
💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small, essential subset of tests to catch errors quickly. You can run additional CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.
Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
To run full CI, PR reviewers can either add the ready label to the PR or enable auto-merge.
🚀
cc: @bvrockwell @yarongmu-google
Somehow, I still couldn't see my TPU CI running (is it because all the tests run in sequence and a job ahead of the TPU CI got stuck and blocked it?), nor could I start the TPU CI myself (the "Run TPU V1 Tests" button is grayed out).
The failing CI jobs look like timeouts. I don't see how my PR could cause that.
I retried the failing tests, but I think we can merge and ignore those timeouts.
Thanks @mgoin. I also did some checks on my A100 VM. For the two failing tests:
- VLLM_USE_V1=1 pytest -s -vv tests/mq_llm_engine/test_error_handling.py::test_mp_crash_detection: it also fails on the main branch (commit 4c33d6732148fdaeb9780fa86fca1f87f2a93c19), so it is not caused by this PR.
- VLLM_USE_V1=1 pytest -s -vv tests/v1/engine/test_engine_core_client.py -k test_startup_failure: it succeeds on my branch xiowei/gemma3-27b-multi-chip.
Could you help merge the PR? Thanks!
Nice improvement, and the TPU V1 test is green!