Canlin Guo
## Limitation

- We need at least 4 cards (910B) on NPU instead of 2 cards (A100, H100) on GPU to avoid OOM.

## Known Issues

### Overview

- Same as Qwen2.5-Omni,...
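The 4-card requirement above corresponds to a larger tensor-parallel degree at launch time. A minimal sketch, assuming the standard `vllm serve` CLI; the model name is illustrative, not taken from this thread:

```shell
# Sketch: spread the model across 4 Ascend 910B cards to avoid OOM.
# On A100/H100 GPUs, --tensor-parallel-size 2 is reported to suffice.
vllm serve Qwen/Qwen3-Omni-30B-A3B-Instruct \
  --tensor-parallel-size 4
```

`--tensor-parallel-size` shards the model weights evenly across devices, which is why the per-card memory requirement drops as the degree grows.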
Qwen3-Omni can now run on NPU with this PR. I'll fix the batch issue for Qwen2.5-Omni and Qwen3-Omni later.
> Sorry for the misleading docker file installation: https://docs.vllm.ai/projects/vllm-omni/en/latest/getting_started/installation/npu/#recommended
>
> I think you can `uv pip install vllm-omni` directly for v0.11.0rc1 rather than building from source.
>
> [@gcanlin](https://github.com/gcanlin) could...
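The suggested install path above can be sketched as follows; the exact release on PyPI is an assumption based on the v0.11.0rc1 tag mentioned in the quote:

```shell
# Sketch: install the prebuilt wheel instead of building from source.
# Assumes a vllm-omni release matching v0.11.0rc1 is published.
uv pip install vllm-omni
```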
https://github.com/vllm-project/vllm-omni/pull/434 is fixing this.
Please check out commit `9464e14` and use vllm-ascend v0.11.0rc2 with vllm v0.11.0. We're still upgrading vllm-ascend to v0.12.0rc1.
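The pinned combination above can be reproduced as follows; the `pip` version specifiers are assumptions mapping the release tags in this comment to package versions:

```shell
# Sketch: pin to the known-good combination from this thread.
git checkout 9464e14                                # vllm-omni commit
pip install vllm==0.11.0 vllm-ascend==0.11.0rc2     # tags quoted above
```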
> I'm using vllm-ascend v0.11.0rc1 and get this error:
>
> INFO 12-23 09:07:54 [importing.py:63] Triton not installed or not compatible; certain GPU-related functions will not be available. WARNING...
Request tasks:

- Qwen2.5-Omni offline tests have been working and are almost done in #168.
- API
  - OpenAI API for image generation
cc @tzhouam @R2-Y @hsliuustc0106 PTAL and add a ready tag to test all models. Thanks!
> could you please post the test result before and after this commit?

Of course. Updated now.
@hsliuustc0106 CI breaks because of `Gateway Timeout`. How can I fix it? Or could you please help retry it?

```
[2025-12-24T14:40:52Z] Installing collected packages: pip
[2025-12-24T14:40:53Z] Successfully installed pip-25.3...
```