Canlin Guo

56 comments by Canlin Guo

## Limitation

- We need at least 4 cards (910B) on NPU, instead of 2 cards (A100, H100) on GPU, to avoid OOM.

## Known Issues

### Overview

- Same as Qwen2.5-Omni,...

With this PR, Qwen3-Omni can now run on NPU. I will fix the batch issue for Qwen2.5-Omni and Qwen3-Omni later.

> sorry for the misleading docker file installation: https://docs.vllm.ai/projects/vllm-omni/en/latest/getting_started/installation/npu/#recommended
>
> I think you uv pip install vllm-omni directly for v0.11.0rc1 rather than build from source
>
> [@gcanlin](https://github.com/gcanlin) could...

Please check out commit `9464e14` and use vllm-ascend v0.11.0rc2 with vllm v0.11.0. We're still upgrading vllm-ascend to v0.12.0rc1.
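For anyone unsure whether their environment matches the versions named above, a minimal sketch of a check follows. It assumes the PyPI distribution names are `vllm` and `vllm-ascend`; the helper names `installed_version` and `find_mismatches` are hypothetical and not part of either project.

```python
from importlib import metadata

# Versions named in the comment above; the distribution names are assumptions.
EXPECTED = {"vllm": "0.11.0", "vllm-ascend": "0.11.0rc2"}


def installed_version(dist_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(expected, lookup=installed_version):
    """Map each package whose installed version differs to (wanted, found)."""
    mismatches = {}
    for name, wanted in expected.items():
        found = lookup(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in find_mismatches(EXPECTED).items():
        print(f"{name}: expected {wanted}, found {found or 'not installed'}")
```

The comparison is an exact string match, so a build that reports a local suffix (for example a `+` tag after the version) would be flagged as a mismatch and may need a looser check.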

> I'm use vllm-ascend v0.11.0rc1 and this error as follows:
>
> INFO 12-23 09:07:54 [importing.py:63] Triton not installed or not compatible; certain GPU-related functions will not be available. WARNING...

Request tasks:

- Qwen2.5-Omni offline test is in progress and almost done in #168.
- API
  - OpenAI API for image generation

cc @tzhouam @R2-Y @hsliuustc0106 PTAL and add a ready tag to test all models. Thanks!

> could you please post the test result before and after this commit?

Of course. Updating now.

@hsliuustc0106 CI breaks because of `Gateway Timeout`. How can I fix it? Or could you please help retry it?

```
[2025-12-24T14:40:52Z] Installing collected packages: pip
[2025-12-24T14:40:53Z] Successfully installed pip-25.3...
```