Wang, Jian4

44 comments of Wang, Jian4

This issue is caused by a prefix-caching code error, and it's fixed by this [pr](https://github.com/analytics-zoo/vLLM-ARC-X/pull/17/files).

You can use the image `intelanalytics/ipex-llm-serving-xpu:0.8.3-b22` to test again. This problem does not occur on b22 because of the SDPA method update for Qwen2.5-VL.
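
For reference, a minimal sketch of pulling and starting that image; the device mapping, model mount path, and container name below are illustrative assumptions, not part of the original instructions:

```bash
# Pull the b22 image and start a container on an XPU host
docker pull intelanalytics/ipex-llm-serving-xpu:0.8.3-b22
docker run -itd \
  --net=host \
  --device=/dev/dri \
  -v /path/to/models:/llm/models \
  --name ipex-llm-serving-xpu-b22 \
  intelanalytics/ipex-llm-serving-xpu:0.8.3-b22
```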

There are indeed some problems with `asym_int4` when running Llama-70B; why not use `sym_int4` or `woq_int4` instead?
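
If you want to try a different precision, here is a minimal sketch assuming the ipex-llm vLLM entrypoint and its `--load-in-low-bit` option; the model path, served name, and port are placeholders:

```bash
# Serve Llama-70B with symmetric INT4 instead of asym_int4
python -m ipex_llm.vllm.xpu.entrypoints.openai.api_server \
  --model /llm/models/Llama-70B \
  --served-model-name Llama-70B \
  --load-in-low-bit sym_int4 \
  --port 8000
```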

@buffliu You can add `--allowed-local-media-path /llm/models/media` when starting the vLLM service, and then you can send a video like:

```bash
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{ "model": "Qwen2.5-VL-7B-Instruct",...
```
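
For completeness, here is a hedged sketch of what a full request body might look like; the `video_url` content part, the `file://` URL form, the file name under `/llm/models/media`, and the prompt text are assumptions, not copied from the original comment:

```bash
# Send a local video (allowed by --allowed-local-media-path) to the chat endpoint
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen2.5-VL-7B-Instruct",
    "messages": [{
      "role": "user",
      "content": [
        {"type": "video_url",
         "video_url": {"url": "file:///llm/models/media/sample.mp4"}},
        {"type": "text", "text": "Describe this video."}
      ]
    }],
    "max_tokens": 128
  }'
```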

Will be fixed by this [pr](https://github.com/intel/ipex-llm/pull/13178)

vLLM 0.5.4 does not support the Qwen2-VL model yet. We will support it in the upcoming 0.6.1 version.

Yes, even the official version of vLLM does not support it in 0.5.4; support only arrives in 0.6.1.

It is recommended to run Llama, Qwen, and ChatGLM models, for example `Llama-2-7b-chat-hf`, `Qwen1.5-7B-Chat`, and `chatglm3-6b`.

The b21 image actually has a bug in chunked prefill, which will be fixed in the next version. But it seems that multimodal models cannot use chunked prefill on the v0 engine...
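
Until that fix lands, one possible workaround is simply not enabling chunked prefill when serving a multimodal model on the v0 engine; the entrypoint, model path, and port in this sketch are assumptions:

```bash
# Start the service without --enable-chunked-prefill so the v0 engine
# uses the regular (non-chunked) prefill path
python -m ipex_llm.vllm.xpu.entrypoints.openai.api_server \
  --model /llm/models/Qwen2.5-VL-7B-Instruct \
  --served-model-name Qwen2.5-VL-7B-Instruct \
  --port 8000
```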

I couldn't reproduce this error. Did you encounter this issue when starting vLLM, or while running the benchmark?