MTL platform with ARC 770 cannot allocate a memory block larger than 4GB when running vLLM Qwen2-VL-2B
When I run a vLLM model such as Qwen2-VL-2B with an ARC 770 on an MTL platform, it reports the following error: RuntimeError: Current platform can NOT allocate memory block with size larger than 4GB! Tried to allocate 6.10 GiB (GPU 0; 15.11 GiB total capacity; 4.84 GiB already allocated; 5.41 GiB reserved in total by PyTorch)
vLLM 0.5.4 does not support the Qwen2-VL model yet. We plan to support it in the upcoming 0.6.1 release.
Thank you! But I need to double-confirm: I use IPEX (not OpenVINO) to run Qwen2-VL-2B, and vLLM 0.5.4 does not support it, right?
Yes, even the official vLLM 0.5.4 does not support it; support arrives in 0.6.1.
Thanks again. One more question: are there any models I can use with vLLM 0.5.4? Could you suggest one or two that I can try? Thanks.
We recommend the Llama, Qwen, and ChatGLM model families, for example: Llama-2-7b-chat-hf, Qwen1.5-7B-Chat, and chatglm3-6b.
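For reference, a minimal offline-inference sketch for one of the suggested models, assuming the standard vLLM Python API (`LLM` / `SamplingParams`); the model name and the `xpu` device flag are assumptions for this Intel GPU setup, so adjust them to match your local checkpoint and your IPEX-LLM vLLM build:

```python
# Sketch only: requires a working vLLM install on the target GPU
# and a locally available model checkpoint.
from vllm import LLM, SamplingParams

prompts = ["What is the capital of France?"]
sampling_params = SamplingParams(temperature=0.8, max_tokens=64)

# Model name is an assumption for illustration; Qwen1.5-7B-Chat or
# chatglm3-6b should work the same way. The device flag may differ
# between the upstream vLLM and the IPEX-LLM build.
llm = LLM(model="meta-llama/Llama-2-7b-chat-hf", device="xpu")

outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    print(output.outputs[0].text)
```

If you prefer serving over the OpenAI-compatible HTTP endpoint instead of offline inference, the same model names apply; the key point is to stay within the model families listed above until 0.6.1 adds Qwen2-VL support.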
Hi, could you share the release schedule for 0.6.1 if you know it? Thanks.
We are validating version 0.6.2 along with the Qwen2-VL model, and will notify you once it's ready. Thanks.