MTL platform with ARC 770 cannot allocate a memory block larger than 4GB when running vLLM Qwen2-VL-2B
When I run a vLLM model such as Qwen2-VL-2B with an ARC 770 on an MTL platform, it reports the following error: RuntimeError: Current platform can NOT allocate memory block with size larger than 4GB! Tried to allocate 6.10 GiB (GPU 0; 15.11 GiB total capacity; 4.84 GiB already allocated; 5.41 GiB reserved in total by PyTorch)
vLLM 0.5.4 does not support the Qwen2-VL model yet. We plan to support it in the upcoming 0.6.1 release.
Thank you! But I need to double-confirm: I use IPEX (not OpenVINO) to run Qwen2-VL-2B, and vLLM 0.5.4 does not support it, right?
Yes, even the official vLLM 0.5.4 does not support it; support arrives in 0.6.1.
Thanks again. One more question: are there any models I can use with vLLM 0.5.4? Could you suggest one or two that I can try? Thanks.
We recommend the Llama, Qwen, and ChatGLM model families, for example: Llama-2-7b-chat-hf, Qwen1.5-7B-Chat, and chatglm3-6b.
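For reference, a minimal offline-inference sketch for one of the suggested models, assuming the standard vLLM Python API (`LLM` / `SamplingParams`); the model name and the `xpu` device flag are assumptions for this Intel GPU setup, so adjust them to match your local checkpoint and your IPEX-LLM vLLM build:

```python
# Sketch only: requires a working vLLM install on the target GPU
# and a locally available model checkpoint.
from vllm import LLM, SamplingParams

prompts = ["What is the capital of France?"]
sampling_params = SamplingParams(temperature=0.8, max_tokens=64)

# Model name is an assumption for illustration; Qwen1.5-7B-Chat or
# chatglm3-6b should work the same way. The device flag may differ
# between the upstream vLLM and the IPEX-LLM build.
llm = LLM(model="meta-llama/Llama-2-7b-chat-hf", device="xpu")

outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    print(output.outputs[0].text)
```

If you prefer serving over the OpenAI-compatible HTTP endpoint instead of offline inference, the same model names apply; the key point is to stay within the model families listed above until 0.6.1 adds Qwen2-VL support.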
Hi, could you share the release schedule for 0.6.1 if you know it? Thanks.
We are validating version 0.6.2 along with the Qwen2-VL model, and will notify you once it's ready. Thanks.