Dongjie Shi
We refactored the vLLM code today; the new code and image are still going through the CI/CD pipeline, but the docs have been updated first. Please try `python -m ipex_llm.vllm.entrypoints.openai.api_server` with today's ipex-llm...
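For reference, a possible launch invocation is sketched below. Note that `--model` and `--port` are standard upstream vLLM OpenAI-server flags, while `--device` and `--load-in-low-bit` are ipex-llm-specific options as we understand them; since today's refactor is still going through CI/CD, please double-check the exact arguments against the updated doc.

```bash
# Sketch of a launch command; --model/--port follow upstream vLLM, while
# --device/--load-in-low-bit are assumed ipex-llm-specific flags.
python -m ipex_llm.vllm.entrypoints.openai.api_server \
  --model /path/to/your/model \
  --port 8000 \
  --device xpu \
  --load-in-low-bit sym_int4
```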
Could you please share the client script/code you use to send requests to the vLLM API server, so that we can reproduce the same inference/text-generation TPS you are seeing?
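In case it helps, here is a minimal sketch of the kind of client we mean, assuming the server is reachable at http://localhost:8000 with the standard OpenAI-compatible /v1/completions route; the model name is a placeholder for whatever name your server registered.

```python
# Minimal end-to-end TPS measurement against an OpenAI-compatible
# /v1/completions endpoint. Host/port and model name are placeholders.
import time
import requests

URL = "http://localhost:8000/v1/completions"
payload = {
    "model": "YOUR_SERVED_MODEL_NAME",  # placeholder: use your served model name
    "prompt": "Explain the difference between prefill and decode in one paragraph.",
    "max_tokens": 256,
    "temperature": 0,
}

start = time.time()
resp = requests.post(URL, json=payload, timeout=600)
elapsed = time.time() - start
resp.raise_for_status()

# vLLM's OpenAI-compatible responses report token counts under "usage".
completion_tokens = resp.json()["usage"]["completion_tokens"]
print(f"{completion_tokens} tokens in {elapsed:.2f}s "
      f"-> {completion_tokens / elapsed:.2f} tokens/s (end-to-end)")
```

Note that this measures end-to-end throughput including prefill; to isolate pure generation TPS you would stream the response and start timing from the first returned token.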
> Is that testing the inference performance or the text generation performance? I can't seem to get the vllm_online_benchmark working; it just keeps throwing 404 errors on the server as far...
Hi @HumerousGorgon, what is the difference between the inference and the text generation performance you mentioned here? Can you observe data like the output below? Or are there any other metrics that help you differentiate between...
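If the 404s persist, one quick sanity check (a sketch, assuming the default host/port) is to query the server's model list and confirm that the route and model name your client sends match what the server actually registered, since a mismatch in either is a common cause of 404 responses:

```python
# Sanity check for 404s: confirm the server is reachable and list the
# model names it serves. Host/port are assumed defaults; adjust as needed.
import requests

resp = requests.get("http://localhost:8000/v1/models", timeout=30)
print(resp.status_code)
print(resp.json())  # the "id" fields are the model names clients must send
```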
Please also make sure "Resizable BAR support" and "Above 4G MMIO" are enabled in the BIOS.
> Is there a schedule for when we could get a vLLM-supported version to run MiniCPM-V?

Hi, the upgrade of IPEX-LLM vLLM to 0.5.x is in progress; we will...
The upgrade of IPEX-LLM vLLM to 0.5.4 is finished, and MiniCPM-V-2.6 is now supported; please refer to https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/GPU/vLLM-Serving#image-input
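For reference, below is a sketch of an image request against the OpenAI-compatible chat endpoint; the host/port, served model name, and image path are all placeholders, and the linked README remains the authoritative example.

```python
# Sketch: send a base64-encoded image to /v1/chat/completions.
# Host/port, model name, and image path are placeholders.
import base64
import requests

with open("example.jpg", "rb") as f:  # placeholder image file
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

payload = {
    "model": "MiniCPM-V-2_6",  # placeholder: use your served model name
    "messages": [{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is in this image?"},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        ],
    }],
    "max_tokens": 128,
}

resp = requests.post("http://localhost:8000/v1/chat/completions",
                     json=payload, timeout=600)
print(resp.json()["choices"][0]["message"]["content"])
```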
Sorry, we don't have any plans to support ComfyUI and Flux yet. BTW, phi-3-vision is supported; please refer to https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/GPU/HuggingFace/Multimodal/phi-3-vision
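As a rough sketch of the loading pattern that example follows (the model ID and keyword arguments reflect the general ipex-llm low-bit GPU workflow; the linked example is authoritative for the full image-processing and generation code):

```python
# Sketch of ipex-llm low-bit loading for phi-3-vision on an Intel GPU.
# The model ID is assumed; see the linked example for the full pipeline.
from transformers import AutoProcessor
from ipex_llm.transformers import AutoModelForCausalLM

model_id = "microsoft/Phi-3-vision-128k-instruct"  # assumed HF model ID
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    load_in_4bit=True,       # ipex-llm low-bit weight quantization
    trust_remote_code=True,
)
model = model.half().to("xpu")  # run on the Intel GPU

processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
# Build inputs with the processor (text + image) and call model.generate(...)
# as shown in the linked example.
```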