Jun Wang
We have updated our openwebui+vllm-serving workflow [here](https://github.com/intel-analytics/ipex-llm/pull/12246/commits/5a931d141d4ceb1892f790236aeb0fc0810da596). The Docker start script is [here](https://github.com/intel-analytics/ipex-llm/blob/main/docs/mddocs/DockerGuides/vllm_docker_quickstart.md#start-docker-container); please update the docker image before starting it. Frontend and backend startup scripts are as follows, note change...
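As a reference, a minimal sketch of refreshing the image before re-running the start script; the image tag and container name below are placeholders based on the quickstart, not necessarily the ones in the updated scripts:

```bash
# Pull the refreshed serving image before re-running the start script.
# The tag below is an assumption based on the docker quickstart; use the tag
# named in the updated guide if it differs.
export DOCKER_IMAGE=intelanalytics/ipex-llm-serving-xpu:latest
docker pull $DOCKER_IMAGE

# Remove the old container so the start script recreates it from the new image.
docker rm -f my-vllm-container   # placeholder container name
```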
Could you offer an example of how to use this parameter? I have tried it with
```bash
python3 -m llama_cpp.server --model /home/LLM/DeepSeek-Coder-V2-Lite-Instruct-Q4_K_M.gguf --port 8002 --verbose True --n_gpu_layers 99 --tensor_buft_overrides exp=CPU
# and
python3...
```
> The customer verified this on a single Xeon-W machine with 4x Arc770 cards, ipex-llm version 2.1.0b2
>
> Issues:
>
> 1. When running through the benchmark the behavior is already close to normal, but when calling the service directly there is a certain probability of getting no output; in particular, once a question mark ? is added, no output becomes very likely.
> Example call:
> time curl http://localhost:8000/v1/completions -H "Content-Type: application/json" -d '{"model": "Llama-2-13b-chat-hf", "prompt":...
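For reference, a hedged sketch of a complete direct call to the OpenAI-compatible `/v1/completions` endpoint; the prompt, `max_tokens`, and `temperature` values are illustrative, since the original request body is truncated above:

```bash
# Direct call matching the failure pattern: the prompt ends with a question mark.
# Prompt and sampling values are illustrative, not the customer's originals.
time curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "Llama-2-13b-chat-hf",
        "prompt": "What is deep learning?",
        "max_tokens": 128,
        "temperature": 0.0
      }'
```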
The native RPC of llama.cpp is supported, but not yet released. We are developing a new RPC node for hybrid devices to enable one node with two backend types.
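For context, a rough sketch of how the upstream llama.cpp RPC backend is typically driven, assuming the `rpc-server` binary and the `--rpc` flag from upstream llama.cpp; the unreleased hybrid-device RPC node mentioned above may work differently:

```bash
# On the remote node: expose a backend over RPC (port is an arbitrary example).
./rpc-server -p 50052

# On the main node: offload layers to the remote backend via --rpc.
./llama-cli -m /home/LLM/DeepSeek-Coder-V2-Lite-Instruct-Q4_K_M.gguf \
  --rpc 192.168.1.10:50052 -ngl 99 \
  -p "Write a quicksort function in C++."
```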
Cannot reproduce

**Steps**:
1. start docker:
```bash
#!/bin/bash
export DOCKER_IMAGE=intelanalytics/ipex-llm-serving-xpu-vllm-0.5.4-experimental:2.2.0b1
export CONTAINER_NAME=junwang-vllm54-issue220
docker rm -f $CONTAINER_NAME
sudo docker run -itd \
  --net=host \
  --device=/dev/dri \
  --name=$CONTAINER_NAME \
  -v /home/intel/LLM:/llm/models/ \
  ...
```
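For completeness, a sketch of how such a start script usually continues, following the flag pattern in the vLLM docker quickstart; the shm/memory sizes and proxy setting below are assumptions, not the exact values from this run:

```bash
# Continuation of the truncated script above; remaining flags follow the common
# quickstart pattern. Sizes and env values are assumptions, not the actual ones.
sudo docker run -itd \
  --net=host \
  --device=/dev/dri \
  --name=$CONTAINER_NAME \
  -v /home/intel/LLM:/llm/models/ \
  -e no_proxy=localhost,127.0.0.1 \
  --shm-size="16g" \
  --memory="32G" \
  $DOCKER_IMAGE
```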
**Have successfully verified on vllm 0.5.4 (docker image: `intelanalytics/ipex-llm-serving-xpu:latest`):**

### Test step

Run `python vllm-out-verify.py /llm/models/deepseek-coder-7b-instruct-v1.5/ 1`; the vllm-out-verify.py is below:
```python
from vllm import SamplingParams
from ipex_llm.vllm.xpu.engine import IPEXLLMClass as...
```
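For completeness, a self-contained sketch of what such a verification script typically looks like with `IPEXLLMClass`; the constructor arguments (`device`, `dtype`, `enforce_eager`, `load_in_low_bit`) are assumptions based on the ipex-llm vLLM XPU examples, not necessarily the exact flags used here:

```python
import sys

from vllm import SamplingParams
from ipex_llm.vllm.xpu.engine import IPEXLLMClass as LLM

# Usage: python vllm-out-verify.py <model_path> <tensor_parallel_size>
model_path = sys.argv[1]
tp_size = int(sys.argv[2])

prompts = [
    "Hello, my name is",
    "The capital of France is",
]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

# Constructor arguments below are assumptions based on the ipex-llm vLLM XPU examples.
llm = LLM(
    model=model_path,
    device="xpu",
    dtype="float16",
    enforce_eager=True,
    load_in_low_bit="fp8",
    tensor_parallel_size=tp_size,
    trust_remote_code=True,
)

# Generate and print one completion per prompt to confirm output is produced.
outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    print(f"Prompt: {output.prompt!r}")
    print(f"Generated: {output.outputs[0].text!r}")
```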