Jun Wang
We have updated our openwebui+vllm-serving workflow [here](https://github.com/intel-analytics/ipex-llm/pull/12246/commits/5a931d141d4ceb1892f790236aeb0fc0810da596). The Docker start script is [here](https://github.com/intel-analytics/ipex-llm/blob/main/docs/mddocs/DockerGuides/vllm_docker_quickstart.md#start-docker-container); please update the docker image before starting it. Frontend and backend startup scripts are as follows, note change...
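As a reference, a minimal sketch of refreshing the image before re-running the start script; the image tag and container name below are placeholders based on the quickstart, not necessarily the ones in the updated scripts:

```bash
# Pull the refreshed serving image before re-running the start script.
# The tag below is an assumption based on the docker quickstart; use the tag
# named in the updated guide if it differs.
export DOCKER_IMAGE=intelanalytics/ipex-llm-serving-xpu:latest
docker pull $DOCKER_IMAGE

# Remove the old container so the start script recreates it from the new image.
docker rm -f my-vllm-container   # placeholder container name
```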
Could you offer an example of how to use this parameter? I have tried it with
```bash
python3 -m llama_cpp.server --model /home/LLM/DeepSeek-Coder-V2-Lite-Instruct-Q4_K_M.gguf --port 8002 --verbose True --n_gpu_layers 99 --tensor_buft_overrides exp=CPU
# and
python3...
```
> The customer verified this on a single Xeon-W machine with 4x Arc770 cards, ipex-llm version 2.1.0b2
>
> Issues:
>
> 1. When running through the benchmark the behavior is already close to normal, but when calling the service directly there is a certain probability of getting no output; in particular, once a question mark ? is added, no output becomes very likely.
> Example call:
> time curl http://localhost:8000/v1/completions -H "Content-Type: application/json" -d '{"model": "Llama-2-13b-chat-hf", "prompt":...
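For reference, a hedged sketch of a complete direct call to the OpenAI-compatible `/v1/completions` endpoint; the prompt, `max_tokens`, and `temperature` values are illustrative, since the original request body is truncated above:

```bash
# Direct call matching the failure pattern: the prompt ends with a question mark.
# Prompt and sampling values are illustrative, not the customer's originals.
time curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "Llama-2-13b-chat-hf",
        "prompt": "What is deep learning?",
        "max_tokens": 128,
        "temperature": 0.0
      }'
```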
The native RPC of llama.cpp is supported, but not yet released. We are developing a new RPC node for hybrid devices to enable one node with two backend types.
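For context, a rough sketch of how the upstream llama.cpp RPC backend is typically driven, assuming the `rpc-server` binary and the `--rpc` flag from upstream llama.cpp; the unreleased hybrid-device RPC node mentioned above may work differently:

```bash
# On the remote node: expose a backend over RPC (port is an arbitrary example).
./rpc-server -p 50052

# On the main node: offload layers to the remote backend via --rpc.
./llama-cli -m /home/LLM/DeepSeek-Coder-V2-Lite-Instruct-Q4_K_M.gguf \
  --rpc 192.168.1.10:50052 -ngl 99 \
  -p "Write a quicksort function in C++."
```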
Cannot reproduce

**Steps**:
1. start docker:
```bash
#!/bin/bash
export DOCKER_IMAGE=intelanalytics/ipex-llm-serving-xpu-vllm-0.5.4-experimental:2.2.0b1
export CONTAINER_NAME=junwang-vllm54-issue220
docker rm -f $CONTAINER_NAME
sudo docker run -itd \
  --net=host \
  --device=/dev/dri \
  --name=$CONTAINER_NAME \
  -v /home/intel/LLM:/llm/models/ \
  ...
```
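For completeness, a sketch of how such a start script usually continues, following the flag pattern in the vLLM docker quickstart; the shm/memory sizes and proxy setting below are assumptions, not the exact values from this run:

```bash
# Continuation of the truncated script above; remaining flags follow the common
# quickstart pattern. Sizes and env values are assumptions, not the actual ones.
sudo docker run -itd \
  --net=host \
  --device=/dev/dri \
  --name=$CONTAINER_NAME \
  -v /home/intel/LLM:/llm/models/ \
  -e no_proxy=localhost,127.0.0.1 \
  --shm-size="16g" \
  --memory="32G" \
  $DOCKER_IMAGE
```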
**Have successfully verified on vllm 0.5.4 (docker image: `intelanalytics/ipex-llm-serving-xpu:latest`):**

### Test step

Run `python vllm-out-verify.py /llm/models/deepseek-coder-7b-instruct-v1.5/ 1`; the vllm-out-verify.py is below:
```python
from vllm import SamplingParams
from ipex_llm.vllm.xpu.engine import IPEXLLMClass as...
```
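For completeness, a self-contained sketch of what such a verification script typically looks like with `IPEXLLMClass`; the constructor arguments (`device`, `dtype`, `enforce_eager`, `load_in_low_bit`) are assumptions based on the ipex-llm vLLM XPU examples, not necessarily the exact flags used here:

```python
import sys

from vllm import SamplingParams
from ipex_llm.vllm.xpu.engine import IPEXLLMClass as LLM

# Usage: python vllm-out-verify.py <model_path> <tensor_parallel_size>
model_path = sys.argv[1]
tp_size = int(sys.argv[2])

prompts = [
    "Hello, my name is",
    "The capital of France is",
]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

# Constructor arguments below are assumptions based on the ipex-llm vLLM XPU examples.
llm = LLM(
    model=model_path,
    device="xpu",
    dtype="float16",
    enforce_eager=True,
    load_in_low_bit="fp8",
    tensor_parallel_size=tp_size,
    trust_remote_code=True,
)

# Generate and print one completion per prompt to confirm output is produced.
outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    print(f"Prompt: {output.prompt!r}")
    print(f"Generated: {output.outputs[0].text!r}")
```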