Dongjie Shi
Hi @nazneenn, we are developing a PoC of FastAPI serving using multi-GPU and will keep you updated.
Hi @nazneenn @digitalscream, FastAPI serving using multi-GPU is now supported in ipex-llm. Please refer to this example: https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/GPU/Deepspeed-AutoTP-FastAPI
It's a known issue. Users have successfully run IPEX-LLM vLLM in Docker.

We will verify the models in the list and plan the stable version release.
Please try the latest Docker image: intelanalytics/ipex-llm-serving-xpu:2.2.0-SNAPSHOT
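As a minimal sketch of trying that image, something like the following should work; note that the run flags (`--net=host`, `--device=/dev/dri`, the container name) are assumptions for a typical Intel GPU setup, not taken from this thread:

```shell
# Image tag as given in the comment above.
IMAGE=intelanalytics/ipex-llm-serving-xpu:2.2.0-SNAPSHOT

# Only attempt the pull/run when Docker is actually installed.
if command -v docker >/dev/null 2>&1; then
  docker pull "$IMAGE"
  # --device=/dev/dri exposes the Intel GPU to the container (assumed flags).
  docker run -itd --net=host --device=/dev/dri --name=ipex-llm-serving "$IMAGE"
fi
```

Check the repository's serving docs for the exact launch command and environment variables expected inside the container.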
Closed since there has been no update for a long time.