Dongjie Shi

57 comments by Dongjie Shi

Hi @nazneenn, we are developing a PoC of multi-GPU FastAPI serving and will keep you updated.

Hi @nazneenn @digitalscream, multi-GPU FastAPI serving is now supported in ipex-llm. Please refer to this example: https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/GPU/Deepspeed-AutoTP-FastAPI
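Once a server from that example is running, clients talk to it over plain HTTP. The snippet below is a minimal client sketch using only the Python standard library. It assumes a server at `http://localhost:8000` exposing a POST route `/generate/` that accepts JSON with `prompt` and `n_predict` fields. The host, port, route, and field names are all hypothetical here, so check the example's README and server script for the actual values.

```python
import json
import urllib.request

# Hypothetical defaults: adjust to the host, port, and route your server actually uses.
SERVER_URL = "http://localhost:8000"
ENDPOINT = "/generate/"


def build_generate_request(prompt: str, n_predict: int = 32,
                           base_url: str = SERVER_URL,
                           endpoint: str = ENDPOINT) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for the serving endpoint."""
    # The payload field names are assumptions, not a confirmed ipex-llm API.
    payload = {"prompt": prompt, "n_predict": n_predict}
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 60.0) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `send(build_generate_request("What is AI?"))` would post one prompt and return the decoded JSON reply. Keeping request construction separate from sending makes it easy to inspect or retarget the request before any network call is made.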

It's a known issue. A user has successfully run IPEX-LLM vLLM in Docker.

![908376fea6c33c461b83f15181e18b5](https://github.com/user-attachments/assets/a45bcde2-0126-4347-93e8-88b3aa068e6a)

We will verify the models in the list and plan a stable version release.

Please try the latest Docker image: `intelanalytics/ipex-llm-serving-xpu:2.2.0-SNAPSHOT`

Closing since there has been no update for a long time.