Dongjie Shi
Hi @nazneenn, we are developing a PoC of FastAPI serving using multi-GPU and will keep you updated.
Hi @nazneenn @digitalscream, FastAPI serving using multi-GPU is now supported in ipex-llm. Please refer to this example: https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/GPU/Deepspeed-AutoTP-FastAPI
It's a known issue. Users have successfully run IPEX-LLM vLLM in Docker.

We will verify the models in the list and plan the stable version release.
Please try the latest Docker image: intelanalytics/ipex-llm-serving-xpu:2.2.0-SNAPSHOT
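As a minimal sketch of trying that image, something like the following should work; note that the run flags (`--net=host`, `--device=/dev/dri`, the container name) are assumptions for a typical Intel GPU setup, not taken from this thread:

```shell
# Image tag as given in the comment above.
IMAGE=intelanalytics/ipex-llm-serving-xpu:2.2.0-SNAPSHOT

# Only attempt the pull/run when Docker is actually installed.
if command -v docker >/dev/null 2>&1; then
  docker pull "$IMAGE"
  # --device=/dev/dri exposes the Intel GPU to the container (assumed flags).
  docker run -itd --net=host --device=/dev/dri --name=ipex-llm-serving "$IMAGE"
fi
```

Check the repository's serving docs for the exact launch command and environment variables expected inside the container.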
Closed since there has been no update for a long time.