GenAIExamples
Added vLLM CPU support for the ChatQnA Application
Type of Change
Added the vLLM CPU service as an alternative model-serving method in the ChatQnA application.
Description
This PR adds vLLM serving on CPU as the LLM backend service in the ChatQnA application. Since vLLM currently only supports CPU, we add the vLLM CPU serving solution first. vLLM serving on Gaudi will be added once it is ready.
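As a rough illustration of how the ChatQnA pipeline could talk to the new backend, the sketch below builds an OpenAI-style completion request for a vLLM server. The endpoint URL, model name, and parameter values are illustrative assumptions, not values taken from this PR.

```python
import json

# Assumed local vLLM OpenAI-compatible endpoint; the real deployment
# address and port come from the ChatQnA configuration, not this sketch.
VLLM_ENDPOINT = "http://localhost:8000/v1/completions"

def build_vllm_request(prompt: str,
                       model: str = "Intel/neural-chat-7b-v3-3",
                       max_tokens: int = 128) -> str:
    """Serialize an OpenAI-style completion request body for vLLM.

    The model name above is a placeholder assumption; any model served
    by the vLLM CPU backend could be substituted.
    """
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": 0.0,
    }
    return json.dumps(payload)

body = build_vllm_request("What is RAG?")
print(body)
```

In the actual application this body would be POSTed (e.g. with `requests.post`) to the vLLM server's `/v1/completions` route, which is the OpenAI-compatible API that vLLM exposes.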
How has this PR been tested?
This PR has been tested on a Gaudi2 server with:
- 2 sockets of Intel(R) Xeon(R) Platinum 8368 CPU @ 2.40GHz
- 8 Gaudi nodes, HL-SMI Version: hl-1.14.0-fw-48.0.1.0 Driver Version: 1.14.0-9e8ecf8
The following were verified in this environment:
- vLLM engine backend
- RAG support
- user queries posted successfully via vLLM with RAG support
Dependency Change?
The vLLM and openai libraries will be introduced into the ChatQnA application. The vLLM CPU support issues have been resolved and merged into vLLM through this PR.
This PR is ready for merging. Please help review this PR. @Jian-Zhang @xuechendi
Please resolve the conflict.
The conflict has been resolved. This PR is ready for merging. Please help review and merge it. Thanks. @chensuyue
Please help continue rebasing this PR. @XinyaoWa
There are too many conflicts; closing this PR, and will send out another PR for merging vLLM later.