GenAIExamples
Added vLLM CPU support for the ChatQnA Application
Type of Change
Added the vLLM CPU service as an alternative model-serving method in the ChatQnA application.
Description
This PR adds vLLM serving on CPU as the LLM backend service in the ChatQnA application. Since vLLM currently only supports CPU, we add the vLLM CPU serving solution first. vLLM serving on Gaudi will be added once it is ready.
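As a rough illustration of how the ChatQnA pipeline could talk to the new backend, the sketch below builds an OpenAI-style completion request for a vLLM server. The endpoint URL, model name, and parameter values are illustrative assumptions, not values taken from this PR.

```python
import json

# Assumed local vLLM OpenAI-compatible endpoint; the real deployment
# address and port come from the ChatQnA configuration, not this sketch.
VLLM_ENDPOINT = "http://localhost:8000/v1/completions"

def build_vllm_request(prompt: str,
                       model: str = "Intel/neural-chat-7b-v3-3",
                       max_tokens: int = 128) -> str:
    """Serialize an OpenAI-style completion request body for vLLM.

    The model name above is a placeholder assumption; any model served
    by the vLLM CPU backend could be substituted.
    """
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": 0.0,
    }
    return json.dumps(payload)

body = build_vllm_request("What is RAG?")
print(body)
```

In the actual application this body would be POSTed (e.g. with `requests.post`) to the vLLM server's `/v1/completions` route, which is the OpenAI-compatible API that vLLM exposes.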
How has this PR been tested?
This PR has been tested on a Gaudi2 server with:
- 2 sockets of Intel(R) Xeon(R) Platinum 8368 CPU @ 2.40GHz
- 8 Gaudi nodes, HL-SMI Version: hl-1.14.0-fw-48.0.1.0 Driver Version: 1.14.0-9e8ecf8
The following were verified in this environment:
- vLLM engine backend
- RAG support
- user queries posted successfully via vLLM with RAG support
Dependency Change?
The vLLM and openai libraries will be introduced into the ChatQnA application. The vLLM CPU support issues have been resolved and merged into vLLM through this PR.
This PR is ready for merging. Please help review this PR. @Jian-Zhang @xuechendi
Please resolve the conflict.
The conflict has been resolved. This PR is ready for merging. Please help review and merge it. Thanks. @chensuyue
Please help continue rebasing this PR. @XinyaoWa
There are too many conflicts; closing this PR, and will send out another PR for merging vLLM later.