
Added vLLM CPU support for the ChatQnA Application

Open tianyil1 opened this issue 1 year ago • 3 comments

Type of Change

Added the vLLM CPU service as an alternative model-serving method in the ChatQnA application.

Description

This PR adds vLLM serving on CPU as the LLM backend service in the ChatQnA application. Since vLLM serving is supported only on CPU for now, we add the vLLM CPU serving solution first; vLLM serving on Gaudi will be added once it is ready.
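For illustration, here is a minimal Python sketch of the generation path the vLLM CPU backend provides, using vLLM's offline inference API. This is a sketch, not code from this PR: it assumes a CPU build of vLLM is installed, and the model name is hypothetical.

```python
# Minimal sketch of vLLM inference on a CPU build (not code from this PR).
from vllm import LLM, SamplingParams

# Assumption: the model name below is hypothetical -- substitute the
# model that the ChatQnA application actually serves.
llm = LLM(model="Intel/neural-chat-7b-v3-3")
params = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(["What is retrieval-augmented generation?"], params)
for out in outputs:
    print(out.outputs[0].text)
```

In the ChatQnA deployment the backend runs as a long-lived serving container rather than being called in-process, but the underlying generation path is the same.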

How has this PR been tested?

This PR was tested on a Gaudi2 server with:

  • 2-socket Intel(R) Xeon(R) Platinum 8368 CPU @ 2.40GHz
  • 8 Gaudi nodes, HL-SMI Version: hl-1.14.0-fw-48.0.1.0 Driver Version: 1.14.0-9e8ecf8

The following were verified in this environment:

  • vLLM engine backend image builds and runs
  • RAG-enabled image builds and runs
  • user queries post successfully via the vLLM and RAG images (see the sketch below)
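As a rough illustration of the last check, a user query can be posted to the serving endpoint as sketched below; the host, port, and model name are assumptions rather than values taken from this PR.

```python
# Minimal sketch of posting a user query to the vLLM serving endpoint.
# Assumptions: the OpenAI-compatible server is reachable on localhost:8000
# and serves the (hypothetical) model named below.
import requests

resp = requests.post(
    "http://localhost:8000/v1/completions",
    json={
        "model": "Intel/neural-chat-7b-v3-3",  # hypothetical model name
        "prompt": "What is OPEA?",
        "max_tokens": 64,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["text"])
```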

Dependency Change?

The vLLM and openai libraries will be introduced into the ChatQnA application. The vLLM CPU support issues have been resolved and merged upstream into vLLM through this PR.
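Since the openai library becomes a dependency, below is a minimal sketch of how it might talk to vLLM's OpenAI-compatible endpoint; the base_url, port, and model name are assumptions, and vLLM does not check the API key, so any placeholder works.

```python
# Minimal sketch of the openai client (>=1.0) against a vLLM endpoint.
from openai import OpenAI

# Assumptions: the endpoint URL and model name are hypothetical; the
# API key is a placeholder since vLLM ignores it.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
completion = client.completions.create(
    model="Intel/neural-chat-7b-v3-3",
    prompt="Summarize what ChatQnA does.",
    max_tokens=64,
)
print(completion.choices[0].text)
```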

tianyil1 avatar Apr 16 '24 06:04 tianyil1

This PR is ready for merging. Please help review this PR. @Jian-Zhang @xuechendi

tianyil1 avatar Apr 23 '24 01:04 tianyil1

Please resolve the conflict.

chensuyue avatar May 06 '24 02:05 chensuyue

I have resolved the conflict; this PR is ready for merging. Please help review and merge it. Thanks. @chensuyue

tianyil1 avatar May 06 '24 02:05 tianyil1

Please help continue rebasing this PR. @XinyaoWa

tianyil1 avatar Jun 13 '24 01:06 tianyil1

There are too many conflicts; I will close this PR and send out another PR to merge vLLM later.

XinyaoWa avatar Jun 13 '24 02:06 XinyaoWa