Add Gaudi support for the TEI embedding service in ChatQnA to reduce latency with more than 16 concurrent user requests

Open louie-tsai opened this issue 7 months ago • 4 comments

Description

Benchmarked on Gaudi3 using GenAIEval with 128 output tokens. In the chart, the blue line is embedding on CPU and the orange line is embedding on Gaudi. Once the number of simultaneous user requests reaches 16, running embedding on Gaudi reduces overall ChatQnA latency.

[image: benchmark chart, embedding on CPU (blue) vs. embedding on Gaudi (orange)]

Therefore, this PR also enables Gaudi support for embedding in ChatQnA. By default, embedding still runs on CPU. Passing the additional compose.tei-embedding-gaudi.yaml file to docker compose up makes embedding run on Gaudi instead:

docker compose -f compose.yaml -f compose.tei-embedding-gaudi.yaml up -d
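For readers unfamiliar with Compose override files, here is a rough sketch of what a file like compose.tei-embedding-gaudi.yaml could contain. It is not the file from this PR. The service key, image tag, and environment values below are assumptions chosen for illustration and should be checked against the actual compose.yaml and the TEI Gaudi image documentation. What is certain is the merge behavior: when several -f files are given, later files override single-value keys such as image and merge into mapping keys such as environment.

```yaml
# Hypothetical override file, NOT the one from this PR.
# Apply with: docker compose -f compose.yaml -f compose.tei-embedding-gaudi.yaml up -d
services:
  # Assumed service key; it must match the embedding service key in compose.yaml,
  # otherwise Compose creates a second service instead of overriding the first.
  tei-embedding-service:
    # Assumed Gaudi build of TEI; verify the image name and tag before use.
    image: ghcr.io/huggingface/tei-gaudi:1.5.0
    # Habana container runtime, needed for the container to see Gaudi devices.
    runtime: habana
    cap_add:
      - SYS_NICE
    ipc: host
    environment:
      # Assumed values; on a shared host you would likely pin specific cards
      # rather than "all" so the embedding service does not contend with the LLM.
      HABANA_VISIBLE_DEVICES: all
      OMPI_MCA_btl_vader_single_copy_mechanism: none
      MAX_WARMUP_SEQUENCE_LENGTH: 512
```

Because only the overridden keys need to appear here, the base compose.yaml stays on the CPU image and the Gaudi path remains opt-in.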

This PR also renames the CPU embedding container from tei-embedding-gaudi-server to tei-embedding-server to avoid confusion.

Issues

n/a.

Type of change

  • [ ] Bug fix (non-breaking change which fixes an issue)
  • [x] New feature (non-breaking change which adds new functionality)
  • [ ] Breaking change (fix or feature that would break existing design and interface)
  • [ ] Others (enhancement, documentation, validation, etc.)

Dependencies

NA

Tests

Manually tested on a Gaudi machine.

louie-tsai avatar Apr 09 '25 01:04 louie-tsai

Dependency Review

✅ No vulnerabilities or license issues found.

Scanned Files

None

github-actions[bot] avatar Apr 09 '25 01:04 github-actions[bot]

Hi @louie-tsai Thanks for your contribution. OPEA v1.3 is now feature-frozen, so could we target this feature for v1.4 instead?

joshuayao avatar Apr 09 '25 06:04 joshuayao

Sure, thanks for the reminder.

louie-tsai avatar Apr 09 '25 14:04 louie-tsai

@louie-tsai

Please help to resolve the conflicts.

xiguiw avatar May 16 '25 01:05 xiguiw

Hi @louie-tsai, thank you for the PR. Everything is almost ready; please help resolve the conflict.

yinghu5 avatar Jun 12 '25 03:06 yinghu5

This PR is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.

CICD-at-OPEA avatar Jul 13 '25 22:07 CICD-at-OPEA

This PR was closed because it has been stalled for 7 days with no activity.

CICD-at-OPEA avatar Jul 20 '25 22:07 CICD-at-OPEA