text-embeddings-inference
Why are two batching_task required?
Feature request
In a concurrent scenario, I tried reducing to a single batching_task: the batch size of each embed call becomes larger, so inference performance is better. In the single-concurrency scenario, performance does not decrease.
Motivation
Improves inference performance in concurrent scenarios.
Your contribution
Only one batching_task is required.
We use two batching tasks to prefetch. This could be removed by allowing the backend to move the tensors to the device asynchronously, but this is a simple workaround.
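To illustrate the trade-off being discussed, here is a minimal conceptual sketch (not the actual text-embeddings-inference code) of why two batching tasks act as a prefetch: while one task waits on inference for its batch, the other can already assemble the next batch and copy it toward the device. With a single task, batches are larger but the device idles during batch assembly and the copy. All names (`batching_task`, `fake_h2d_copy`, `fake_inference`) and the timings are hypothetical stand-ins.

```python
import asyncio

async def fake_h2d_copy(batch):
    # Stand-in for moving an assembled batch's tensors to the device.
    await asyncio.sleep(0.02)
    return batch

async def fake_inference(batch):
    # Stand-in for running the embedding model on a device-resident batch.
    await asyncio.sleep(0.05)
    return [f"embedding_for_{item}" for item in batch]

async def batching_task(queue, results):
    # One batching task: pull requests, greedily assemble a batch,
    # copy it, run inference. With two such tasks on the same queue,
    # one can assemble/copy the next batch while the other is still
    # awaiting inference (the prefetch effect). With one task, each
    # batch is bigger but assembly/copy and inference never overlap.
    while True:
        first = await queue.get()
        if first is None:           # shutdown sentinel
            queue.put_nowait(None)  # let any other task see it too
            return
        batch = [first]
        while not queue.empty() and len(batch) < 32:
            item = queue.get_nowait()
            if item is None:
                queue.put_nowait(None)
                break
            batch.append(item)
        device_batch = await fake_h2d_copy(batch)
        results.extend(await fake_inference(device_batch))

async def main(num_tasks):
    queue, results = asyncio.Queue(), []
    workers = [asyncio.create_task(batching_task(queue, results))
               for _ in range(num_tasks)]
    for i in range(200):
        await queue.put(f"request-{i}")
        await asyncio.sleep(0.001)  # simulate concurrent clients trickling in
    await queue.put(None)
    await asyncio.gather(*workers)
    print(f"{num_tasks} batching task(s): {len(results)} embeddings")

if __name__ == "__main__":
    asyncio.run(main(2))  # compare against asyncio.run(main(1))
```

If the backend could move tensors to the device asynchronously (as suggested above), a single batching task could achieve the same overlap while keeping the larger batch sizes the issue asks for.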