text-generation-inference
ExllamaV2 continuous batching supported?
Feature request
This is just a question (if the answer is no, then consider it a request): does ExLlamaV2 support continuous batching? That's the only thing I find missing in all ExLlama backends.
Motivation
Better serving throughput and scalability.
Your contribution
I tried to work on continuous batching in the exl2 repo, but I couldn't get it to work.
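For anyone unfamiliar with the term, here is a minimal sketch of what continuous batching means at the scheduler level. All names here (`ContinuousBatcher`, `model.step`, `is_finished`, `complete`) are hypothetical; this is not ExLlamaV2's or TGI's actual API, just an illustration of the scheduling pattern being requested:

```python
from collections import deque

class ContinuousBatcher:
    """Toy scheduler illustrating continuous (in-flight) batching.

    Finished sequences leave the batch and queued ones join it after
    every decode step, instead of the whole batch running until its
    slowest member finishes (static batching). All interfaces here
    are hypothetical assumptions for illustration.
    """

    def __init__(self, model, max_batch_size=8):
        self.model = model            # assumed: model.step(seqs) -> one new token per sequence
        self.max_batch_size = max_batch_size
        self.waiting = deque()        # requests not yet admitted
        self.active = []              # sequences currently decoding

    def submit(self, seq):
        self.waiting.append(seq)

    def step(self):
        # Admit waiting requests into any free batch slots.
        while self.waiting and len(self.active) < self.max_batch_size:
            self.active.append(self.waiting.popleft())
        if not self.active:
            return

        # One decode step across all active sequences at once.
        new_tokens = self.model.step(self.active)

        still_active = []
        for seq, tok in zip(self.active, new_tokens):
            seq.tokens.append(tok)
            if seq.is_finished():     # hypothetical: EOS hit or max length reached
                seq.complete()        # hand the result back to the caller
            else:
                still_active.append(seq)
        self.active = still_active    # freed slots are refilled on the next step
```

The key difference from static batching is that admission happens on every step, so a long request never blocks the slots freed by short ones; that is what makes this attractive for production serving.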
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.
Also interested in the above, @SinanAkkoyun. If we could get this, it would be great for scalable production inference.
I think vLLM is trying to implement that.