text-generation-inference

ExllamaV2 continuous batching supported?

Open SinanAkkoyun opened this issue 11 months ago • 1 comments

Feature request

This is just a question (if the answer is no, then it is a feature request): does the ExLlamaV2 support have continuous batching? That's the only thing I find missing in all ExLlama backends.

Motivation

Better servability (higher throughput when serving many concurrent requests).

Your contribution

I tried to implement continuous batching in the exl2 repo, but I couldn't get it to work.
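For readers unfamiliar with the term: continuous batching (also called in-flight or iteration-level batching) means that after every decode step, finished sequences leave the batch and queued requests fill the freed slots immediately, instead of waiting for the whole batch to drain as in static batching. The toy scheduler below is a minimal sketch of that idea in plain Python; the `Request` class and step counts are illustrative assumptions, not ExLlamaV2 or TGI code.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Request:
    # Hypothetical request: generate `target_len` tokens for sequence `rid`.
    rid: int
    target_len: int
    generated: int = 0


def continuous_batching(requests, max_batch_size):
    """Toy continuous-batching scheduler: admits queued requests into
    free batch slots after every decode step and evicts finished ones,
    so short sequences never hold up the batch."""
    queue = deque(requests)
    active = []
    completed = []
    steps = 0
    while queue or active:
        # Admit queued requests into any free batch slots.
        while queue and len(active) < max_batch_size:
            active.append(queue.popleft())
        # One decode step: every active sequence emits one token.
        for req in active:
            req.generated += 1
        steps += 1
        # Evict finished sequences; their slots free up immediately.
        for req in [r for r in active if r.generated >= r.target_len]:
            active.remove(req)
            completed.append(req.rid)
    return completed, steps
```

With requests of lengths 2, 5, and 3 and a batch size of 2, this finishes in 5 decode steps, because the third request slots in as soon as the first completes; a static batcher would run the first pair for max(2, 5) = 5 steps and then the third request for 3 more, 8 in total.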

SinanAkkoyun avatar Mar 08 '24 23:03 SinanAkkoyun

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions[bot] avatar Apr 08 '24 01:04 github-actions[bot]

Also interested in the above, @SinanAkkoyun. If we could get this, it would be great for scalable production inference.

clintonruairi avatar May 10 '24 15:05 clintonruairi

I think vLLM is trying to implement that.

SinanAkkoyun avatar May 10 '24 16:05 SinanAkkoyun