text-generation-inference
ExllamaV2 continuous batching supported?
Feature request
This is just a question (if the answer is no, then consider it a request): does ExLlamaV2 support continuous batching? That's the only thing I find missing in all ExLlama backends.
Motivation
Better serving throughput and scalability.
Your contribution
I tried to work on continuous batching in the exl2 repo, but I couldn't get it to work.
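For anyone unfamiliar with the term, here is a minimal sketch of what continuous batching means at the scheduler level. All names here (`ContinuousBatcher`, `model.step`, `is_finished`, `complete`) are hypothetical; this is not ExLlamaV2's or TGI's actual API, just an illustration of the scheduling pattern being requested:

```python
from collections import deque

class ContinuousBatcher:
    """Toy scheduler illustrating continuous (in-flight) batching.

    Finished sequences leave the batch and queued ones join it after
    every decode step, instead of the whole batch running until its
    slowest member finishes (static batching). All interfaces here
    are hypothetical assumptions for illustration.
    """

    def __init__(self, model, max_batch_size=8):
        self.model = model            # assumed: model.step(seqs) -> one new token per sequence
        self.max_batch_size = max_batch_size
        self.waiting = deque()        # requests not yet admitted
        self.active = []              # sequences currently decoding

    def submit(self, seq):
        self.waiting.append(seq)

    def step(self):
        # Admit waiting requests into any free batch slots.
        while self.waiting and len(self.active) < self.max_batch_size:
            self.active.append(self.waiting.popleft())
        if not self.active:
            return

        # One decode step across all active sequences at once.
        new_tokens = self.model.step(self.active)

        still_active = []
        for seq, tok in zip(self.active, new_tokens):
            seq.tokens.append(tok)
            if seq.is_finished():     # hypothetical: EOS hit or max length reached
                seq.complete()        # hand the result back to the caller
            else:
                still_active.append(seq)
        self.active = still_active    # freed slots are refilled on the next step
```

The key difference from static batching is that admission happens on every step, so a long request never blocks the slots freed by short ones; that is what makes this attractive for production serving.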
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.
Also interested in the above, @SinanAkkoyun. If we could get this, it would be great for scalable production inference.
I think vLLM is trying to implement that.