OpenLLM
feat: Response streaming over gRPC
Feature request
It would be nice to have a streaming option for the generation API, so that the response is streamed token by token instead of waiting until the full response is generated. gRPC has built-in support for streaming responses, and the proto code generation handles it as well. The only work needed is on the server side, to pipe tokens into the stream (see the sketch below).
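A minimal sketch of what this could look like, assuming a hypothetical `LLMService` with a server-streaming `Generate` RPC; the proto, module names (`llm_pb2`, `llm_pb2_grpc`), and `generate_iterator` model call are illustrative and not OpenLLM's actual API:

```python
# Hypothetical proto (for illustration only):
#
#   service LLMService {
#     // "stream" on the response type makes this a server-streaming RPC.
#     rpc Generate (GenerateRequest) returns (stream GenerateResponse);
#   }

from concurrent import futures

import grpc

# llm_pb2 / llm_pb2_grpc stand in for the protoc-generated modules.
import llm_pb2
import llm_pb2_grpc


class LLMServicer(llm_pb2_grpc.LLMServiceServicer):
    def Generate(self, request, context):
        # Server-streaming handlers in grpcio are plain generators: each
        # yielded message is sent to the client as soon as it is produced.
        for token in generate_iterator(request.prompt):  # hypothetical model call
            yield llm_pb2.GenerateResponse(text=token)


def serve() -> None:
    server = grpc.server(futures.ThreadPoolExecutor(max_workers=4))
    llm_pb2_grpc.add_LLMServiceServicer_to_server(LLMServicer(), server)
    server.add_insecure_port("[::]:50051")
    server.start()
    server.wait_for_termination()
```

On the client side, the generated stub then exposes the call as an iterator, so consuming partial output is just a loop:

```python
with grpc.insecure_channel("localhost:50051") as channel:
    stub = llm_pb2_grpc.LLMServiceStub(channel)
    for chunk in stub.Generate(llm_pb2.GenerateRequest(prompt="Hello")):
        print(chunk.text, end="", flush=True)
```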
Motivation
This feature would allow the response to be streamed while it is being generated, instead of waiting until it is fully generated.
Other
No response