
feat: Response streaming over gRPC

Open · Bec-k opened this issue 2 years ago · 1 comment

Feature request

It would be nice to have a streaming option for the generation API, so that the response streams token by token instead of waiting until the full response is generated. gRPC has built-in support for streaming responses, and the proto code generation handles it as well. The only work required is on the server side, to pipe tokens into the stream.
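For illustration, here is a minimal sketch of what a server-streaming RPC handler could look like in Python. The service and message names (`GenerationService`, `GenerateRequest`, `GenerateResponse`) and the `generate_tokens` helper are hypothetical, not OpenLLM's actual proto definitions or internals; the point is only that a server-streaming handler in grpcio is a plain generator, so yielded messages reach the client as they are produced.

```python
# Sketch only: proto/service names below are hypothetical assumptions,
# not OpenLLM's actual API.
#
# Assumed proto, compiled with grpcio-tools into generation_pb2 / generation_pb2_grpc:
#
#   service GenerationService {
#     rpc GenerateStream (GenerateRequest) returns (stream GenerateResponse);
#   }
#   message GenerateRequest  { string prompt = 1; }
#   message GenerateResponse { string token = 1; }

from concurrent import futures

import grpc

import generation_pb2
import generation_pb2_grpc


class GenerationService(generation_pb2_grpc.GenerationServiceServicer):
    def GenerateStream(self, request, context):
        # A server-streaming handler is just a generator: each yielded
        # message is flushed to the client as soon as it is produced,
        # so tokens arrive incrementally instead of in one final response.
        for token in generate_tokens(request.prompt):  # hypothetical model call
            yield generation_pb2.GenerateResponse(token=token)


def serve():
    server = grpc.server(futures.ThreadPoolExecutor(max_workers=4))
    generation_pb2_grpc.add_GenerationServiceServicer_to_server(
        GenerationService(), server
    )
    server.add_insecure_port("[::]:50051")
    server.start()
    server.wait_for_termination()
```

On the client side, the generated stub for a server-streaming RPC returns an iterator, so the caller can consume tokens as they arrive, e.g. `for resp in stub.GenerateStream(generation_pb2.GenerateRequest(prompt=...)): print(resp.token, end="")`.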

Motivation

This feature would allow streaming the response while it is being generated, instead of waiting until it is fully generated.

Other

No response

Bec-k · Jun 23, 2023