Support for returning a `CompletionUsage` object when `streaming=True`
Feature request
OpenAI's `/chat/completions` endpoint can return a `CompletionUsage` object when streaming responses by passing an additional argument, `stream_options={"include_usage": True}`.

It looks like TGI's implementation doesn't currently support this; however, usage is returned by default in TGI when `streaming=False`.

More details on the feature here.
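For illustration, here is a minimal sketch of how the requested option would be used with the OpenAI Python client pointed at a TGI server; the base URL and model name are placeholders, and the final `usage` chunk is what TGI would need to emit to match OpenAI's behavior:

```python
from openai import OpenAI

# Placeholder base URL for a locally running TGI server exposing the
# OpenAI-compatible /v1 API; api_key is unused but required by the client.
client = OpenAI(base_url="http://localhost:8080/v1", api_key="-")

stream = client.chat.completions.create(
    model="tgi",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
    # Requested feature: ask the server to include token usage in the stream.
    stream_options={"include_usage": True},
)

for chunk in stream:
    if chunk.choices:
        print(chunk.choices[0].delta.content or "", end="")
    # With include_usage, OpenAI sends a final chunk whose `usage` field
    # carries prompt/completion/total token counts (and empty `choices`).
    if chunk.usage is not None:
        print("\n", chunk.usage)
```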
Motivation
This would be helpful for maintaining parity with OpenAI's functionality.
Your contribution
N/A