Support for returning a `CompletionUsage` object when `streaming=True`
Feature request
OpenAI's `/chat/completions` endpoint can return a `CompletionUsage` object when streaming responses by passing an additional argument, `stream_options={"include_usage": True}`.

It looks like TGI's implementation doesn't currently support this; however, usage is returned by default in TGI when `streaming=False`.

More details on the feature here.
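For illustration, here is a minimal sketch of how the requested option would be used with the OpenAI Python client pointed at a TGI server; the base URL and model name are placeholders, and the final `usage` chunk is what TGI would need to emit to match OpenAI's behavior:

```python
from openai import OpenAI

# Placeholder base URL for a locally running TGI server exposing the
# OpenAI-compatible /v1 API; api_key is unused but required by the client.
client = OpenAI(base_url="http://localhost:8080/v1", api_key="-")

stream = client.chat.completions.create(
    model="tgi",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
    # Requested feature: ask the server to include token usage in the stream.
    stream_options={"include_usage": True},
)

for chunk in stream:
    if chunk.choices:
        print(chunk.choices[0].delta.content or "", end="")
    # With include_usage, OpenAI sends a final chunk whose `usage` field
    # carries prompt/completion/total token counts (and empty `choices`).
    if chunk.usage is not None:
        print("\n", chunk.usage)
```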
Motivation
This would be helpful for maintaining parity with OpenAI's functionality.
Your contribution
N/A