text-generation-inference icon indicating copy to clipboard operation
text-generation-inference copied to clipboard

Support for returning a `CompletionUsage` object when `streaming=True`

Open andrewrreed opened this issue 5 months ago • 0 comments

Feature request

OpenAI's /chat/completions endpoint has the option of returning a CompletionUsage object when streaming responses by passing an additional arg stream_options={"include_usage": True}.

It looks like TGI's implementation doesn't currently support this, however it is returned by default in TGI when streaming=False.

More details on the feature here

Motivation

This would be helpful to have parity with OAI functionality

Your contribution

N/A

andrewrreed avatar Sep 17 '24 21:09 andrewrreed