text-generation-inference
add metadata like prompt tokens to generate_stream response
Feature request
Add metadata to the generate_stream response, like the headers x-prompt-tokens, x-generated-tokens, x-compute-time, x-total-time, x-validation-time, x-queue-time, x-inference-time, and x-time-per-token in the generate response.
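For context, these headers can already be read off the plain generate response today. A minimal sketch, assuming a local TGI server (the URL and prompt are placeholders; the header names are the ones listed above):

```python
import requests

# Non-streaming endpoint: the response arrives in one piece, so the
# request's metrics are already final when the headers are sent.
url = "http://0.0.0.0:80/generate"
data = {"inputs": "Today I am in Paris and", "parameters": {"max_new_tokens": 20}}

response = requests.post(url, json=data, headers={"Content-Type": "application/json"})
response.raise_for_status()

# Print the metadata headers this issue asks to also have on generate_stream.
for name in ("x-generated-tokens", "x-compute-time", "x-total-time",
             "x-validation-time", "x-queue-time", "x-inference-time",
             "x-time-per-token"):
    print(name, "->", response.headers.get(name))
```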
Motivation
We need to report metrics like token counts and compute time when calling generate_stream. It would be convenient if the generate_stream response contained this data.
Your contribution
I'll try to submit a PR when I have time.
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.
+1
Probably not trivial, as the response headers are returned immediately when streaming? The snippet below illustrates this:
import requests

url = "http://0.0.0.0:80/generate_stream"
data = {"inputs": "Today I am in Paris and", "parameters": {"max_new_tokens": 20}}
headers = {"Content-Type": "application/json"}

session = requests.Session()
response = session.post(
    url,
    json=data,
    headers=headers,
    stream=True,
)

# The response headers are sent before any token is generated, so by the
# time the stream below is consumed they can no longer be updated with
# per-request metrics like x-generated-tokens.
for line in response.iter_lines():
    print(f"line: `{line}`")

print(response.headers)
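A partial workaround today may be to set "details": true in the request parameters, since the final streamed event can then carry a details object with fields like generated_tokens and finish_reason. A hedged sketch of reading it client-side (the data: SSE framing and the exact details fields are assumptions based on the current streaming API):

```python
import json
import requests

url = "http://0.0.0.0:80/generate_stream"
data = {
    "inputs": "Today I am in Paris and",
    "parameters": {"max_new_tokens": 20, "details": True},
}

with requests.post(url, json=data, stream=True) as response:
    for line in response.iter_lines():
        if not line.startswith(b"data:"):
            continue  # skip SSE keep-alives / blank lines
        event = json.loads(line[len(b"data:"):])
        # Only the final event carries a non-null `details` payload.
        if event.get("details") is not None:
            print("generated_tokens:", event["details"].get("generated_tokens"))
            print("finish_reason:", event["details"].get("finish_reason"))
```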
Relevant code: https://github.com/huggingface/text-generation-inference/blob/a25737139d302390edc40ee2d9d92109c7720c04/router/src/server.rs#L663
cc @Narsil: design-wise, is this feasible?
@fxmarty Maybe returning this data in the response body would be better, like this (placeholder values for illustration):
{
  "metadata": {
    "tokenMetadata": {
      "inputTokenCount": {
        "totalTokens": 7
      },
      "outputTokenCount": {
        "totalTokens": 20
      }
    }
  }
}
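If the server appended one final stream event shaped like this, a client could collect tokens as usual and pick the metadata up at the end. A minimal sketch, assuming the hypothetical metadata envelope and field names proposed above (this is not the current API):

```python
import json
import requests

url = "http://0.0.0.0:80/generate_stream"
data = {"inputs": "Today I am in Paris and", "parameters": {"max_new_tokens": 20}}

tokens, metadata = [], None
with requests.post(url, json=data, stream=True) as response:
    for line in response.iter_lines():
        if not line.startswith(b"data:"):
            continue
        event = json.loads(line[len(b"data:"):])
        if "metadata" in event:
            # Hypothetical final event carrying the proposed metadata envelope.
            metadata = event["metadata"]
        elif "token" in event:
            tokens.append(event["token"].get("text", ""))

print("generated text:", "".join(tokens))
if metadata is not None:
    counts = metadata["tokenMetadata"]
    print("input tokens:", counts["inputTokenCount"]["totalTokens"])
    print("output tokens:", counts["outputTokenCount"]["totalTokens"])
```

One upside of a body-level event over headers is that it works with chunked streaming responses, where trailers are poorly supported by most HTTP clients.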