lorax
Include total time to generate tokens in final payload details
Feature request
When streaming a prompt response, the final message does not include the total time taken to process the request. I'd like to request that this information be added to the details object. For example, the final message currently looks like this:
data:{"token":{"id":13,"text":"\n","logprob":0.0,"special":false},"generated_text":"\n[INST] ... \n","details":{"finish_reason":"length","prompt_tokens":136,"generated_tokens":272,"seed":11081146212971147995}}
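As a rough illustration of how a client would consume such a field, here is a minimal Python sketch. It assumes each streamed line has the "data:{...}" shape shown above. The field name total_time_ms is purely hypothetical: it stands in for whatever timing field might be added and is not part of the current payload. The helper names parse_final_details and tokens_per_second are also illustrative only.

```python
import json
from typing import Optional


def parse_final_details(sse_line: str) -> dict:
    """Parse one 'data:{...}' stream line and return its 'details' object."""
    prefix = "data:"
    if not sse_line.startswith(prefix):
        raise ValueError("not an SSE data line")
    payload = json.loads(sse_line[len(prefix):])
    # Messages without details (missing or null) are treated as empty.
    return payload.get("details") or {}


def tokens_per_second(details: dict) -> Optional[float]:
    # HYPOTHETICAL field: 'total_time_ms' does not exist in the current payload;
    # it represents the server-side generation time this issue asks for.
    total_ms = details.get("total_time_ms")
    if not total_ms:
        return None
    return details["generated_tokens"] / (total_ms / 1000.0)


line = r'data:{"token":{"id":13,"text":"\n","logprob":0.0,"special":false},"generated_text":"\n[INST] ... \n","details":{"finish_reason":"length","prompt_tokens":136,"generated_tokens":272,"seed":11081146212971147995}}'
details = parse_final_details(line)
print(details["generated_tokens"])  # 272
print(tokens_per_second(details))   # None: no timing field today
```

With a server-reported time, the client could compare it against its own wall-clock measurement of the request; the difference would approximate network and queueing overhead, which is the kind of breakdown described below.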
Motivation
When trying to understand why a request is slow, it is very hard to tell whether network latency or model size is responsible without adding a lot of instrumentation around LoRAX.
Your contribution
Yes