LoRAX: Include total time to generate tokens in final payload details


martindavis opened this issue 1 year ago · 1 comment


Feature request

When streaming the response to a prompt, the last message does not include the time taken to process the request. I would like to request that this information be included in the `details` object. For example, the final message currently looks like this:

data:{"token":{"id":13,"text":"\n","logprob":0.0,"special":false},"generated_text":"\n[INST]  ... \n","details":{"finish_reason":"length","prompt_tokens":136,"generated_tokens":272,"seed":11081146212971147995}}
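The final streamed message is a server-sent events line prefixed with `data:`, and only that last message carries a non-null `details` object. A client could extract it as sketched below. This is a minimal, standard-library-only sketch. The function name `parse_final_details` is hypothetical, and the sample line is the example above with `generated_text` abbreviated.

```python
import json
from typing import Optional


def parse_final_details(sse_line: str) -> Optional[dict]:
    """Return the 'details' object from one SSE 'data:' line, or None if absent."""
    prefix = "data:"
    if not sse_line.startswith(prefix):
        return None  # not a data line (e.g. a comment or event line)
    payload = json.loads(sse_line[len(prefix):])
    # Intermediate token messages have "details": null, which becomes None here.
    return payload.get("details")


# Abbreviated copy of the final message shown in the issue.
line = (
    'data:{"token":{"id":13,"text":"\\n","logprob":0.0,"special":false},'
    '"generated_text":"\\n","details":{"finish_reason":"length",'
    '"prompt_tokens":136,"generated_tokens":272,"seed":11081146212971147995}}'
)
details = parse_final_details(line)
print(details["finish_reason"], details["generated_tokens"])  # length 272
```

Note that nothing in this object says how long generation took, which is the gap this issue asks to close.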

Motivation

When trying to understand why a request is slow, it is very hard to tell whether the time is going to network latency or to model size (that is, generation time on the server) without adding a lot of instrumentation around LoRAX.
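If the final `details` object did report a server-side total time, a client could subtract it from its own wall-clock measurement to separate generation time from everything else. The sketch below assumes a hypothetical field named `total_time_ms`. That name is only an illustration of what this issue proposes and is not an existing LoRAX field. The input numbers are made up for the example.

```python
from dataclasses import dataclass


@dataclass
class LatencyBreakdown:
    client_total_ms: float  # wall-clock time measured by the client
    server_total_ms: float  # time reported by the server (hypothetical field)
    overhead_ms: float      # whatever the server timer does not cover: network, queuing, client work
    ms_per_token: float     # server time divided by generated tokens


def split_latency(client_total_ms: float, details: dict) -> LatencyBreakdown:
    # 'total_time_ms' is HYPOTHETICAL: the field this feature request asks for.
    server_ms = float(details["total_time_ms"])
    tokens = details["generated_tokens"]
    return LatencyBreakdown(
        client_total_ms=client_total_ms,
        server_total_ms=server_ms,
        # Clamp at zero in case the server figure slightly exceeds the client's.
        overhead_ms=max(client_total_ms - server_ms, 0.0),
        ms_per_token=server_ms / tokens if tokens else 0.0,
    )


# Illustrative numbers only: the client measured 5200 ms end to end.
b = split_latency(5200.0, {"generated_tokens": 272, "total_time_ms": 4896.0})
print(b.overhead_ms, b.ms_per_token)  # 304.0 18.0
```

On the client side, `client_total_ms` could come from `time.perf_counter()` taken before sending the request and after the final message arrives. A large `overhead_ms` would point at the network, while a large `ms_per_token` would point at the model.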

Your contribution

Yes

Mar 13 '24 20:03 martindavis
