
Report tokens per second for input/output

Open iandouglas opened this issue 3 months ago • 3 comments

Please explain the motivation behind the feature request.

Not sure this is even possible with all LLMs, but could we have a setting that shows or hides tokens-per-second, when that data is available, while Goose communicates with an LLM? I think it'd be a valuable tool to help users weigh speed against output quality when choosing an LLM.

Describe the solution you'd like

I know the idea of 'tps' is a bit subjective/reductive, and may not be a super valuable tool other than for those trying to squeeze out every bit of performance they can. I have this turned on in LM Studio on a new AI rig I set up, and it's really nice to see the difference in processing speed between different models. Since my AI rig is a different system than the ones I use Goose on, I'd love to have a way to send that data through to Goose.

I did see an old PR, https://github.com/block/goose/pull/2404, that introduced a tokens_per_second setup, but I don't think I've seen that in the UI or CLI anywhere.

Describe alternatives you've considered

none

Additional context

n/a

  • [x] I have verified this does not duplicate an existing feature request

iandouglas avatar Sep 02 '25 19:09 iandouglas

Are you imagining this as session scoped, so you'd see your average tps in that session?

Or are you looking to learn more about aggregate stats for the models you commonly use?

katzdave avatar Oct 29 '25 15:10 katzdave

Given how goose now has the multi-model setup within a single session, I'd personally want to track it across sessions. And not just an average: I'd like to see cumulative totals, and let the client side use that data however it sees fit (calculating costs, running averages, etc.).
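To make the "cumulative totals, let the client derive the rest" idea concrete, here is a minimal sketch. The dictionary keys (`provider`, `model`, `input_tokens`, `output_tokens`, `duration_s`) and both function names are hypothetical and chosen only for illustration; nothing here reflects an existing goose API.

```python
from collections import defaultdict


def cumulative_totals(requests):
    """Sum token counts and durations per (provider, model) across all sessions."""
    totals = defaultdict(
        lambda: {"requests": 0, "input_tokens": 0, "output_tokens": 0, "duration_s": 0.0}
    )
    for req in requests:
        entry = totals[(req["provider"], req["model"])]
        entry["requests"] += 1
        entry["input_tokens"] += req["input_tokens"]
        entry["output_tokens"] += req["output_tokens"]
        entry["duration_s"] += req["duration_s"]
    return dict(totals)


def output_tps(entry):
    """Client-side derived metric: output tokens per second, or None if no time was recorded."""
    if entry["duration_s"] <= 0:
        return None
    return entry["output_tokens"] / entry["duration_s"]
```

Because only raw totals are kept, a client could compute averages, rates, or costs from the same data without the backend committing to any one of them.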

iandouglas avatar Oct 29 '25 16:10 iandouglas

Cool got it.

Seems like the next step is to persistently store, for every provider request: (provider, model, input_tokens, output_tokens, duration, date). Anything missing?
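As a rough sketch of what storing that tuple could look like, the following uses Python's standard `sqlite3` module. The table name `provider_requests`, the `RequestRecord` class, the helper names, and the choice of seconds for `duration_s` are all assumptions made for illustration, not goose's actual storage layer.

```python
import sqlite3
from dataclasses import astuple, dataclass


@dataclass(frozen=True)
class RequestRecord:
    # Fields mirror the tuple proposed above; units and types are assumptions.
    provider: str
    model: str
    input_tokens: int
    output_tokens: int
    duration_s: float  # wall-clock duration of the request, in seconds
    date: str          # ISO 8601 timestamp


SCHEMA = """
CREATE TABLE IF NOT EXISTS provider_requests (
    provider      TEXT    NOT NULL,
    model         TEXT    NOT NULL,
    input_tokens  INTEGER NOT NULL,
    output_tokens INTEGER NOT NULL,
    duration_s    REAL    NOT NULL,
    date          TEXT    NOT NULL
)
"""


def open_store(path=":memory:"):
    """Open (or create) the store; pass a file path for real persistence."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def save(conn, record):
    """Append one provider request to the store."""
    conn.execute(
        "INSERT INTO provider_requests VALUES (?, ?, ?, ?, ?, ?)", astuple(record)
    )
    conn.commit()


def tokens_per_second(record):
    """Output tokens divided by total duration; None if the duration is not positive."""
    if record.duration_s <= 0:
        return None
    return record.output_tokens / record.duration_s
```

With only a single duration per request, the input and output rates cannot be separated; doing so would need an extra timing field such as time to first token, which is not part of the tuple proposed above.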

From there we can build a page to surface/navigate that. Cost can get pulled in at the end based on current cost of that model (or user can customize if we don't have info about that provider/model).
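One way the cost lookup with a user override might work is sketched below. The price table, its made-up values, the per-million-token unit, and the `estimate_cost` name are all hypothetical; real prices would have to come from wherever goose sources model pricing.

```python
# Hypothetical built-in prices: (input, output) in USD per million tokens.
# The provider, model, and numbers are placeholders, not real pricing.
DEFAULT_PRICES = {
    ("example-provider", "example-model"): (3.00, 15.00),
}


def estimate_cost(provider, model, input_tokens, output_tokens, overrides=None):
    """Return the estimated cost in USD, or None if no price is known.

    User-supplied overrides take precedence over the built-in table, which
    covers providers or models that have no published price.
    """
    prices = {**DEFAULT_PRICES, **(overrides or {})}
    price = prices.get((provider, model))
    if price is None:
        return None
    input_price, output_price = price
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000
```

Since the cost is computed at display time from stored token counts, a price change only requires updating the table, not rewriting stored records.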

katzdave avatar Oct 31 '25 18:10 katzdave