Proposal: chunk streaming and LLM provider latency metrics
Area(s)
area:gen-ai
Propose new conventions
The current `gen_ai.server.time_to_first_token` metric is useful for tracking server-side latency and LLM "spin-up", but it is much less informative for client-side optimizations.
My thought was that when instrumenting an application that uses an agentic framework, it would be helpful for the framework to emit telemetry that can answer the following questions:
- How long was my request in transit to and from the LLM provider before I began seeing a response?
  - I propose `gen_ai.client.operation.time_to_first_chunk` as a client-side version of the `gen_ai.server.time_to_first_token`, or time to first token (TTFT), metric.
  - This allows ops to measure (and ultimately optimize) overall lag/latency from the LLM providers' APIs (provisioning, message queues, etc.).
- How many tokens per second were generated during generation itself (excluding server-side resourcing, queuing, and provisioning)?
  - I propose `gen_ai.client.operation.time_per_output_chunk` as a client-side version of the `gen_ai.server.time_per_output_token` metric.
  - This allows ops to measure (and ultimately compare) LLM providers by their speed and cost.
- How long did my request take to complete in total?
  - The existing `gen_ai.client.operation.duration` metric already covers this.
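To make the two proposed metrics concrete, here is a minimal sketch of how a client could derive them while consuming a streamed response. The names `measure_stream` and `StreamingLatency` are hypothetical illustration, not part of any SDK; a real instrumentation would record the values into OpenTelemetry histograms under the proposed metric names rather than return them.

```python
import time
from dataclasses import dataclass
from typing import Iterable


@dataclass
class StreamingLatency:
    """Client-side latency figures for one streamed LLM response (hypothetical)."""
    time_to_first_chunk: float    # seconds from request send to first chunk
    time_per_output_chunk: float  # mean seconds between subsequent chunks


def measure_stream(chunks: Iterable[str], clock=time.monotonic):
    """Consume a chunk stream and compute the two proposed client-side metrics.

    `chunks` stands in for whatever iterator the provider SDK returns
    (e.g. an SSE stream); `clock` is injectable so the logic is testable.
    """
    start = clock()
    first = last = None
    n = 0
    collected = []
    for chunk in chunks:
        now = clock()
        if first is None:
            first = now  # first chunk observed -> time_to_first_chunk anchor
        last = now
        n += 1
        collected.append(chunk)
    if first is None:
        raise ValueError("stream produced no chunks")
    ttfc = first - start
    # Average inter-chunk gap during generation only (first chunk to last chunk).
    tpoc = (last - first) / (n - 1) if n > 1 else 0.0
    return collected, StreamingLatency(ttfc, tpoc)
```

Note that `time_per_output_chunk` is computed only over the generation window (first chunk to last chunk), so it deliberately excludes the provisioning/queuing delay that `time_to_first_chunk` captures.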
For additional context, many builders are not running inference locally and likely don't have access to the server's token- and chunk-emission telemetry to measure these values directly. Given that gap in client-side telemetry, these metrics would be valuable from an LLM ops optimization standpoint.