semantic-conventions icon indicating copy to clipboard operation
semantic-conventions copied to clipboard

[WIP] Using attributes and refs for chat history on gen_ai spans

Open lmolkova opened this issue 1 year ago • 1 comments

This is an early draft of https://github.com/open-telemetry/semantic-conventions/issues/2010#issuecomment-2766883695

Key points:

  1. System message is recorded as gen_ai.system.instructions attribute. It's not necessarily a part of chat history (when dealing with server-side agents)
  2. Prompt is recorded as gen_ai.input.messages attribute, format is defined in json schema (for now until we have tooling to define complex attributes). If uploading to an external storage, it's replaced with gen_ai.input.messages_ref
  3. Completion is recorded as gen_ai.output.messages (or gen_ai.output.messages_ref)
  4. We still define corresponding events, but they record all input and output in model-specific format in the body. This makes them self-sufficient and more suitable for audit/compliance than hand-crafted and unified input/output messages.

lmolkova avatar Apr 24 '25 16:04 lmolkova

Unfortunately had to skip today's meeting because of a conflict. Wanted to bring up some cases for sending the prompts and messages as events:

  1. Some customers want to send this customer-sensitive data to a different backend, for compliance reasons.
  2. this data is useful independently from application tracing or metrics, as it provides an option to build a conversation-specific tools and UIs.
  3. tracing usually implemented with different backend technologies

All of this is specifically important for non-purposely built solution, but for a bigger observability platforms.

@lmolkova My proposal would be to keep the events optional, and keep them in the existing implementations enabled with an environment variable.

zhirafovod avatar May 08 '25 19:05 zhirafovod

@zhirafovod

Unfortunately had to skip today's meeting because of a conflict. Wanted to bring up some cases for sending the prompts and messages as events:

  1. Some customers want to send this customer-sensitive data to a different backend, for compliance reasons.
  2. this data is useful independently from application tracing or metrics, as it provides an option to build a conversation-specific tools and UIs.
  3. tracing usually implemented with different backend technologies

All of this is specifically important for non-purposely built solution, but for a bigger observability platforms.

We do have an option to record chat history on events, please take a look!

lmolkova avatar Jun 07 '25 16:06 lmolkova

@open-telemetry/semconv-genai-approvers this PR is now ready for the review! Please take a look

It depends on #2046.

lmolkova avatar Jun 07 '25 16:06 lmolkova

Thanks, @lmolkova!

I believe I understand the motivation for having a separate place to identify request-level instructions, as that's becoming common-place with more agentic APIs. But I'm not following why it's ok to pull out system instructions that are part of the message list. A message list may contain multiple system (or developer, in the case of some OpenAI models) messages, and they can be spread out throughout the message list, with placement mattering. I'm unclear how that can be accurately represented with a single top-level instructions attribute.

Why not just have both?

stephentoub avatar Jun 18 '25 19:06 stephentoub

I believe I understand the motivation for having a separate place to identify request-level instructions, as that's becoming common-place with more agentic APIs. But I'm not following why it's ok to pull out system instructions that are part of the message list. A message list may contain multiple system (or developer, in the case of some OpenAI models) messages, and they can be spread out throughout the message list, with placement mattering. I'm unclear how that can be accurately represented with a single top-level instructions attribute.

Why not just have both?

@stephentoub great point. I did not consider that chat history may contain multiple system messages. I agree we should have both. I removed system message exclusion from input messages and changed gen_ai.system.instructions to record instructions that are provided separately from the chat history.

lmolkova avatar Jun 23 '25 22:06 lmolkova

@copilot did you use custom copilot-instructions.md to review this PR?

lmolkova avatar Jul 01 '25 05:07 lmolkova

CLA Signed

The committers listed above are authorized under a signed CLA.

  • :white_check_mark: login: lmolkova / name: Liudmila Molkova (76eb5b901188d262e4131e1ad7523f7fb5187f96, 50f95437276cbfcfcd20ccdff00454240d491f14, 5d9d4028ab2c36452afcef6a309b62393a1e92ee, eccd1f806e426a32c98271c3ce77585492d26de2, 75ee4c8f39d9fd99b859899438dd4888df6e03d9)