[WIP] Using attributes and refs for chat history on gen_ai spans
This is an early draft of https://github.com/open-telemetry/semantic-conventions/issues/2010#issuecomment-2766883695
Key points:
- System message is recorded as
gen_ai.system.instructionsattribute. It's not necessarily a part of chat history (when dealing with server-side agents) - Prompt is recorded as
gen_ai.input.messagesattribute, format is defined in json schema (for now until we have tooling to define complex attributes). If uploading to an external storage, it's replaced withgen_ai.input.messages_ref - Completion is recorded as
gen_ai.output.messages(orgen_ai.output.messages_ref) - We still define corresponding events, but they record all input and output in model-specific format in the body. This makes them self-sufficient and more suitable for audit/compliance than hand-crafted and unified input/output messages.
Unfortunately had to skip today's meeting because of a conflict. Wanted to bring up some cases for sending the prompts and messages as events:
- Some customers want to send this customer-sensitive data to a different backend, for compliance reasons.
- this data is useful independently from application tracing or metrics, as it provides an option to build a conversation-specific tools and UIs.
- tracing usually implemented with different backend technologies
All of this is specifically important for non-purposely built solution, but for a bigger observability platforms.
@lmolkova My proposal would be to keep the events optional, and keep them in the existing implementations enabled with an environment variable.
@zhirafovod
Unfortunately had to skip today's meeting because of a conflict. Wanted to bring up some cases for sending the prompts and messages as events:
- Some customers want to send this customer-sensitive data to a different backend, for compliance reasons.
- this data is useful independently from application tracing or metrics, as it provides an option to build a conversation-specific tools and UIs.
- tracing usually implemented with different backend technologies
All of this is specifically important for non-purposely built solution, but for a bigger observability platforms.
We do have an option to record chat history on events, please take a look!
@open-telemetry/semconv-genai-approvers this PR is now ready for the review! Please take a look
It depends on #2046.
Thanks, @lmolkova!
I believe I understand the motivation for having a separate place to identify request-level instructions, as that's becoming common-place with more agentic APIs. But I'm not following why it's ok to pull out system instructions that are part of the message list. A message list may contain multiple system (or developer, in the case of some OpenAI models) messages, and they can be spread out throughout the message list, with placement mattering. I'm unclear how that can be accurately represented with a single top-level instructions attribute.
Why not just have both?
I believe I understand the motivation for having a separate place to identify request-level instructions, as that's becoming common-place with more agentic APIs. But I'm not following why it's ok to pull out system instructions that are part of the message list. A message list may contain multiple system (or developer, in the case of some OpenAI models) messages, and they can be spread out throughout the message list, with placement mattering. I'm unclear how that can be accurately represented with a single top-level instructions attribute.
Why not just have both?
@stephentoub great point. I did not consider that chat history may contain multiple system messages. I agree we should have both. I removed system message exclusion from input messages and changed gen_ai.system.instructions to record instructions that are provided separately from the chat history.
@copilot did you use custom copilot-instructions.md to review this PR?
The committers listed above are authorized under a signed CLA.
- :white_check_mark: login: lmolkova / name: Liudmila Molkova (76eb5b901188d262e4131e1ad7523f7fb5187f96, 50f95437276cbfcfcd20ccdff00454240d491f14, 5d9d4028ab2c36452afcef6a309b62393a1e92ee, eccd1f806e426a32c98271c3ce77585492d26de2, 75ee4c8f39d9fd99b859899438dd4888df6e03d9)