dd-trace-py feat(llmobs): annotate `toolResult` input on bedrock converse spans correctly

Makes sure bedrock converse toolResult input content blocks are properly annotated on input to the LLM call. We override the role field to explicitly be "Tool Result".

Before Screenshot 2025-05-15 at 4 03 25 PM

After Screenshot 2025-05-15 at 4 04 03 PM

When asked "what is 5 + 2" with an addition tool, and submitting the follow-up message after the tool call to bedrock converse as

tool = response['output']['message']['content'][1]['toolUse']
tool_name = tool['name']
tool_input = tool['input']

tool_result = tools[tool_name](**tool_input)

messages.append({
  'role': 'user',
  'content': [
    {
      'toolResult': {
        'toolUseId': tool['toolUseId'],
        'content': [
          {
            'text': str(tool_result)
          }
        ]
      }
    }
  ]
})

response = bedrock_client.converse(
  modelId='anthropic.claude-3-5-sonnet-20240620-v1:0',
  messages=messages,
  inferenceConfig={
    "temperature": 0.5,
  },
  toolConfig={"tools": [add_tool]}
)

MLOB-2762

Checklist

[x] PR author has checked that all the criteria below are met
The PR description includes an overview of the change
The PR description articulates the motivation for the change
The change includes tests OR the PR description describes a testing strategy
The PR description notes risks associated with the change, if any
Newly-added code is easy to change
The change follows the library release note guidelines
The change includes or references documentation updates if necessary
Backport labels are set (if applicable)

Reviewer Checklist

[x] Reviewer has checked that all the criteria below are met
Title is accurate
All changes are related to the pull request's stated goal
Avoids breaking API changes
Testing strategy adequately addresses listed risks
Newly-added code is easy to change
Release note makes sense to a user of the library
If necessary, author has acknowledged and discussed the performance implications of this PR as reported in the benchmarks PR comment
Backport labels are set in a manner that is consistent with the release branch maintenance policy

May 15 '25 20:05 sabrenner

CODEOWNERS have been resolved as:

releasenotes/notes/bedrock-converse-tool-result-annotations-d454d6df496bba68.yaml  @DataDog/apm-python
ddtrace/llmobs/_integrations/utils.py                                   @DataDog/ml-observability
tests/contrib/botocore/test_bedrock_llmobs.py                           @DataDog/ml-observability

May 15 '25 20:05 github-actions[bot]

Bootstrap import analysis

Comparison of import times between this PR and base.

Summary

The average import time from this PR is: 236 ± 4 ms.

The average import time from base is: 241 ± 4 ms.

The import time difference between this PR and base is: -5.1 ± 0.2 ms.

Import time breakdown

The following import paths have shrunk:

ddtrace.auto 2.335 ms (0.99%)

ddtrace.bootstrap.sitecustomize 1.654 ms (0.70%)

ddtrace.bootstrap.preload 1.654 ms (0.70%)

ddtrace.internal.remoteconfig.client 0.802 ms (0.34%)

ddtrace 0.680 ms (0.29%)

May 15 '25 20:05 github-actions[bot]

Benchmarks

Benchmark execution time: 2025-05-19 16:10:44

Comparing candidate commit 92cee5dde13ea0d3b74a7d5bc9ed05063be4ab1a in PR branch sabrenner/bedrock-converse-tool-input with baseline commit e54c5bc9a3da6621755411726ec808c87d262986 in branch main.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 385 metrics, 5 unstable metrics.

May 15 '25 20:05 pr-commenter[bot]

This pull request has been automatically closed after a period of inactivity. After this much time, it will likely be easier to open a new pull request with the same changes than to update this one from the base branch. Please comment or reopen if you think this pull request was closed in error.

Jun 21 '25 00:06 github-actions[bot]

getting back to this work now, will open a different PR so i don't have to deal with rebase mess

Jul 14 '25 13:07 sabrenner

dd-trace-py dd-trace-py copied to clipboard

feat(llmobs): annotate `toolResult` input on bedrock converse spans correctly

Checklist

Reviewer Checklist

Bootstrap import analysis

Summary

Import time breakdown

Benchmarks

dd-trace-py
dd-trace-py copied to clipboard