dd-trace-py feat(llmobs): track prompt caching for anthropic sdk

Tracks the number of tokens read from and written to the prompt cache for Anthropic.

https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching

Anthropic returns cache_creation_input_tokens and cache_read_input_tokens in the usage field of its responses.

We map these to the cache_write_input_tokens and cache_read_input_tokens keys in our metrics field.
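The key mapping described above can be sketched as follows. This is a minimal illustration, not the code in this PR: the constant and function names are hypothetical, and only the four key names come from the description.

```python
# Hypothetical names; only the key strings are taken from the PR description.
ANTHROPIC_TO_LLMOBS_CACHE_KEYS = {
    "cache_creation_input_tokens": "cache_write_input_tokens",
    "cache_read_input_tokens": "cache_read_input_tokens",
}


def map_cache_metrics(usage: dict) -> dict:
    """Copy cache token counts from an Anthropic-style usage dict into metric keys."""
    metrics = {}
    for source_key, metric_key in ANTHROPIC_TO_LLMOBS_CACHE_KEYS.items():
        value = usage.get(source_key)
        # Skip keys that are absent or null so we do not report a misleading count.
        if value is not None:
            metrics[metric_key] = value
    return metrics
```

Keys that are missing from the usage payload are skipped rather than defaulted to zero, so a response without caching information produces no cache metrics at all.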

Testing is blocked on https://github.com/DataDog/dd-apm-test-agent/pull/217

Implementation note

Right now, we use get_llmobs_metrics_tags to set metrics for anthropic, which depends on set_metric and get_metric. We do not want to continue this pattern for prompt caching, so we instead extract the cache token counts directly from the response.usage field.

The caveat is that in the streamed case, the usage field is a dictionary that we construct manually while parsing the streamed chunks.
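Because usage can be either an object (non-streamed) or a plain dictionary (streamed), any direct extraction has to handle both shapes. A minimal sketch of one way to do that, assuming attribute access for the non-streamed case; the helper name get_usage_value is hypothetical and is not taken from this PR:

```python
from typing import Any, Optional


def get_usage_value(usage: Any, key: str) -> Optional[int]:
    """Read a token count from usage, whether it is a dict or an attribute-style object."""
    if usage is None:
        return None
    if isinstance(usage, dict):
        # Streamed case: usage is a dictionary assembled from streamed chunks.
        return usage.get(key)
    # Non-streamed case (assumed): usage exposes its fields as attributes.
    return getattr(usage, key, None)
```

Returning None for a missing field lets the caller decide whether to omit the metric, instead of silently reporting zero cached tokens.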

Follow ups

Move all the unit tests to use llmobs_events fixture
De-couple metrics parsing from set/get metrics completely

Checklist

[ ] PR author has checked that all the criteria below are met
The PR description includes an overview of the change
The PR description articulates the motivation for the change
The change includes tests OR the PR description describes a testing strategy
The PR description notes risks associated with the change, if any
Newly-added code is easy to change
The change follows the library release note guidelines
The change includes or references documentation updates if necessary
Backport labels are set (if applicable)

Reviewer Checklist

[ ] Reviewer has checked that all the criteria below are met
Title is accurate
All changes are related to the pull request's stated goal
Avoids breaking API changes
Testing strategy adequately addresses listed risks
Newly-added code is easy to change
Release note makes sense to a user of the library
If necessary, author has acknowledged and discussed the performance implications of this PR as reported in the benchmarks PR comment
Backport labels are set in a manner that is consistent with the release branch maintenance policy

Jun 24 '25 16:06 lievan

CODEOWNERS have been resolved as:

releasenotes/notes/ant-p-cache-3d4001a431cedd67.yaml                    @DataDog/apm-python
tests/contrib/anthropic/cassettes/anthropic_completion_cache_read.yaml  @DataDog/ml-observability
tests/contrib/anthropic/cassettes/anthropic_completion_cache_write.yaml  @DataDog/ml-observability
tests/contrib/anthropic/cassettes/anthropic_completion_stream_cache_read.yaml  @DataDog/ml-observability
tests/contrib/anthropic/cassettes/anthropic_completion_stream_cache_write.yaml  @DataDog/ml-observability
ddtrace/contrib/internal/anthropic/_streaming.py                        @DataDog/ml-observability
ddtrace/llmobs/_integrations/anthropic.py                               @DataDog/ml-observability
tests/contrib/anthropic/test_anthropic_llmobs.py                        @DataDog/ml-observability

Jun 24 '25 16:06 github-actions[bot]

Bootstrap import analysis

Comparison of import times between this PR and base.

Summary

The average import time from this PR is: 275 ± 4 ms.

The average import time from base is: 281 ± 4 ms.

The import time difference between this PR and base is: -5.1 ± 0.2 ms.

Import time breakdown

The following import paths have shrunk:

ddtrace.auto 2.349 ms (0.85%)

ddtrace.bootstrap.sitecustomize 1.667 ms (0.61%)

ddtrace.bootstrap.preload 1.547 ms (0.56%)

ddtrace.internal.remoteconfig.client 0.705 ms (0.26%)

ddtrace.appsec._common_module_patches 0.120 ms (0.04%)

ddtrace.appsec._asm_request_context 0.120 ms (0.04%)

ddtrace.appsec._utils 0.120 ms (0.04%)

ddtrace 0.682 ms (0.25%)

ddtrace.internal._unpatched 0.034 ms (0.01%)

json 0.034 ms (0.01%)

json.decoder 0.034 ms (0.01%)

re 0.034 ms (0.01%)

enum 0.034 ms (0.01%)

types 0.034 ms (0.01%)

Jun 24 '25 16:06 github-actions[bot]

Benchmarks

Benchmark execution time: 2025-07-04 19:17:10

Comparing candidate commit 43deda5e4f2e09f1e7b3bfd8eae76325d06c6463 in PR branch evan.li/anthropic-prompt-caching with baseline commit a8419a40fe9e73e0a84c4cab53094c384480a5a6 in branch main.

Found 0 performance improvements and 1 performance regressions! Performance is the same for 546 metrics, 3 unstable metrics.

scenario:iastaspectsospath-ospathsplitdrive_aspect

🟥 execution_time [+262.999ns; +374.999ns] or [+7.171%; +10.225%]

Jun 24 '25 17:06 pr-commenter[bot]