dd-trace-py
feat(llmobs): track prompt caching for anthropic sdk
Tracks the number of tokens read from and written to the prompt cache for Anthropic.
https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching
Anthropic returns `cache_creation_input_tokens` and `cache_read_input_tokens` in the `usage` field of its responses.
We map these to the `cache_write_input_tokens` and `cache_read_input_tokens` keys in our metrics field.
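The key mapping described above can be sketched roughly as follows. This is an illustration only, not the code in this PR: `CACHE_KEY_MAP` and `map_cache_metrics` are hypothetical names, and the sketch assumes the usage data is available as a plain mapping.

```python
from typing import Any, Dict, Mapping

# Anthropic usage key -> metrics key, following the mapping in the description.
# The constant name is hypothetical, chosen for this sketch.
CACHE_KEY_MAP = {
    "cache_creation_input_tokens": "cache_write_input_tokens",
    "cache_read_input_tokens": "cache_read_input_tokens",
}


def map_cache_metrics(usage: Mapping[str, Any]) -> Dict[str, int]:
    """Return only the prompt-cache metrics found in a usage mapping."""
    metrics: Dict[str, int] = {}
    for source_key, metric_key in CACHE_KEY_MAP.items():
        value = usage.get(source_key)
        # Skip keys that are absent or null so no metric is invented.
        if value is not None:
            metrics[metric_key] = int(value)
    return metrics


example_usage = {
    "input_tokens": 10,
    "output_tokens": 25,
    "cache_creation_input_tokens": 2048,
    "cache_read_input_tokens": 0,
}
cache_metrics = map_cache_metrics(example_usage)
```

A zero count is kept on purpose in this sketch, since "no tokens read from the cache" is different from "cache information missing".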
Testing is blocked on https://github.com/DataDog/dd-apm-test-agent/pull/217
implementation note
Right now, we use `get_llmobs_metrics_tags` to set metrics for Anthropic, which depends on `set_metric` and `get_metric`. We do not want to continue this pattern for prompt caching, so we instead extract the cache token counts directly from the `response.usage` field.
The caveat is that in the streamed case, the usage field is a dictionary that we construct manually while parsing the streamed chunks.
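Because the usage data can arrive either as an object with attributes (non-streamed) or as a manually built dictionary (streamed), a lookup has to handle both shapes. A minimal sketch, assuming those two shapes; `get_usage_value` is a hypothetical helper name and `SimpleNamespace` stands in for the SDK's usage object:

```python
from types import SimpleNamespace
from typing import Any, Optional


def get_usage_value(usage: Any, key: str) -> Optional[int]:
    """Read a token count from either a dict or an attribute-style object."""
    if usage is None:
        return None
    if isinstance(usage, dict):
        # Streamed case: usage is a dictionary assembled from chunks.
        return usage.get(key)
    # Non-streamed case: usage is an object exposing fields as attributes.
    return getattr(usage, key, None)


# Stand-ins for the two shapes; real responses come from the Anthropic SDK.
object_usage = SimpleNamespace(input_tokens=10, cache_read_input_tokens=2048)
dict_usage = {"input_tokens": 10, "cache_creation_input_tokens": 2048}

read_from_object = get_usage_value(object_usage, "cache_read_input_tokens")
write_from_dict = get_usage_value(dict_usage, "cache_creation_input_tokens")
missing = get_usage_value(dict_usage, "cache_read_input_tokens")
```

Returning `None` for a missing key lets the caller decide whether to omit the metric rather than report a misleading zero.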
Follow ups
- Move all the unit tests to use the `llmobs_events` fixture
- De-couple `metrics` parsing from set/get metrics completely
Checklist
- [ ] PR author has checked that all the criteria below are met
- The PR description includes an overview of the change
- The PR description articulates the motivation for the change
- The change includes tests OR the PR description describes a testing strategy
- The PR description notes risks associated with the change, if any
- Newly-added code is easy to change
- The change follows the library release note guidelines
- The change includes or references documentation updates if necessary
- Backport labels are set (if applicable)
Reviewer Checklist
- [ ] Reviewer has checked that all the criteria below are met
- Title is accurate
- All changes are related to the pull request's stated goal
- Avoids breaking API changes
- Testing strategy adequately addresses listed risks
- Newly-added code is easy to change
- Release note makes sense to a user of the library
- If necessary, author has acknowledged and discussed the performance implications of this PR as reported in the benchmarks PR comment
- Backport labels are set in a manner that is consistent with the release branch maintenance policy
CODEOWNERS have been resolved as:
releasenotes/notes/ant-p-cache-3d4001a431cedd67.yaml @DataDog/apm-python
tests/contrib/anthropic/cassettes/anthropic_completion_cache_read.yaml @DataDog/ml-observability
tests/contrib/anthropic/cassettes/anthropic_completion_cache_write.yaml @DataDog/ml-observability
tests/contrib/anthropic/cassettes/anthropic_completion_stream_cache_read.yaml @DataDog/ml-observability
tests/contrib/anthropic/cassettes/anthropic_completion_stream_cache_write.yaml @DataDog/ml-observability
ddtrace/contrib/internal/anthropic/_streaming.py @DataDog/ml-observability
ddtrace/llmobs/_integrations/anthropic.py @DataDog/ml-observability
tests/contrib/anthropic/test_anthropic_llmobs.py @DataDog/ml-observability
Bootstrap import analysis
Comparison of import times between this PR and base.
Summary
The average import time from this PR is: 275 ± 4 ms.
The average import time from base is: 281 ± 4 ms.
The import time difference between this PR and base is: -5.1 ± 0.2 ms.
Import time breakdown
The following import paths have shrunk:
- ddtrace.auto: 2.349 ms (0.85%)
- ddtrace.bootstrap.sitecustomize: 1.667 ms (0.61%)
- ddtrace.bootstrap.preload: 1.547 ms (0.56%)
- ddtrace.internal.remoteconfig.client: 0.705 ms (0.26%)
- ddtrace.appsec._common_module_patches: 0.120 ms (0.04%)
- ddtrace.appsec._asm_request_context: 0.120 ms (0.04%)
- ddtrace.appsec._utils: 0.120 ms (0.04%)
- ddtrace: 0.682 ms (0.25%)
- ddtrace.internal._unpatched: 0.034 ms (0.01%)
- json: 0.034 ms (0.01%)
- json.decoder: 0.034 ms (0.01%)
- re: 0.034 ms (0.01%)
- enum: 0.034 ms (0.01%)
- types: 0.034 ms (0.01%)
Benchmarks
Benchmark execution time: 2025-07-04 19:17:10
Comparing candidate commit 43deda5e4f2e09f1e7b3bfd8eae76325d06c6463 in PR branch evan.li/anthropic-prompt-caching with baseline commit a8419a40fe9e73e0a84c4cab53094c384480a5a6 in branch main.
Found 0 performance improvements and 1 performance regressions! Performance is the same for 546 metrics, 3 unstable metrics.
scenario:iastaspectsospath-ospathsplitdrive_aspect
- 🟥 execution_time [+262.999ns; +374.999ns] or [+7.171%; +10.225%]