dd-trace-py

feat(llmobs): track prompt caching for anthropic sdk

Open lievan opened this issue 5 months ago • 3 comments

Tracks the number of tokens read from and written to the prompt cache for Anthropic.

https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching

Anthropic returns cache_creation_input_tokens and cache_read_input_tokens in the usage field of its responses.

We map these to the cache_write_input_tokens and cache_read_input_tokens keys in our metrics field.
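
A minimal sketch of the key mapping described above, assuming the usage data is available as a plain dictionary. The names ANTHROPIC_TO_LLMOBS_CACHE_KEYS and map_cache_metrics are hypothetical and used for illustration only; they are not taken from the dd-trace-py code in this PR.

```python
# Hypothetical mapping from Anthropic usage keys to LLM Observability metric keys.
# The key names come from the PR description; the helper itself is illustrative.
ANTHROPIC_TO_LLMOBS_CACHE_KEYS = {
    "cache_creation_input_tokens": "cache_write_input_tokens",
    "cache_read_input_tokens": "cache_read_input_tokens",
}


def map_cache_metrics(usage: dict) -> dict:
    """Return only the cache-related metrics found in a usage dictionary."""
    metrics = {}
    for source_key, metric_key in ANTHROPIC_TO_LLMOBS_CACHE_KEYS.items():
        value = usage.get(source_key)
        # Skip keys that are absent or explicitly None; keep zero counts.
        if value is not None:
            metrics[metric_key] = value
    return metrics
```

Keys that are missing from the usage data are skipped rather than reported as zero, so a response without caching information produces no cache metrics.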

Testing is blocked on https://github.com/DataDog/dd-apm-test-agent/pull/217

Implementation note

Right now, we use get_llmobs_metrics_tags to set metrics for anthropic, which depends on set_metric and get_metric. We do not want to continue this pattern for prompt caching, so we instead extract the cache token counts directly from the response.usage field.

The caveat is that in the streamed case, the usage field is a dictionary that we construct manually while parsing the streamed chunks.
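
One way to handle this caveat is a small accessor that reads a field from either shape. This is a sketch under the assumption that the non-streamed usage value exposes its fields as attributes while the streamed one is a plain dict; get_usage_value is a hypothetical helper name, not a function from dd-trace-py or the anthropic SDK.

```python
from types import SimpleNamespace


def get_usage_value(usage, key):
    """Read `key` from a usage value that may be a dict or an attribute-style object."""
    if usage is None:
        return None
    if isinstance(usage, dict):
        # Streamed case: usage was assembled manually as a dictionary.
        return usage.get(key)
    # Non-streamed case (assumed): usage exposes fields as attributes.
    return getattr(usage, key, None)


# Usage example with stand-in values for both shapes.
streamed_usage = {"cache_read_input_tokens": 120}
object_usage = SimpleNamespace(cache_read_input_tokens=120)
streamed_value = get_usage_value(streamed_usage, "cache_read_input_tokens")
object_value = get_usage_value(object_usage, "cache_read_input_tokens")
```

Returning None for a missing field lets the caller decide whether to omit the metric, instead of silently recording a zero.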

Follow ups

  1. Move all the unit tests to use llmobs_events fixture
  2. De-couple metrics parsing from set/get metrics completely

Checklist

  • [ ] PR author has checked that all the criteria below are met
  • The PR description includes an overview of the change
  • The PR description articulates the motivation for the change
  • The change includes tests OR the PR description describes a testing strategy
  • The PR description notes risks associated with the change, if any
  • Newly-added code is easy to change
  • The change follows the library release note guidelines
  • The change includes or references documentation updates if necessary
  • Backport labels are set (if applicable)

Reviewer Checklist

  • [ ] Reviewer has checked that all the criteria below are met
  • Title is accurate
  • All changes are related to the pull request's stated goal
  • Avoids breaking API changes
  • Testing strategy adequately addresses listed risks
  • Newly-added code is easy to change
  • Release note makes sense to a user of the library
  • If necessary, author has acknowledged and discussed the performance implications of this PR as reported in the benchmarks PR comment
  • Backport labels are set in a manner that is consistent with the release branch maintenance policy

lievan, Jun 24 '25 16:06

CODEOWNERS have been resolved as:

releasenotes/notes/ant-p-cache-3d4001a431cedd67.yaml                    @DataDog/apm-python
tests/contrib/anthropic/cassettes/anthropic_completion_cache_read.yaml  @DataDog/ml-observability
tests/contrib/anthropic/cassettes/anthropic_completion_cache_write.yaml  @DataDog/ml-observability
tests/contrib/anthropic/cassettes/anthropic_completion_stream_cache_read.yaml  @DataDog/ml-observability
tests/contrib/anthropic/cassettes/anthropic_completion_stream_cache_write.yaml  @DataDog/ml-observability
ddtrace/contrib/internal/anthropic/_streaming.py                        @DataDog/ml-observability
ddtrace/llmobs/_integrations/anthropic.py                               @DataDog/ml-observability
tests/contrib/anthropic/test_anthropic_llmobs.py                        @DataDog/ml-observability

github-actions[bot], Jun 24 '25 16:06

Bootstrap import analysis

Comparison of import times between this PR and base.

Summary

The average import time from this PR is: 275 ± 4 ms.

The average import time from base is: 281 ± 4 ms.

The import time difference between this PR and base is: -5.1 ± 0.2 ms.

Import time breakdown

The following import paths have shrunk:

ddtrace.auto 2.349 ms (0.85%)
ddtrace.bootstrap.sitecustomize 1.667 ms (0.61%)
ddtrace.bootstrap.preload 1.547 ms (0.56%)
ddtrace.internal.remoteconfig.client 0.705 ms (0.26%)
ddtrace.appsec._common_module_patches 0.120 ms (0.04%)
ddtrace.appsec._asm_request_context 0.120 ms (0.04%)
ddtrace.appsec._utils 0.120 ms (0.04%)
ddtrace 0.682 ms (0.25%)
ddtrace.internal._unpatched 0.034 ms (0.01%)
json 0.034 ms (0.01%)
json.decoder 0.034 ms (0.01%)
re 0.034 ms (0.01%)
enum 0.034 ms (0.01%)
types 0.034 ms (0.01%)

github-actions[bot], Jun 24 '25 16:06

Benchmarks

Benchmark execution time: 2025-07-04 19:17:10

Comparing candidate commit 43deda5e4f2e09f1e7b3bfd8eae76325d06c6463 in PR branch evan.li/anthropic-prompt-caching with baseline commit a8419a40fe9e73e0a84c4cab53094c384480a5a6 in branch main.

Found 0 performance improvements and 1 performance regression! Performance is the same for 546 metrics, 3 unstable metrics.

scenario:iastaspectsospath-ospathsplitdrive_aspect

  • 🟥 execution_time [+262.999ns; +374.999ns] or [+7.171%; +10.225%]

pr-commenter[bot], Jun 24 '25 17:06