dd-trace-py feat(llmobs): submit span events for the langchain integration

Summary

This PR makes the LangChain integration submit LLMObs span events for LLM and chat model calls when LLMObs is enabled and the LLMObs span is sampled. It accomplishes this by:

Setting the span type SpanTypes.LLM on langchain APM spans so they are properly processed by the trace processor service in the LLMObs service
Tagging each span (in this case, LLM or chat model spans) with additional _ml_obs.* tags, which are popped from the trace when the data they represent is submitted to LLMObs intake through the LLMObs writer
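
The two steps above can be sketched roughly as follows. This is a minimal illustration, not the integration's actual code: `FakeSpan`, `tag_llm_span`, `pop_span_event`, and the individual tag names under `_ml_obs.` are all hypothetical, and `"llm"` is assumed to be the string value behind SpanTypes.LLM.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional

ML_OBS_PREFIX = "_ml_obs."  # tag namespace named in the description above
LLM_SPAN_TYPE = "llm"       # assumed string value behind SpanTypes.LLM


@dataclass
class FakeSpan:
    """Hypothetical stand-in for an APM span; not the ddtrace Span class."""
    name: str
    span_type: Optional[str] = None
    sampled: bool = True
    tags: Dict[str, Any] = field(default_factory=dict)


def tag_llm_span(span: FakeSpan, model: str, prompt: str, completion: str) -> None:
    """Step 1: set the span type and attach _ml_obs.* tags (tag names are made up)."""
    span.span_type = LLM_SPAN_TYPE
    span.tags[ML_OBS_PREFIX + "model_name"] = model
    span.tags[ML_OBS_PREFIX + "input"] = prompt
    span.tags[ML_OBS_PREFIX + "output"] = completion


def pop_span_event(span: FakeSpan, llmobs_enabled: bool) -> Optional[Dict[str, Any]]:
    """Step 2: pop the _ml_obs.* tags off the span and return them as a span event.

    Returns None when LLMObs is disabled, the span is not sampled,
    or the span is not an LLM-type span.
    """
    if not (llmobs_enabled and span.sampled and span.span_type == LLM_SPAN_TYPE):
        return None
    event: Dict[str, Any] = {"name": span.name}
    # Copy the key list first so the dict can be mutated while iterating.
    for key in [k for k in span.tags if k.startswith(ML_OBS_PREFIX)]:
        event[key[len(ML_OBS_PREFIX):]] = span.tags.pop(key)
    return event
```

In this sketch the regular APM tags stay on the span, while the `_ml_obs.*` tags are removed once they have been turned into an event, which mirrors the "popped from the trace" behavior described above.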

This PR is the first of three PRs for fully supporting span event submission from the LangChain integration. The following two PRs are works in progress and will be opened shortly:

Submitting span events from chains (might require a bit more work)
Supporting streaming for the LangChain integration and, subsequently, making sure streamed calls submit span events too (the latter part might not require as much work as the former)

For Reviewers

Most of the files touched are snapshot files, updated to account for the span type change. Feel free to ignore them; everything else is relevant for review.

No release note or changelog entry is included, as this is an internal change for submitting span events to LLMObs intake.

Checklist

[x] Change(s) are motivated and described in the PR description
[x] Testing strategy is described if automated tests are not included in the PR
[x] Risks are described (performance impact, potential for breakage, maintainability)
[x] Change is maintainable (easy to change, telemetry, documentation)
[x] Library release note guidelines are followed or label changelog/no-changelog is set
[x] Documentation is included (in-code, generated user docs, public corp docs)
[x] Backport labels are set (if applicable)
[x] If this PR changes the public interface, I've notified @DataDog/apm-tees.
[x] If change touches code that signs or publishes builds or packages, or handles credentials of any kind, I've requested a review from @DataDog/security-design-and-guidance.

Reviewer Checklist

[ ] Title is accurate
[ ] All changes are related to the pull request's stated goal
[ ] Description motivates each change
[ ] Avoids breaking API changes
[ ] Testing strategy adequately addresses listed risks
[ ] Change is maintainable (easy to change, telemetry, documentation)
[ ] Release note makes sense to a user of the library
[ ] Author has acknowledged and discussed the performance implications of this PR as reported in the benchmarks PR comment
[ ] Backport labels are set in a manner that is consistent with the release branch maintenance policy

Mar 15 '24 13:03 sabrenner

Datadog Report

Branch report: sabrenner/langchain-span-events-llm-chatmodel
Commit report: 2d63332
Test service: dd-trace-py

:white_check_mark: 0 Failed, 817 Passed, 2316 Skipped, 17m 15.29s Total duration (1h 2m 8.88s time saved)

Mar 15 '24 13:03 datadog-dd-trace-py-rkomorn[bot]

Benchmarks

Benchmark execution time: 2024-03-25 18:23:23

Comparing candidate commit 40664fc62527f13677143ba31673b899773e8917 in PR branch sabrenner/langchain-span-events-llm-chatmodel with baseline commit 805b357286473dae3a3cdea8a11d7555af4bfc9b in branch main.

Found 1 performance improvement and 4 performance regressions! Performance is the same for 196 metrics, 9 unstable metrics.

scenario:flasksimple-appsec-telemetry

🟥 execution_time [+220.102µs; +264.170µs] or [+3.485%; +4.183%]

scenario:flasksimple-debugger

🟥 execution_time [+348.207µs; +393.956µs] or [+5.533%; +6.260%]

scenario:httppropagationextract-invalid_trace_id_header

🟩 max_rss_usage [-816.862KB; -735.931KB] or [-3.730%; -3.360%]

scenario:httppropagationextract-wsgi_large_valid_headers_all

🟥 max_rss_usage [+502.809KB; +764.084KB] or [+2.381%; +3.618%]

scenario:httppropagationextract-wsgi_medium_valid_headers_all

🟥 max_rss_usage [+606.788KB; +746.121KB] or [+2.873%; +3.533%]

Mar 15 '24 14:03 pr-commenter[bot]
