dd-trace-py
feat(llmobs): submit span events for the langchain integration
Summary
This PR makes the LangChain integration submit LLMObs span events for LLM and chat model calls when LLMObs is enabled and the LLMObs span is sampled. It accomplishes this by:
- Setting the span type `SpanTypes.LLM` on `langchain` APM spans so they are properly processed by the trace processor service in the `LLMObs` service
- Tagging each (in this case, llm or chat model) span with additional `_ml_obs.*` tags, which get popped from the trace when submitting the data they represent to LLMObs intake through the LLMObs writer
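The two steps above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code in this PR: `FakeSpan`, `build_span_event`, the `"llm"` string standing in for `SpanTypes.LLM`, and the example tag keys are all hypothetical names, and the assumption that the processor filters on span type before popping tags is a guess based on the description.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

ML_OBS_PREFIX = "_ml_obs."  # tag namespace named in the PR description
LLM_SPAN_TYPE = "llm"       # assumed value standing in for SpanTypes.LLM


@dataclass
class FakeSpan:
    """Hypothetical stand-in for an APM span; not the ddtrace Span class."""
    name: str
    span_type: Optional[str] = None
    sampled: bool = True
    tags: Dict[str, str] = field(default_factory=dict)


def build_span_event(span: FakeSpan, llmobs_enabled: bool = True) -> Optional[Dict[str, object]]:
    """Pop _ml_obs.* tags off the span and return them as a span event.

    Returns None and leaves the span untouched when LLMObs is disabled,
    the span is not sampled, or the span is not typed as an LLM span.
    """
    if not llmobs_enabled or not span.sampled or span.span_type != LLM_SPAN_TYPE:
        return None
    # Collect the keys first so the dict is not mutated while iterating over it.
    keys = [key for key in span.tags if key.startswith(ML_OBS_PREFIX)]
    payload = {key[len(ML_OBS_PREFIX):]: span.tags.pop(key) for key in keys}
    return {"name": span.name, "meta": payload}


# Usage: the _ml_obs.* tag moves into the event, the regular APM tag stays on the span.
span = FakeSpan(
    name="example.llm_call",
    span_type=LLM_SPAN_TYPE,
    tags={"component": "langchain", "_ml_obs.example_key": "value"},
)
event = build_span_event(span)
```

In this sketch the span keeps its ordinary APM tags, while the `_ml_obs.*` data is removed from it and carried only in the returned event, which is the separation the second bullet describes.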
This PR is the first PR of three separate PRs for fully supporting sending span events from the LangChain integration. The following PRs are a WIP and will be opened shortly:
- Submitting span events from chains (might require a bit more work)
- Supporting streaming for the LangChain integration and, subsequently, making sure streamed calls submit span events too (the latter part might not require as much work as the former)
For Reviewers
Most of the files touched are snapshot files to account for the span type being changed. Feel free to ignore these files (everything else is relevant for review).
Additionally, no release notes/changelog, as this is an internal change for submitting span events to LLMObs intake.
Checklist
- [x] Change(s) are motivated and described in the PR description
- [x] Testing strategy is described if automated tests are not included in the PR
- [x] Risks are described (performance impact, potential for breakage, maintainability)
- [x] Change is maintainable (easy to change, telemetry, documentation)
- [x] Library release note guidelines are followed or label `changelog/no-changelog` is set
- [x] Documentation is included (in-code, generated user docs, public corp docs)
- [x] Backport labels are set (if applicable)
- [x] If this PR changes the public interface, I've notified @DataDog/apm-tees.
- [x] If change touches code that signs or publishes builds or packages, or handles credentials of any kind, I've requested a review from @DataDog/security-design-and-guidance.
Reviewer Checklist
- [ ] Title is accurate
- [ ] All changes are related to the pull request's stated goal
- [ ] Description motivates each change
- [ ] Avoids breaking API changes
- [ ] Testing strategy adequately addresses listed risks
- [ ] Change is maintainable (easy to change, telemetry, documentation)
- [ ] Release note makes sense to a user of the library
- [ ] Author has acknowledged and discussed the performance implications of this PR as reported in the benchmarks PR comment
- [ ] Backport labels are set in a manner that is consistent with the release branch maintenance policy
Datadog Report
Branch report: sabrenner/langchain-span-events-llm-chatmodel
Commit report: 2d63332
Test service: dd-trace-py
✅ 0 Failed, 817 Passed, 2316 Skipped, 17m 15.29s Total duration (1h 2m 8.88s time saved)
Benchmarks
Benchmark execution time: 2024-03-25 18:23:23
Comparing candidate commit 40664fc62527f13677143ba31673b899773e8917 in PR branch sabrenner/langchain-span-events-llm-chatmodel with baseline commit 805b357286473dae3a3cdea8a11d7555af4bfc9b in branch main.
Found 1 performance improvements and 4 performance regressions! Performance is the same for 196 metrics, 9 unstable metrics.
scenario:flasksimple-appsec-telemetry
- 🟥 `execution_time` [+220.102µs; +264.170µs] or [+3.485%; +4.183%]
scenario:flasksimple-debugger
- 🟥 `execution_time` [+348.207µs; +393.956µs] or [+5.533%; +6.260%]
scenario:httppropagationextract-invalid_trace_id_header
- 🟩 `max_rss_usage` [-816.862KB; -735.931KB] or [-3.730%; -3.360%]
scenario:httppropagationextract-wsgi_large_valid_headers_all
- 🟥 `max_rss_usage` [+502.809KB; +764.084KB] or [+2.381%; +3.618%]
scenario:httppropagationextract-wsgi_medium_valid_headers_all
- 🟥 `max_rss_usage` [+606.788KB; +746.121KB] or [+2.873%; +3.533%]