dd-trace-py icon indicating copy to clipboard operation
dd-trace-py copied to clipboard

fix(llmobs): fix span linking for langgraph nodes

Open sabrenner opened this issue 7 months ago • 3 comments

Fix for langgraph>=0.3.22. This includes expanding our testing range for langgraph, and it was determined we support as low as langgraph==0.2.23.

This is a chore as the feature this fix is powering is not public yet, and was otherwise not throwing any errors.

Most LOC are from riot requirements fixes.

Below is some additional context for how we do span linking/fix span linking for LangGraph. There are two other small fixes:

  1. Subgraph detection was previously done by checking node names ("LangGraph" equality). Since (sub)graphs can be names, this is not viable, and instead we opt for an instance check.
  2. For using the Command (ref) and goto features in LangGraph will throw a ParentCommand exception to communicate to the parent graph. This is not a true error, so any instances of this error should not be tagged as such.

MLOB-2774

Context

To properly link LangGraph nodes together, we need to know what node(s) were responsible for its specific invocation. LangGraph does not provide this OOTB - while each node (aka task) has an associated id, it does not have a field with its parent ID.

The way LangGraph determines which tasks come next by queueing up tasks based on completed tasks during each "tick" of the graph. Graph execution runs on a loop (that can be interrupted), and each loop execution runs a tick that looks at the tasks completed in the last tick to queue up tasks in the current tick.

How are the next tasks computed?

This process of determining next tasks are computed by tasks communicating with each other via channels. Channels are ephemeral listeners set up for the duration of a graph's execution. When a node gets executed, the underlying user function does two things - updates the graph's state, and then deterministically branches to some other set of node(s), either by explicit definition when defining the graph (ie, node a branches off to node b always), or on the condition of the state from a given function (ie, node a branches off to node b if the state has a key-value pair which:b, otherwise goes to node c).

Updating the graph state writes to a channel named by that state’s key name. Branching off to different node(s) is done by writing to a channel denoted branch:to:{node_name} (e.g. branch:to:b, branch:to:c). The one exception is when a node has a conditional edge that writes to a different node with a Send command, which writes to a __pregel_tasks channel.

When iterating over finished tasks, LangGraph consumes these writes for each finished task. For example, if node a branches off to nodes b and c, and sets some state value a_list: [“a”]. Its task writes (which, for a given task, can be accessed with task.writes) would look like:

deque([("a_list", ["a"]), ("branch:to:b", None), ("branch:to:c", None)])

Where each element in the list is a tuple of the channel name to write to, and the value to write to it. Updating the state requires writing the new value, and writing to the nodes do not require sending any additional data (they will pull the updated state when being invoked). For tasks queued up via Send, it would look like:

deque([("a_list", ["a"]), ("__pregel_tasks", Send(node="b", arg=...))])

Where the Send is written to the __pregel_tasks channel, when the Send value, which has the node name and arguments to send to it (since these can be generic, it is required to pass in the value, likely the state, to the Send command).

LangGraph uses these channel writes, called triggers, to queue the next tasks. Each of these tasks have associated triggers. The task will be defined as follows:

task=Task(name="b", triggers=[("branch:to:b",)])

Meaning that each task will state which channel writes triggered their execution (typically, branch:to:{node_name}, although they can take any form).

Tasks that are invoked to from __pregel_tasks writes have __pregel_push as one of their triggers

task=Task(name="b", triggers=[("__pregel_push",)])

How can we use those properties to create the span links?

Given a dictionary of finished task IDs to the tasks they represent, and a dictionary of next task IDs to the tasks they represent, how can we use the properties listed above to create span links?

Creating a map of channel writes to finished task IDs

The first step is to create a dictionary of channel writes mapping to a list of finished task IDs that wrote to them. This is done by iterating over all of the finished tasks, and for each finished task, appending its task ID to an entry in the dictionary denoted by each of the channel names it writes to.

The caveat is Send writes. These will all fall under the same __pregel_tasks channel name, and in addition to appending the finished task ID that wrote to that channel, we also want to include the node name it was writing to (Send.node).

Iterating over all queued tasks and linking them

For each queued task, we’ll iterate over its triggers. If a trigger is not a __pregel_push, we’ll index into the map created for channel writes to finished task IDs and extend a list of parent IDs for that given task. If the trigger is a __pregel_push, we’ll grab the first entry in the map at the __pregel_tasks key that has a finished task with a node name from the Send object that is the same as the current tasks name. We’ll pop it from the list, as each Send write can only be used once.

For each of the parent IDs identified, we’ll record an output → input link for the task. When we make a span for that task, we’ll apply those span links to the span.

Iterating over all unused finished task IDs

It’s possible that some of the finished tasks will not be used as parents to the queued tasks. In that case, we’ll establish output → output links from the spans representing those tasks/nodes to the outer graph span (LangGraph invocation)

Checklist

  • [x] PR author has checked that all the criteria below are met
  • The PR description includes an overview of the change
  • The PR description articulates the motivation for the change
  • The change includes tests OR the PR description describes a testing strategy
  • The PR description notes risks associated with the change, if any
  • Newly-added code is easy to change
  • The change follows the library release note guidelines
  • The change includes or references documentation updates if necessary
  • Backport labels are set (if applicable)

Reviewer Checklist

  • [x] Reviewer has checked that all the criteria below are met
  • Title is accurate
  • All changes are related to the pull request's stated goal
  • Avoids breaking API changes
  • Testing strategy adequately addresses listed risks
  • Newly-added code is easy to change
  • Release note makes sense to a user of the library
  • If necessary, author has acknowledged and discussed the performance implications of this PR as reported in the benchmarks PR comment
  • Backport labels are set in a manner that is consistent with the release branch maintenance policy

sabrenner avatar May 14 '25 20:05 sabrenner

CODEOWNERS have been resolved as:

.riot/requirements/11029c2.txt                                          @DataDog/apm-python
.riot/requirements/125dc45.txt                                          @DataDog/apm-python
.riot/requirements/158a309.txt                                          @DataDog/apm-python
.riot/requirements/1775c38.txt                                          @DataDog/apm-python
.riot/requirements/18eaa2d.txt                                          @DataDog/apm-python
.riot/requirements/1d59ef9.txt                                          @DataDog/apm-python
.riot/requirements/1d7357f.txt                                          @DataDog/apm-python
.riot/requirements/1ea3336.txt                                          @DataDog/apm-python
.riot/requirements/1f9e62b.txt                                          @DataDog/apm-python
.riot/requirements/351e483.txt                                          @DataDog/apm-python
.riot/requirements/91c9dfa.txt                                          @DataDog/apm-python
.riot/requirements/a331a88.txt                                          @DataDog/apm-python
.riot/requirements/a950a7b.txt                                          @DataDog/apm-python
.riot/requirements/aa478d5.txt                                          @DataDog/apm-python
.riot/requirements/d65ebe1.txt                                          @DataDog/apm-python
.riot/requirements/e5ffdaa.txt                                          @DataDog/apm-python
.riot/requirements/f1266b6.txt                                          @DataDog/apm-python
.riot/requirements/f25aee2.txt                                          @DataDog/apm-python
.riot/requirements/f46c97a.txt                                          @DataDog/apm-python
.riot/requirements/f535f64.txt                                          @DataDog/apm-python
releasenotes/notes/llmobs-fix-langgraph-span-linking-900f455f2e2b17df.yaml  @DataDog/apm-python
ddtrace/contrib/internal/langgraph/patch.py                             @DataDog/ml-observability
ddtrace/llmobs/_integrations/langgraph.py                               @DataDog/ml-observability
riotfile.py                                                             @DataDog/apm-python
tests/contrib/langgraph/conftest.py                                     @DataDog/ml-observability
tests/contrib/langgraph/test_langgraph_llmobs.py                        @DataDog/ml-observability

github-actions[bot] avatar May 14 '25 20:05 github-actions[bot]

Bootstrap import analysis

Comparison of import times between this PR and base.

Summary

The average import time from this PR is: 289 ± 5 ms.

The average import time from base is: 290 ± 6 ms.

The import time difference between this PR and base is: -1.0 ± 0.2 ms.

Import time breakdown

The following import paths have shrunk:

ddtrace.auto 1.907 ms (0.66%)
ddtrace.bootstrap.sitecustomize 1.228 ms (0.42%)
ddtrace.bootstrap.preload 1.228 ms (0.42%)
ddtrace.internal.remoteconfig.client 0.641 ms (0.22%)
ddtrace 0.679 ms (0.23%)
ddtrace.internal._unpatched 0.031 ms (0.01%)
json 0.031 ms (0.01%)
json.decoder 0.031 ms (0.01%)
re 0.031 ms (0.01%)
enum 0.031 ms (0.01%)
types 0.031 ms (0.01%)

github-actions[bot] avatar May 14 '25 20:05 github-actions[bot]

Benchmarks

Benchmark execution time: 2025-07-02 21:25:39

Comparing candidate commit bd95c597a527c14244e8c485f43d3a7dd3cfa931 in PR branch sabrenner/langgraph-linking-bug-fix with baseline commit 8d3e7780d35f7f821699cc907f6e348268b6866a in branch main.

Found 0 performance improvements and 1 performance regressions! Performance is the same for 510 metrics, 3 unstable metrics.

scenario:iastaspectsospath-ospathsplitext_aspect

  • 🟥 execution_time [+783.831ns; +842.500ns] or [+17.099%; +18.379%]

pr-commenter[bot] avatar May 14 '25 21:05 pr-commenter[bot]