🐛 Bug Report: OpenAI Requests Not Traced When Sent from a LangGraph Node
Which component is this bug for?
Langchain Instrumentation
📜 Description
In my Langchain-based application using LangGraph, I noticed that OpenAI requests made with OpenAI's own client from within a LangGraph node are not traced. Specifically, when I call the OpenAI GPT-4o model from within a LangGraph node, I see neither a span for the OpenAI call in my exported trace log nor any associated LLM call metrics.
👟 Reproduction steps
Here is an example demonstrating the issue:
from openai import OpenAI
from dotenv import load_dotenv
import os
from typing import TypedDict
from langgraph.graph import StateGraph
from opentelemetry.sdk.trace.export import ConsoleSpanExporter
from traceloop.sdk import Traceloop
# Load environment variables
load_dotenv()
# Setup directories for logs
logs_dir = "path/to/logs"
langtrace_logs_dir = os.path.join(logs_dir, "traceloop")
traceloop_log_file_path = os.path.join(langtrace_logs_dir, "traceloop_issue_example.log")
traceloop_log_file = open(traceloop_log_file_path, "w")
# Initialize Traceloop with ConsoleSpanExporter
exporter = ConsoleSpanExporter(out=traceloop_log_file)
Traceloop.init(disable_batch=True, exporter=exporter)
client = OpenAI()
# Define state for LangGraph
class State(TypedDict):
    request: str
    result: str

# Define a calculation node
def calculate(state: State):
    request = state["request"]
    completion = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "You are a mathematician."},
            {"role": "user", "content": request},
        ],
    )
    return {"result": completion.choices[0].message.content}
# Create the workflow graph
workflow = StateGraph(State)
workflow.add_node("calculate", calculate)
workflow.set_entry_point("calculate")
langgraph = workflow.compile()
# Invoke the graph
user_request = "What's 5 + 5?"
result = langgraph.invoke(input={"request": user_request})
print(f"Request: {user_request}")
print(f"Result: {result['result']}")
In the trace logs, there are no gen_ai attributes or metrics for the OpenAI call. However, if I replace the OpenAI client with Langchain's own OpenAI client, a span with LLM metrics is generated as expected.
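For reference, here is the variant that does get traced; a minimal sketch of the same node using Langchain's ChatOpenAI client from langchain-openai (calculate_with_langchain is an illustrative name):

from langchain_openai import ChatOpenAI

chat_model = ChatOpenAI(model="gpt-4o")

# Same node logic, but routed through Langchain's client, so the Langchain
# callbacks emit a chat span with LLM metrics for this call.
def calculate_with_langchain(state: State):
    response = chat_model.invoke([
        ("system", "You are a mathematician."),
        ("user", state["request"]),
    ])
    return {"result": response.content}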
👍 Expected behavior
When an OpenAI call is made within a LangGraph node, I expect to see tracing data that includes gen_ai attributes and other metrics associated with the LLM call, such as:
{
"name": "openai.chat",
"context": {
"trace_id": "0x993046216162c1a4dddf7c3484062d26",
"span_id": "0xcfeb41cffa9b1021",
"trace_state": "[]"
},
"kind": "SpanKind.CLIENT",
"parent_id": null,
"start_time": "2024-11-06T14:30:28.540006Z",
"end_time": "2024-11-06T14:30:31.680890Z",
"status": {
"status_code": "UNSET"
},
"attributes": {
"llm.request.type": "chat",
"gen_ai.system": "OpenAI",
"gen_ai.request.model": "gpt-4o",
"llm.headers": "None",
"llm.is_streaming": false,
"gen_ai.openai.api_base": "https://api.openai.com/v1/",
"gen_ai.prompt.0.role": "system",
"gen_ai.prompt.0.content": "You are a mathematician.",
"gen_ai.prompt.1.role": "user",
"gen_ai.prompt.1.content": "claculate 5 + 5",
"gen_ai.response.model": "gpt-4o-2024-08-06",
"gen_ai.openai.system_fingerprint": "fp_45cf54deae",
"llm.usage.total_tokens": 39,
"gen_ai.usage.completion_tokens": 15,
"gen_ai.usage.prompt_tokens": 24,
"gen_ai.completion.0.finish_reason": "stop",
"gen_ai.completion.0.role": "assistant",
"gen_ai.completion.0.content": "The result of \\(5 + 5\\) is \\(10\\)."
},
"events": [],
"links": [],
"resource": {
"attributes": {
"service.name": "/Users/muhammadkanaan/Desktop/work_repos/agent-analytics/insighter/src/test/openai/openai_llm_call.py"
},
"schema_url": ""
}
}
👎 Actual Behavior with Screenshots
When running the code from within a LangGraph node, no tracing data is recorded for the OpenAI calls. Below is the actual log output captured during the execution:
{
"name": "__start__.task",
"context": {
"trace_id": "0xc01627149044876058dc3ed2d419a5aa",
"span_id": "0x1b2becef5d6179ab",
"trace_state": "[]"
},
"kind": "SpanKind.INTERNAL",
"parent_id": "0xc77e31c8e27e4710",
"start_time": "2024-11-06T14:22:00.764176Z",
"end_time": "2024-11-06T14:22:00.764384Z",
"status": {
"status_code": "UNSET"
},
"attributes": {
"traceloop.association.properties.langgraph_step": 0,
"traceloop.association.properties.langgraph_node": "__start__",
"traceloop.association.properties.langgraph_triggers": [
"__start__"
],
"traceloop.association.properties.langgraph_path": [
"__pregel_pull",
"__start__"
],
"traceloop.association.properties.langgraph_checkpoint_ns": "__start__:91fbf40a-29dd-606a-8186-ca138e2e7803",
"traceloop.workflow.name": "LangGraph",
"traceloop.entity.path": "",
"traceloop.span.kind": "task",
"traceloop.entity.name": "__start__",
"traceloop.entity.input": "{\"inputs\": {\"request\": \"whats 5 + 5\"}, \"tags\": [\"graph:step:0\", \"langsmith:hidden\", \"langsmith:hidden\"], \"metadata\": {\"langgraph_step\": 0, \"langgraph_node\": \"__start__\", \"langgraph_triggers\": [\"__start__\"], \"langgraph_path\": [\"__pregel_pull\", \"__start__\"], \"langgraph_checkpoint_ns\": \"__start__:91fbf40a-29dd-606a-8186-ca138e2e7803\"}, \"kwargs\": {\"name\": \"__start__\"}}",
"traceloop.entity.output": "{\"outputs\": {\"request\": \"whats 5 + 5\"}, \"kwargs\": {\"tags\": [\"graph:step:0\", \"langsmith:hidden\", \"langsmith:hidden\"]}}"
},
"events": [],
"links": [],
"resource": {
"attributes": {
"service.name": "/Users/muhammadkanaan/Desktop/work_repos/agent-analytics/insighter/src/test/openai/tmp.py"
},
"schema_url": ""
}
}
{
"name": "ChannelWrite<calculate,request,result>.task",
"context": {
"trace_id": "0xc01627149044876058dc3ed2d419a5aa",
"span_id": "0x2e1ca943f99c35fd",
"trace_state": "[]"
},
"kind": "SpanKind.INTERNAL",
"parent_id": "0x822c13f276dd7dc7",
"start_time": "2024-11-06T14:22:01.836989Z",
"end_time": "2024-11-06T14:22:01.837304Z",
"status": {
"status_code": "UNSET"
},
"attributes": {
"traceloop.association.properties.langgraph_step": 1,
"traceloop.association.properties.langgraph_node": "calculate",
"traceloop.association.properties.langgraph_triggers": [
"start:calculate"
],
"traceloop.association.properties.langgraph_path": [
"__pregel_pull",
"calculate"
],
"traceloop.association.properties.langgraph_checkpoint_ns": "calculate:ba4fa053-c8d2-c4dd-5456-f56c9095719e",
"traceloop.workflow.name": "LangGraph",
"traceloop.entity.path": "calculate",
"traceloop.span.kind": "task",
"traceloop.entity.name": "ChannelWrite<calculate,request,result>",
"traceloop.entity.input": "{\"inputs\": {\"result\": \"\\\\(5 + 5 = 10\\\\).\"}, \"tags\": [\"seq:step:2\", \"langsmith:hidden\"], \"metadata\": {\"langgraph_step\": 1, \"langgraph_node\": \"calculate\", \"langgraph_triggers\": [\"start:calculate\"], \"langgraph_path\": [\"__pregel_pull\", \"calculate\"], \"langgraph_checkpoint_ns\": \"calculate:ba4fa053-c8d2-c4dd-5456-f56c9095719e\"}, \"kwargs\": {\"name\": \"ChannelWrite<calculate,request,result>\"}}",
"traceloop.entity.output": "{\"outputs\": {\"result\": \"\\\\(5 + 5 = 10\\\\).\"}, \"kwargs\": {\"tags\": [\"seq:step:2\", \"langsmith:hidden\"]}}"
},
"events": [],
"links": [],
"resource": {
"attributes": {
"service.name": "/Users/muhammadkanaan/Desktop/work_repos/agent-analytics/insighter/src/test/openai/tmp.py"
},
"schema_url": ""
}
}
{
"name": "calculate.task",
"context": {
"trace_id": "0xc01627149044876058dc3ed2d419a5aa",
"span_id": "0x822c13f276dd7dc7",
"trace_state": "[]"
},
"kind": "SpanKind.INTERNAL",
"parent_id": "0xc77e31c8e27e4710",
"start_time": "2024-11-06T14:22:00.766167Z",
"end_time": "2024-11-06T14:22:01.837811Z",
"status": {
"status_code": "UNSET"
},
"attributes": {
"traceloop.association.properties.langgraph_step": 1,
"traceloop.association.properties.langgraph_node": "calculate",
"traceloop.association.properties.langgraph_triggers": [
"start:calculate"
],
"traceloop.association.properties.langgraph_path": [
"__pregel_pull",
"calculate"
],
"traceloop.association.properties.langgraph_checkpoint_ns": "calculate:ba4fa053-c8d2-c4dd-5456-f56c9095719e",
"traceloop.workflow.name": "LangGraph",
"traceloop.entity.path": "",
"traceloop.span.kind": "task",
"traceloop.entity.name": "calculate",
"traceloop.entity.input": "{\"inputs\": {\"request\": \"whats 5 + 5\"}, \"tags\": [\"graph:step:1\"], \"metadata\": {\"langgraph_step\": 1, \"langgraph_node\": \"calculate\", \"langgraph_triggers\": [\"start:calculate\"], \"langgraph_path\": [\"__pregel_pull\", \"calculate\"], \"langgraph_checkpoint_ns\": \"calculate:ba4fa053-c8d2-c4dd-5456-f56c9095719e\"}, \"kwargs\": {\"name\": \"calculate\"}}",
"traceloop.entity.output": "{\"outputs\": {\"result\": \"\\\\(5 + 5 = 10\\\\).\"}, \"kwargs\": {\"tags\": [\"graph:step:1\"]}}"
},
"events": [],
"links": [],
"resource": {
"attributes": {
"service.name": "/Users/muhammadkanaan/Desktop/work_repos/agent-analytics/insighter/src/test/openai/tmp.py"
},
"schema_url": ""
}
}
{
"name": "LangGraph.workflow",
"context": {
"trace_id": "0xc01627149044876058dc3ed2d419a5aa",
"span_id": "0xc77e31c8e27e4710",
"trace_state": "[]"
},
"kind": "SpanKind.INTERNAL",
"parent_id": null,
"start_time": "2024-11-06T14:22:00.763510Z",
"end_time": "2024-11-06T14:22:01.838540Z",
"status": {
"status_code": "UNSET"
},
"attributes": {
"traceloop.workflow.name": "LangGraph",
"traceloop.entity.path": "",
"traceloop.span.kind": "workflow",
"traceloop.entity.name": "LangGraph",
"traceloop.entity.input": "{\"inputs\": {\"request\": \"whats 5 + 5\"}, \"tags\": [], \"metadata\": {}, \"kwargs\": {\"name\": \"LangGraph\"}}",
"traceloop.entity.output": "{\"outputs\": {\"request\": \"whats 5 + 5\", \"result\": \"\\\\(5 + 5 = 10\\\\).\"}, \"kwargs\": {\"tags\": []}}"
},
"events": [],
"links": [],
"resource": {
"attributes": {
"service.name": "/Users/muhammadkanaan/Desktop/work_repos/agent-analytics/insighter/src/test/openai/tmp.py"
},
"schema_url": ""
}
}
As shown, the logs lack the expected gen_ai and llm attributes or metrics related to the OpenAI call, which would normally be included when using Langchain's OpenAI client directly.
🤖 Python Version
Python 3.12.4
📃 Provide any additional context for the Bug.
langchain==0.2.16
langchain-cohere==0.1.9
langchain-community==0.2.17
langchain-core==0.2.41
langchain-experimental==0.0.65
langchain-openai==0.1.25
langchain-text-splitters==0.2.4
langgraph==0.2.23
langgraph-checkpoint==1.0.10
openai==1.47.0
traceloop-sdk==0.33.3
👀 Have you spent some time to check if this bug has been raised before?
- [X] I checked and didn't find a similar issue
Are you willing to submit PR?
Yes I am willing to submit a PR!
Thanks for reporting @jemo21k! You wrote that you're willing to submit a PR - does it mean you have a fix for this? Or should we look into this?
Hi @nirga, glad to help! I don't have a fix yet; I'm currently investigating. I think you should look into it as well.
Hey 👋 I also bumped into the same issue. From a quick look at the code, the LangChain instrumentation suppresses downstream instrumentations by setting the SUPPRESS_LANGUAGE_MODEL_INSTRUMENTATION_KEY key in the context. See the code here.
That key is then checked by the other instrumentations to decide whether or not to instrument the call. See the code here.
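For anyone following along, the mechanism is roughly this; a paraphrased sketch of the pattern, not the exact openllmetry code, and the key's string value here is an assumption:

from opentelemetry import context as context_api

# Key name taken from this issue; its string value is assumed for illustration.
SUPPRESS_LANGUAGE_MODEL_INSTRUMENTATION_KEY = "suppress_language_model_instrumentation"

# Langchain instrumentation side: suppress downstream LLM instrumentations
# while the callback-handled call runs.
token = context_api.attach(
    context_api.set_value(SUPPRESS_LANGUAGE_MODEL_INSTRUMENTATION_KEY, True)
)
try:
    pass  # run the chain / graph node here
finally:
    context_api.detach(token)

# LLM instrumentation side: bail out early when the key is set.
def chat_wrapper(wrapped, instance, args, kwargs):
    if context_api.get_value(SUPPRESS_LANGUAGE_MODEL_INSTRUMENTATION_KEY):
        return wrapped(*args, **kwargs)  # pass through untraced
    # ... otherwise create the openai.chat span, record metrics, and call through ...
    return wrapped(*args, **kwargs)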
Was there any special reason for disabling downstream instrumentations? 🤔
EDIT
Enabling the downstream instrumentations causes OTel to fail during metric collection when freezing the attributes (see code here), because of an unhashable list in the attributes. The offending attribute has the key traceloop.association.properties.ls_stop. What is the purpose of this attribute?
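To illustrate the failure mode in isolation (a standalone repro of the symptom, not openllmetry code; the list value is hypothetical):

# The metrics SDK needs a hashable key derived from the attribute items,
# which raises TypeError when an attribute value is a plain Python list.
attrs = {"traceloop.association.properties.ls_stop": ["stop"]}  # hypothetical value
try:
    frozenset(attrs.items())  # roughly what "freezing" the attributes does
except TypeError as err:
    print(err)  # unhashable type: 'list'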
@thisthat we did that because these spans are (supposed to be?) already collected by the Langchain callbacks. Before, you would have gotten duplicate OpenAI spans (which results in counting tokens twice, for example). We should figure out why the callbacks are not producing the needed spans in this case.
@nirga thanks for the answer. I believe the LangChain instrumentation should only provide visibility into the pipeline and not trace additional LLM calls. Otherwise, we would need to re-implement every LLM instrumentation twice: once for its individual calls and once for LangChain.
Is there any news regarding this issue?
We seem to be running into this issue as well. I tried @thisthat's proposed fix and it works; I didn't notice any duplicate spans or miscalculated metrics.
I was poking around with this earlier and found that applying the proposed fix to the langgraph example in https://github.com/traceloop/openllmetry/blob/main/packages/sample-app/sample_app/langgraph_example.py
results in two OpenAI chat spans: one called ChatOpenAI.chat from the langchain instrumentation, and a child span called openai.chat from the openai instrumentation.
The child span is most likely not needed, so this may not be the right fix, which may be what @nirga is alluding to. It's possible the main issue here is that we're missing the ChatOpenAI.chat span. I noticed the example app uses a sync call, whereas our app (via Litellm) uses async (I double-checked that the openai.chat span is created from the async flow by adding a log line to achat_wrapper in the openai instrumentation, and it triggered).
EDIT: Forgot to add - when I used the proposed fix on the sample app, I ran into a bunch of ERROR:opentelemetry.exporter.otlp.proto.common._internal:Failed to encode key gen_ai.response.model: Invalid type <class 'NoneType'> of value None errors, which seem to happen because the openai instrumentation coalesces gen_ai.response.model to None and lacks coalescing in some other places.
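The straightforward guard is to only set attributes whose value is present; a minimal sketch of that idea (set_attribute_if_present is an illustrative name, not necessarily the helper used in the codebase):

def set_attribute_if_present(span, key, value):
    # Skip None values so the OTLP encoder never sees an invalid NoneType.
    if value is not None:
        span.set_attribute(key, value)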
@nirga Could you please grant me permissions on the repo? I have a fix for the propagation and would like to submit a PR for this. I can get the corporate CLA signed as well.
Hey @obs-gh-abhishekrao - you should fork the repo and then you can submit a pull request