fix(langchain): span attrs and metrics missing of langchain third party integration
- [x] I have added tests that cover my changes.
- [ ] If adding a new instrumentation or changing an existing one, I've added screenshots from some observability platform showing the change.
- [ ] PR name follows conventional commits format: `feat(instrumentation): ...` or `fix(instrumentation): ...`.
- [ ] (If applicable) I have updated the documentation accordingly.
This PR fixes two issues in the LangChain third-party integration, specifically with the `ChatDeepSeek` API:
- Model name detection fails
- Metrics such as TTFT and streaming generation time are missing
Reproduction code
import os
import asyncio

from langchain_core.prompts import ChatPromptTemplate
from langchain_deepseek import ChatDeepSeek

# OpenTelemetry setup (HTTP exporter to local collector)
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.instrumentation.langchain import LangchainInstrumentor
from opentelemetry import metrics as otel_metrics
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import ConsoleMetricExporter, PeriodicExportingMetricReader
from opentelemetry import trace as otel_trace

prompt_template = """You are a helpful assistant.
Use the following context to answer briefly.
Context:
{context}
Question:
{question}
"""


def init_otel_and_instrument(
    service_name: str = "langchain-scratch",
    collector_endpoint: str = "http://127.0.0.1:4318",
) -> LangchainInstrumentor:
    resource = Resource.create({"service.name": service_name})

    # Traces go to the local collector over OTLP/HTTP.
    endpoint = collector_endpoint.rstrip("/") + "/v1/traces"
    trace_exporter = OTLPSpanExporter(endpoint=endpoint)
    tracer_provider = TracerProvider(resource=resource)
    tracer_provider.add_span_processor(BatchSpanProcessor(trace_exporter))
    otel_trace.set_tracer_provider(tracer_provider)

    # Metrics are printed to the console once per second.
    metric_reader = PeriodicExportingMetricReader(ConsoleMetricExporter(), export_interval_millis=1000)
    meter_provider = MeterProvider(resource=resource, metric_readers=[metric_reader])
    otel_metrics.set_meter_provider(meter_provider)

    langchain_instrumentor = LangchainInstrumentor()
    langchain_instrumentor.instrument(
        tracer_provider=tracer_provider,
        meter_provider=meter_provider,
    )
    return langchain_instrumentor


async def main():
    instrumentor = init_otel_and_instrument(
        service_name="langchain-scratch",
        collector_endpoint="http://127.0.0.1:4318",
    )
    prompt = ChatPromptTemplate.from_template(prompt_template)
    api_key = os.getenv("OPENAI_API_KEY", "YOUR_API_KEY")
    base_url = os.getenv("OPENAI_BASE_URL", "https://api.deepseek.com/beta")
    model_name = os.getenv("MODEL_NAME", "deepseek-reasoner")
    model = ChatDeepSeek(
        api_base=base_url,
        api_key=api_key,
        model=model_name,
        stream_usage=True,
    )
    chain = prompt | model
    inputs = {"context": "some context", "question": "What's OpenTelemetry?"}

    # Stream the answer chunk by chunk so that TTFT and streaming metrics are exercised.
    print("Assistant:", end=" ", flush=True)
    async for chunk in chain.astream(inputs):
        piece = getattr(chunk, "content", "")
        if piece:
            print(piece, end="", flush=True)
    print()
    # instrumentor.uninstrument()


if __name__ == "__main__":
    asyncio.run(main())
[!IMPORTANT]
Fixes model name detection and adds missing metrics for LangChain's `ChatDeepSeek` integration, including TTFT and streaming generation time.
- Behavior:
  - Fixes model name detection and adds metrics like `TTFT` and streaming generation time for `ChatDeepSeek` in `callback_handler.py`.
  - Adds `_create_shared_attributes()` in `TraceloopCallbackHandler` to create shared attributes for metrics.
  - Updates `on_llm_new_token()` and `on_llm_end()` to track TTFT and streaming metrics.
- Metrics:
  - Adds histograms for `TTFT` and streaming time, and counters for generation choices and exceptions, in `__init__.py`.
  - Updates `set_request_params()` in `span_utils.py` to enhance model extraction.
- Tests:
  - Adds tests for model extraction and streaming metrics in `test_model_extraction.py`, `test_streaming_metrics.py`, and `test_third_party_models.py`.
  - Verifies metrics recording in `test_langchain_metrics.py`.

This description was created for 15aff50877a66a8eeb78ea08da0d2b6f57356967. It will automatically update as commits are pushed.
Summary by CodeRabbit
- New Features
- Added AI streaming metrics: time-to-first-token, streaming time-to-generate, generation choices, and exception counts; integrated into LLM telemetry with richer model/system attributes.
- Bug Fixes
- More reliable model-name detection across requests, responses, metadata and third‑party models; safer lifecycle handling.
- Tests
- New suites for streaming metrics, model extraction, third‑party models and an HTTP cassette for streaming.
- Chores
- Added third‑party model dependency and enabled test recording plugin.
✅ There are no secrets present in this pull request anymore.
If these secrets were true positive and are still valid, we highly recommend you to revoke them. While these secrets were previously flagged, we no longer have a reference to the specific commits where they were detected. Once a secret has been leaked into a git repository, you should consider it compromised, even if it was deleted immediately. Find here more information about risks.
[!NOTE]
Other AI code review bot(s) detected
CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.
Walkthrough
Adds GenAI incubating metrics (TTFT, streaming time, choices counter, exception counter) to LangChain instrumentation, wires them through instrumentor and TraceloopCallbackHandler, enhances model-name extraction and span metadata handling, adds DeepSeek dependency and streaming test cassette, and expands tests for streaming/third‑party models and metrics.
Changes
| Cohort / File(s) | Summary |
|---|---|
| **Instrumentation init & wiring**<br>`packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/__init__.py` | Imports `GenAIMetrics`; creates TTFT and streaming-time histograms and choices/exception counters; passes them into `TraceloopCallbackHandler` during instrumentation setup. |
| **Callback handler metrics & flow**<br>`packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py` | Extends `TraceloopCallbackHandler.__init__` to accept `ttft_histogram`, `streaming_time_histogram`, `choices_counter`, `exception_counter`; adds `on_llm_new_token`; centralizes shared metric attributes; records TTFT, streaming duration, choices, and exception metrics; updates error handling and model-name resolution fallbacks. |
| **Span/model extraction utilities**<br>`packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/span_utils.py` | Adds `first_token_time` to `SpanHolder`; implements unified model extraction functions and fallbacks (`_get_unified_unknown_model`, `_extract_model_name_from_request`, `_infer_model_from_class_name`, `_extract_model_name_from_association_metadata`); updates `set_request_params`, `set_llm_request`, `set_chat_request`, and response-model extraction to use serialized/metadata inputs. |
| **Project configuration**<br>`packages/opentelemetry-instrumentation-langchain/pyproject.toml` | Adds dependency `langchain-deepseek = "^0.1.4"` to main and test dependency groups. |
| **Test config**<br>`packages/opentelemetry-instrumentation-langchain/tests/conftest.py` | Enables pytest plugin `pytest_recording` via `pytest_plugins`. |
| **Metrics tests**<br>`packages/opentelemetry-instrumentation-langchain/tests/metrics/test_langchain_metrics.py` | Imports `ERROR_TYPE` and `GenAIMetrics`; expands assertions to validate `GEN_AI_SERVER_TIME_TO_FIRST_TOKEN`, `LLM_STREAMING_TIME_TO_GENERATE`, choices and exception metrics and related attributes. |
| **Streaming cassette**<br>`packages/opentelemetry-instrumentation-langchain/tests/metrics/cassettes/test_langchain_metrics/test_streaming_with_ttft_and_generation_time_metrics.yaml` | Adds DeepSeek streaming interaction cassette with chunked `data:` lines and final `data: [DONE]` response. |
| **Streaming unit tests**<br>`packages/opentelemetry-instrumentation-langchain/tests/test_streaming_metrics.py` | New test suite `TestStreamingMetrics` validating TTFT recording on first token, no-repeat TTFT, choices counting, streaming-time, and exception metrics using mocks. |
| **Third‑party model tests**<br>`packages/opentelemetry-instrumentation-langchain/tests/test_third_party_models.py` | New tests verifying model extraction from serialized kwargs for DeepSeek, fallback `deepseek-unknown`, and correct metric attribution on `on_llm_end`. |
| **Model extraction tests**<br>`packages/opentelemetry-instrumentation-langchain/tests/test_model_extraction.py` | New tests covering multi-path model extraction (kwargs, invocation_params, serialized, metadata), class-name inference, association metadata, and response metadata parsing. |
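The fallback order summarized in the table (kwargs, `invocation_params`, serialized kwargs, metadata, then class-name inference) can be illustrated with a small standalone sketch. This is not the code in `span_utils.py`: the function `resolve_model_name`, the helper `_first_model_key`, the key list in `MODEL_KEYS`, and the exact lookup order are assumptions made for illustration. Only the `deepseek-unknown` fallback string comes from the summary above.

```python
from typing import Any, Mapping, Optional

# Hypothetical list of keys that may carry a model name; "ls_model_name" is the
# key LangChain uses in its standard tracing metadata.
MODEL_KEYS = ("model", "model_name", "model_id", "ls_model_name")


def _first_model_key(source: Mapping[str, Any]) -> Optional[str]:
    """Return the first non-empty string stored under a model-like key."""
    for key in MODEL_KEYS:
        value = source.get(key)
        if isinstance(value, str) and value:
            return value
    return None


def resolve_model_name(
    kwargs: Optional[Mapping[str, Any]] = None,
    serialized: Optional[Mapping[str, Any]] = None,
    metadata: Optional[Mapping[str, Any]] = None,
    class_name: Optional[str] = None,
) -> str:
    """Try each source in order and fall back to a provider-specific placeholder."""
    kwargs = kwargs or {}
    candidates = (
        kwargs,                              # 1. direct callback kwargs
        kwargs.get("invocation_params"),     # 2. invocation parameters
        (serialized or {}).get("kwargs"),    # 3. serialized constructor kwargs
        metadata,                            # 4. run metadata
    )
    for source in candidates:
        if isinstance(source, Mapping):
            name = _first_model_key(source)
            if name:
                return name
    # 5. Last resort: infer the provider from the class name (assumed heuristic).
    if class_name and "deepseek" in class_name.lower():
        return "deepseek-unknown"
    return "unknown"


print(resolve_model_name(serialized={"kwargs": {"model": "deepseek-reasoner"}}))
print(resolve_model_name(class_name="ChatDeepSeek"))
```

Keeping every source behind one function means the span attributes and the metric attributes cannot disagree about the model name, which is the failure mode this PR reports for `ChatDeepSeek`.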
Sequence Diagram(s)
sequenceDiagram
    autonumber
    participant App
    participant LangChain
    participant Callback as TraceloopCallbackHandler
    participant Span as Span/SpanHolder
    participant OTel as OTel Metrics
    App->>LangChain: invoke LLM/chat (streaming)
    LangChain->>Callback: on_chat_model_start / on_llm_start(kwargs, serialized, metadata)
    Callback->>Span: set_request_params(span_holder, kwargs, serialized, metadata)
    Note right of Callback: resolve model_name via kwargs/serialized/metadata/class-name
    loop stream tokens
        LangChain->>Callback: on_llm_new_token(token, run_id)
        alt first token
            Callback->>Span: record first_token_time
            Callback->>OTel: ttft_histogram.record(value, attributes)
        else subsequent token
            Callback-->>OTel: (no TTFT)
        end
    end
    LangChain->>Callback: on_llm_end(result, run_id)
    Callback->>OTel: duration_histogram.record(...)
    Callback->>OTel: token_histogram.record(...)
    opt streaming
        Callback->>OTel: streaming_time_histogram.record(...)
        Callback->>OTel: choices_counter.add(n, attributes)
    end
    Callback->>Span: end span & cleanup
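The streaming flow in the diagram above can be sketched as a small state machine: remember the start time, record TTFT only on the first token, and record streaming time when the run ends. The sketch below is illustrative and is not the PR's `TraceloopCallbackHandler`. `RecordingHistogram` is a stand-in for an OpenTelemetry histogram, `StreamingTimer` and `RunState` are hypothetical names, the attribute key is illustrative, and measuring streaming time from the first token to the end of the run is an assumption.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Attributes = Dict[str, str]


@dataclass
class RecordingHistogram:
    """Stand-in for an OpenTelemetry histogram; keeps every recorded point."""
    points: List[Tuple[float, Attributes]] = field(default_factory=list)

    def record(self, value: float, attributes: Optional[Attributes] = None) -> None:
        self.points.append((value, dict(attributes or {})))


@dataclass
class RunState:
    start_time: float
    model: str
    first_token_time: Optional[float] = None


class StreamingTimer:
    """Tracks per-run timing, keyed by run_id, like a callback handler would."""

    def __init__(
        self,
        ttft_histogram: RecordingHistogram,
        streaming_time_histogram: RecordingHistogram,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._ttft = ttft_histogram
        self._streaming = streaming_time_histogram
        self._clock = clock
        self._runs: Dict[str, RunState] = {}

    def on_llm_start(self, run_id: str, model: str) -> None:
        self._runs[run_id] = RunState(start_time=self._clock(), model=model)

    def on_llm_new_token(self, run_id: str) -> None:
        run = self._runs.get(run_id)
        # Only the first token of a known run produces a TTFT data point.
        if run is None or run.first_token_time is not None:
            return
        run.first_token_time = self._clock()
        self._ttft.record(
            run.first_token_time - run.start_time,
            {"gen_ai.response.model": run.model},
        )

    def on_llm_end(self, run_id: str) -> None:
        run = self._runs.pop(run_id, None)
        # Non-streaming runs never saw a token, so no streaming time is recorded.
        if run is None or run.first_token_time is None:
            return
        self._streaming.record(
            self._clock() - run.first_token_time,
            {"gen_ai.response.model": run.model},
        )
```

Injecting the clock keeps the logic deterministic in unit tests, which matches the mock-based style described for `test_streaming_metrics.py`.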
sequenceDiagram
    autonumber
    participant LangChain
    participant Callback
    participant Span
    participant OTel as OTel Metrics
    LangChain->>Callback: on_llm_error(error, run_id)
    Callback->>Span: resolve model_name & error type
    Callback->>OTel: exception_counter.add(1, attributes: error.type, model, system, server.address)
    Callback-->>LangChain: propagate/return
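The error path in the diagram above amounts to building one attribute set and incrementing a counter by one. A minimal sketch, assuming a stand-in `RecordingCounter` instead of a real OpenTelemetry counter and a hypothetical helper `record_llm_error`; the attribute keys other than `error.type` and `server.address` are illustrative and may differ from what the PR emits.

```python
from collections import Counter
from typing import Dict, Optional


class RecordingCounter:
    """Stand-in for an OpenTelemetry counter; tallies totals per attribute set."""

    def __init__(self) -> None:
        self.totals: Counter = Counter()

    def add(self, amount: int, attributes: Dict[str, str]) -> None:
        # Attribute dicts are unhashable, so use a sorted tuple of items as the key.
        self.totals[tuple(sorted(attributes.items()))] += amount


def record_llm_error(
    exception_counter: RecordingCounter,
    error: BaseException,
    model: str,
    system: str,
    server_address: Optional[str] = None,
) -> Dict[str, str]:
    """Count one exception, tagged with the error type and the resolved model."""
    attributes = {
        "error.type": type(error).__name__,
        "gen_ai.response.model": model,  # illustrative key
        "gen_ai.system": system,         # illustrative key
    }
    if server_address:
        attributes["server.address"] = server_address
    exception_counter.add(1, attributes)
    return attributes


counter = RecordingCounter()
record_llm_error(counter, TimeoutError("upstream timed out"), "deepseek-reasoner", "deepseek")
```

Using the exception class name as `error.type` keeps the attribute low-cardinality, unlike the error message, which can vary per request.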
Estimated code review effort
🎯 4 (Complex) | ⏱️ ~60 minutes
Possibly related PRs
- traceloop/openllmetry#3237 — Overlaps model-name extraction and association-metadata fallbacks used in span_utils.py and callback_handler.py.
- traceloop/openllmetry#3206 — Modifies TraceloopCallbackHandler lifecycle and constructor patterns similar to this PR’s constructor/flow changes.
- traceloop/openllmetry#3216 — Touches on_llm_end and span-ending logic that intersect with this PR’s reporting and cleanup adjustments.
Suggested reviewers
- nirga
- doronkopit5
Poem
A nibble of tokens, the first one sweet,
My whiskers twitch—TTFT’s complete!
I count the choices as streams flow by,
If errors hop up, I mark them high.
DeepSeek dreams in metrics neat. 🐇✨
Pre-merge checks and finishing touches
❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 77.27% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title Check | ✅ Passed | The title accurately and concisely describes the primary change—fixing missing span attributes and metrics for the LangChain third‑party integration—which matches the modifications to span_utils.py, callback_handler.py, and the added metrics/tests; it uses a conventional commit prefix and avoids unnecessary detail. The wording is slightly awkward grammatically but remains specific and on-topic for a reviewer scanning PR history. Overall the title communicates the main intent of the changeset. |
@nirga Please review this PR related to the LangChain instrumentation bugs.