
fix(langchain): span attrs and metrics missing of langchain third party integration

Open · minimAluminiumalism opened this pull request 3 months ago · 3 comments


  • [x] I have added tests that cover my changes.
  • [ ] If adding a new instrumentation or changing an existing one, I've added screenshots from some observability platform showing the change.
  • [ ] PR name follows conventional commits format: feat(instrumentation): ... or fix(instrumentation): ....
  • [ ] (If applicable) I have updated the documentation accordingly.

This PR mainly fixes two issues in the LangChain third-party integration, specifically the ChatDeepSeek API:

  • Model name detection failing
  • Missing metrics such as TTFT and streaming generation time

Reproduction code

import os
import asyncio
from langchain_core.prompts import ChatPromptTemplate

from langchain_deepseek import ChatDeepSeek

# OpenTelemetry setup (HTTP exporter to local collector)
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter

from opentelemetry.instrumentation.langchain import LangchainInstrumentor
from opentelemetry import metrics as otel_metrics
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import ConsoleMetricExporter, PeriodicExportingMetricReader
from opentelemetry import trace as otel_trace

prompt_template = """You are a helpful assistant.
Use the following context to answer briefly.

Context:
{context}

Question:
{question}
"""

def init_otel_and_instrument(service_name: str = "langchain-scratch", collector_endpoint: str = "http://127.0.0.1:4318") -> LangchainInstrumentor:
    resource = Resource.create({"service.name": service_name})
    
    endpoint = collector_endpoint.rstrip("/") + "/v1/traces"
    trace_exporter = OTLPSpanExporter(endpoint=endpoint)
    tracer_provider = TracerProvider(resource=resource)
    tracer_provider.add_span_processor(BatchSpanProcessor(trace_exporter))
    otel_trace.set_tracer_provider(tracer_provider)

    metric_reader = PeriodicExportingMetricReader(ConsoleMetricExporter(), export_interval_millis=1000)
    meter_provider = MeterProvider(resource=resource, metric_readers=[metric_reader])
    otel_metrics.set_meter_provider(meter_provider)

    langchain_instrumentor = LangchainInstrumentor()
    langchain_instrumentor.instrument(
        tracer_provider=tracer_provider,
        meter_provider=meter_provider
    )
    
    return langchain_instrumentor

async def main():
    instrumentor = init_otel_and_instrument(
        service_name="langchain-scratch", 
        collector_endpoint="http://127.0.0.1:4318"
    )
    
    prompt = ChatPromptTemplate.from_template(prompt_template)

    api_key = os.getenv("OPENAI_API_KEY", "YOUR_API_KEY")
    base_url = os.getenv("OPENAI_BASE_URL", "https://api.deepseek.com/beta")
    model_name = os.getenv("MODEL_NAME", "deepseek-reasoner")

    model = ChatDeepSeek(
        api_base=base_url,
        api_key=api_key,
        model=model_name,
        stream_usage=True,
    )

    chain = prompt | model

    inputs = {"context": "some context", "question": "What's OpenTelemetry?"}
    print("Assistant:", end=" ", flush=True)
    async for chunk in chain.astream(inputs):
        piece = getattr(chunk, "content", "")
        if piece:
            print(piece, end="", flush=True)
    print()
    
    # instrumentor.uninstrument()

if __name__ == "__main__":
    asyncio.run(main())
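
As a quick way to inspect the resulting span attributes locally (instead of exporting to a collector), the tracer provider can be pointed at an in-memory exporter and the finished spans checked directly. This is only a verification sketch and not part of the PR; the gen_ai.request.model key is the standard GenAI attribute the instrumentation is expected to populate.

# Optional verification sketch (not part of the PR): swap the OTLP exporter for an
# in-memory one and inspect finished spans for the request model attribute.
from opentelemetry import trace as otel_trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SimpleSpanProcessor
from opentelemetry.sdk.trace.export.in_memory_span_exporter import InMemorySpanExporter

def init_in_memory_tracing() -> InMemorySpanExporter:
    exporter = InMemorySpanExporter()
    provider = TracerProvider()
    provider.add_span_processor(SimpleSpanProcessor(exporter))
    otel_trace.set_tracer_provider(provider)
    return exporter

def print_model_attributes(exporter: InMemorySpanExporter) -> None:
    # Shows which spans carry gen_ai.request.model after running the chain above.
    for span in exporter.get_finished_spans():
        print(span.name, span.attributes.get("gen_ai.request.model"))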

[Screenshot: Clipboard_Screenshot_1758409020]

[!IMPORTANT] Fixes model name detection and adds missing metrics for Langchain's ChatDeepSeek integration, including TTFT and streaming generation time.

  • Behavior:
    • Fixes model name detection and adds metrics like TTFT and streaming generation time for ChatDeepSeek in callback_handler.py.
    • Adds _create_shared_attributes() in TraceloopCallbackHandler to create shared attributes for metrics.
    • Updates on_llm_new_token() and on_llm_end() to track TTFT and streaming metrics (see the sketch after this list).
  • Metrics:
    • Adds histograms for TTFT and streaming time, counters for generation choices and exceptions in __init__.py.
    • Updates set_request_params() in span_utils.py to enhance model extraction.
  • Tests:
    • Adds tests for model extraction and streaming metrics in test_model_extraction.py, test_streaming_metrics.py, and test_third_party_models.py.
    • Verifies metrics recording in test_langchain_metrics.py.
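
To make the behavior changes above concrete, here is a minimal, hypothetical sketch of how a callback handler can record TTFT once per run, build a shared attribute dict, and record streaming time and choices when the run ends. Class, method, and attribute names are illustrative stand-ins, not the PR's actual implementation.

# Hypothetical sketch, not the PR's code: one-shot TTFT recording plus shared
# metric attributes in a LangChain-style callback handler.
import time
from typing import Dict, Optional
from uuid import UUID

from opentelemetry.metrics import Counter, Histogram

class MetricsCallbackSketch:
    def __init__(self, ttft_histogram: Histogram,
                 streaming_time_histogram: Histogram,
                 choices_counter: Counter) -> None:
        self._ttft = ttft_histogram
        self._streaming_time = streaming_time_histogram
        self._choices = choices_counter
        self._start_time: Dict[UUID, float] = {}
        self._first_token_time: Dict[UUID, Optional[float]] = {}
        self._model: Dict[UUID, str] = {}

    def _shared_attributes(self, run_id: UUID) -> dict:
        # One attribute dict reused by every metric recorded for this run.
        return {"gen_ai.system": "Langchain",
                "gen_ai.response.model": self._model.get(run_id, "unknown")}

    def on_chat_model_start(self, serialized: dict, messages, *, run_id: UUID, **kwargs) -> None:
        self._start_time[run_id] = time.time()
        self._first_token_time[run_id] = None
        # Fall back through the serialized constructor kwargs to find a model name.
        params = (serialized or {}).get("kwargs", {})
        self._model[run_id] = params.get("model") or params.get("model_name") or "unknown"

    def on_llm_new_token(self, token: str, *, run_id: UUID, **kwargs) -> None:
        if self._first_token_time.get(run_id) is None:  # record TTFT only once per run
            now = time.time()
            self._first_token_time[run_id] = now
            self._ttft.record(now - self._start_time.get(run_id, now),
                              attributes=self._shared_attributes(run_id))

    def on_llm_end(self, response, *, run_id: UUID, **kwargs) -> None:
        attrs = self._shared_attributes(run_id)
        first = self._first_token_time.pop(run_id, None)
        self._start_time.pop(run_id, None)
        self._model.pop(run_id, None)
        if first is not None:
            # Streaming time-to-generate: everything after the first token arrived.
            self._streaming_time.record(time.time() - first, attributes=attrs)
        generations = getattr(response, "generations", None) or []
        self._choices.add(sum(len(g) for g in generations), attributes=attrs)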

This description was created by Ellipsis for 15aff50877a66a8eeb78ea08da0d2b6f57356967. You can customize this summary. It will automatically update as commits are pushed.

Summary by CodeRabbit

  • New Features
    • Added AI streaming metrics: time-to-first-token, streaming time-to-generate, generation choices, and exception counts; integrated into LLM telemetry with richer model/system attributes (instrument creation is sketched after this list).
  • Bug Fixes
    • More reliable model-name detection across requests, responses, metadata and third‑party models; safer lifecycle handling.
  • Tests
    • New suites for streaming metrics, model extraction, third‑party models and an HTTP cassette for streaming.
  • Chores
    • Added third‑party model dependency and enabled test recording plugin.
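
The metric instruments themselves would be created on a meter during instrumentation setup, roughly as sketched below. The histogram and counter calls are the standard OpenTelemetry metrics API; only gen_ai.server.time_to_first_token is a known semantic-convention name, the other names are placeholders, and the PR uses the semconv constants rather than literal strings.

# Rough sketch of creating the four instruments on an OpenTelemetry meter.
# Only gen_ai.server.time_to_first_token is a known semconv name; the other
# names here are placeholders standing in for the constants used by the PR.
from opentelemetry.metrics import get_meter

meter = get_meter("opentelemetry.instrumentation.langchain")

ttft_histogram = meter.create_histogram(
    name="gen_ai.server.time_to_first_token",
    unit="s",
    description="Time to first token for streaming LLM responses",
)
streaming_time_histogram = meter.create_histogram(
    name="llm.streaming_time_to_generate",  # placeholder name
    unit="s",
    description="Time from first token to the end of the streamed response",
)
choices_counter = meter.create_counter(
    name="llm.generation.choices",  # placeholder name
    description="Number of generation choices returned",
)
exception_counter = meter.create_counter(
    name="llm.exceptions",  # placeholder name
    description="Number of exceptions raised during LLM invocations",
)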

minimAluminiumalism commented on Sep 20 '25, 22:09

✅ There are no secrets present in this pull request anymore.

If these secrets are true positives and are still valid, we highly recommend revoking them. While these secrets were previously flagged, we no longer have a reference to the specific commits where they were detected. Once a secret has been leaked into a git repository, you should consider it compromised, even if it was deleted immediately. See the GitGuardian documentation for more information about the risks.


🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.

gitguardian[bot] commented on Sep 21 '25, 00:09

[!NOTE]

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

Adds GenAI incubating metrics (TTFT, streaming time, choices counter, exception counter) to LangChain instrumentation, wires them through instrumentor and TraceloopCallbackHandler, enhances model-name extraction and span metadata handling, adds DeepSeek dependency and streaming test cassette, and expands tests for streaming/third‑party models and metrics.

Changes

  • Instrumentation init & wiring
    packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/__init__.py
    Imports GenAIMetrics; creates TTFT and streaming-time histograms and choices/exception counters; passes them into TraceloopCallbackHandler during instrumentation setup.
  • Callback handler metrics & flow
    packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py
    Extends TraceloopCallbackHandler.__init__ to accept ttft_histogram, streaming_time_histogram, choices_counter, and exception_counter; adds on_llm_new_token; centralizes shared metric attributes; records TTFT, streaming duration, choices, and exception metrics; updates error handling and model-name resolution fallbacks.
  • Span/model extraction utilities
    packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/span_utils.py
    Adds first_token_time to SpanHolder; implements unified model extraction functions and fallbacks (_get_unified_unknown_model, _extract_model_name_from_request, _infer_model_from_class_name, _extract_model_name_from_association_metadata); updates set_request_params, set_llm_request, set_chat_request, and response-model extraction to use serialized/metadata inputs (see the sketch after this list).
  • Project configuration
    packages/opentelemetry-instrumentation-langchain/pyproject.toml
    Adds the dependency langchain-deepseek = "^0.1.4" to the main and test dependency groups.
  • Test config
    packages/opentelemetry-instrumentation-langchain/tests/conftest.py
    Enables the pytest plugin pytest_recording via pytest_plugins.
  • Metrics tests
    packages/opentelemetry-instrumentation-langchain/tests/metrics/test_langchain_metrics.py
    Imports ERROR_TYPE and GenAIMetrics; expands assertions to validate GEN_AI_SERVER_TIME_TO_FIRST_TOKEN, LLM_STREAMING_TIME_TO_GENERATE, choices, and exception metrics and related attributes.
  • Streaming cassette
    packages/opentelemetry-instrumentation-langchain/tests/metrics/cassettes/test_langchain_metrics/test_streaming_with_ttft_and_generation_time_metrics.yaml
    Adds a DeepSeek streaming interaction cassette with chunked data: lines and a final data: [DONE] response.
  • Streaming unit tests
    packages/opentelemetry-instrumentation-langchain/tests/test_streaming_metrics.py
    New test suite TestStreamingMetrics validating TTFT recording on the first token, no repeated TTFT, choices counting, streaming time, and exception metrics using mocks.
  • Third-party model tests
    packages/opentelemetry-instrumentation-langchain/tests/test_third_party_models.py
    New tests verifying model extraction from serialized kwargs for DeepSeek, the deepseek-unknown fallback, and correct metric attribution in on_llm_end.
  • Model extraction tests
    packages/opentelemetry-instrumentation-langchain/tests/test_model_extraction.py
    New tests covering multi-path model extraction (kwargs, invocation_params, serialized, metadata), class-name inference, association metadata, and response metadata parsing.
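
The span_utils changes amount to a fallback chain for resolving the model name. Below is a simplified, hypothetical sketch of that ordering (request kwargs, invocation params, serialized constructor kwargs, class-name inference, metadata, then a generic unknown); the function and key names are stand-ins for the helpers listed above.

# Illustrative fallback chain for model-name resolution; not the PR's exact code.
from typing import Any, Dict, Optional

_CLASS_NAME_HINTS = {"ChatDeepSeek": "deepseek-unknown"}  # vendor-specific unknowns

def resolve_model_name(kwargs: Dict[str, Any],
                       serialized: Optional[Dict[str, Any]] = None,
                       metadata: Optional[Dict[str, Any]] = None) -> str:
    # 1. Explicit request parameters win (kwargs, then invocation_params).
    for source in (kwargs, kwargs.get("invocation_params") or {}):
        for key in ("model", "model_name", "model_id"):
            if source.get(key):
                return str(source[key])
    if serialized:
        # 2. Serialized constructor kwargs, e.g. ChatDeepSeek(model="deepseek-reasoner").
        constructor_kwargs = serialized.get("kwargs") or {}
        for key in ("model", "model_name"):
            if constructor_kwargs.get(key):
                return str(constructor_kwargs[key])
        # 3. Infer a vendor-specific "unknown" from the class name.
        class_name = (serialized.get("id") or [""])[-1]
        if class_name in _CLASS_NAME_HINTS:
            return _CLASS_NAME_HINTS[class_name]
    # 4. Fall back to metadata (key name illustrative), then a generic unknown.
    if metadata and metadata.get("model_name"):
        return str(metadata["model_name"])
    return "unknown"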

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant App
  participant LangChain
  participant Callback as TraceloopCallbackHandler
  participant Span as Span/SpanHolder
  participant OTel as OTel Metrics

  App->>LangChain: invoke LLM/chat (streaming)
  LangChain->>Callback: on_chat_model_start / on_llm_start(kwargs, serialized, metadata)
  Callback->>Span: set_request_params(span_holder, kwargs, serialized, metadata)
  Note right of Callback: resolve model_name via kwargs/serialized/metadata/class-name

  loop stream tokens
    LangChain->>Callback: on_llm_new_token(token, run_id)
    alt first token
      Callback->>Span: record first_token_time
      Callback->>OTel: ttft_histogram.record(value, attributes)
    else subsequent token
      Callback-->>OTel: (no TTFT)
    end
  end

  LangChain->>Callback: on_llm_end(result, run_id)
  Callback->>OTel: duration_histogram.record(...)
  Callback->>OTel: token_histogram.record(...)
  opt streaming
    Callback->>OTel: streaming_time_histogram.record(...)
    Callback->>OTel: choices_counter.add(n, attributes)
  end
  Callback->>Span: end span & cleanup

sequenceDiagram
  autonumber
  participant LangChain
  participant Callback
  participant Span
  participant OTel as OTel Metrics

  LangChain->>Callback: on_llm_error(error, run_id)
  Callback->>Span: resolve model_name & error type
  Callback->>OTel: exception_counter.add(1, attributes: error.type, model, system, server.address)
  Callback-->>LangChain: propagate/return
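
The error path above reduces to a single counter increment carrying the error type plus the shared model/system/server attributes; a minimal sketch follows (attribute keys mirror the diagram, the helper name is hypothetical).

# Hypothetical helper mirroring the error flow in the diagram: one increment of the
# exception counter with error.type and the shared model/system/server attributes.
from opentelemetry.metrics import Counter

def record_llm_exception(exception_counter: Counter, error: BaseException,
                         model_name: str, server_address: str) -> None:
    exception_counter.add(1, attributes={
        "error.type": type(error).__name__,
        "gen_ai.system": "Langchain",
        "gen_ai.response.model": model_name,
        "server.address": server_address,
    })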

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • traceloop/openllmetry#3237 — Overlaps model-name extraction and association-metadata fallbacks used in span_utils.py and callback_handler.py.
  • traceloop/openllmetry#3206 — Modifies TraceloopCallbackHandler lifecycle and constructor patterns similar to this PR’s constructor/flow changes.
  • traceloop/openllmetry#3216 — Touches on_llm_end and span-ending logic that intersect with this PR’s reporting and cleanup adjustments.

Suggested reviewers

  • nirga
  • doronkopit5

Poem

A nibble of tokens, the first one sweet,
My whiskers twitch—TTFT’s complete!
I count the choices as streams flow by,
If errors hop up, I mark them high.
DeepSeek dreams in metrics neat. 🐇✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 77.27% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The title accurately and concisely describes the primary change—fixing missing span attributes and metrics for the LangChain third‑party integration—which matches the modifications to span_utils.py, callback_handler.py, and the added metrics/tests; it uses a conventional commit prefix and avoids unnecessary detail. The wording is slightly awkward grammatically but remains specific and on-topic for a reviewer scanning PR history. Overall the title communicates the main intent of the changeset.
✨ Finishing touches
  • [ ] 📝 Generate Docstrings
🧪 Generate unit tests
  • [ ] Create PR with unit tests
  • [ ] Post copyable unit tests in a comment

[!TIP]

👮 Agentic pre-merge checks are now available in preview!

Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs.

  • Built-in checks – Quickly apply ready-made checks to enforce title conventions, require pull request descriptions that follow templates, validate linked issues for compliance, and more.
  • Custom agentic checks – Define your own rules using CodeRabbit’s advanced agentic capabilities to enforce organization-specific policies and workflows. For example, you can instruct CodeRabbit’s agent to verify that API documentation is updated whenever API schema files are modified in a PR. Note: Up to 5 custom checks are currently allowed during the preview period. Pricing for this feature will be announced in a few weeks.

Please see the documentation for more information.

Example:

reviews:
  pre_merge_checks:
    custom_checks:
      - name: "Undocumented Breaking Changes"
        mode: "warning"
        instructions: |
          Pass/fail criteria: All breaking changes to public APIs, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints must be documented in the "Breaking Change" section of the PR description and in CHANGELOG.md. Exclude purely internal or private changes (e.g., code not exported from package entry points or explicitly marked as internal).

Please share your feedback with us on this Discord post.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

coderabbitai[bot] commented on Sep 21 '25, 00:09

@nirga Please review this PR related to the LangChain instrumentation bugs.

minimAluminiumalism commented on Sep 23 '25, 04:09