
🐛 Bug Report: [Critical Bug] OpenAI streaming instrumentation crashes production with "unhashable type: 'list'" when using tool calls

Open · zwanzigg opened this issue 2 months ago • 0 comments

Which component is this bug for?

OpenAI Instrumentation

📜 Description

Severity: CRITICAL

The OpenAI instrumentation causes production crashes when using streaming chat completions with tool definitions. The error occurs during metric recording with TypeError: unhashable type: 'list', causing the entire request to fail with a 500 error.

Environment

  • traceloop-sdk version: 0.47.4, 0.47.5 (bug present in both)
  • Python version: 3.13
  • OpenAI SDK version: 1.66.5
  • OpenTelemetry SDK version: 1.38.0
  • Framework: LangGraph with direct OpenAI client calls

Root Cause Analysis

  1. OpenAI tool definitions contain lists/arrays (e.g., "required": ["param1", "param2"])
  2. Instrumentation captures these in _shared_attributes() for metric recording
  3. OpenTelemetry metrics require hashable attributes (for aggregation keys via frozenset())
  4. Lists are not hashable → the resulting TypeError crashes the streaming iterator
  5. The error propagates to user code → the entire request fails with a 500 (see the minimal demonstration below)
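
A minimal sketch of the failure mode (the attribute names here are illustrative, not the exact keys the instrumentation emits; the frozenset() call mirrors the one in the stack trace below):

# Attribute dict of the kind _shared_attributes() can produce when tool
# definitions are captured (hypothetical keys, for illustration only)
attributes = {
    "gen_ai.request.model": "gpt-4",
    "llm.request.functions.0.parameters.required": ["location"],  # list value from the tool schema
}

# The metrics SDK builds aggregation keys roughly like this
# (see _view_instrument_match.py in the stack trace below)
aggr_key = frozenset(attributes.items())
# TypeError: unhashable type: 'list'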

Why This is Critical

Production Impact

  • ✅ User's code is 100% correct
  • ❌ Observability library crashes production
  • ❌ No way to catch the error (happens in sync iterator)
  • ❌ Results in 500 errors for end users

Failed Workarounds

  • should_enrich_metrics=False - Still crashes
  • span_postprocess_callback - Too late, error happens during streaming
  • block_instruments={Instruments.OPENAI} - the only working solution, but it loses all OpenAI telemetry (see the sketch below)
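
A minimal sketch of that last workaround, assuming the Instruments enum is importable from traceloop.sdk.instruments (the exact import path and parameter names may differ between SDK versions):

from traceloop.sdk import Traceloop
from traceloop.sdk.instruments import Instruments

# Disables the OpenAI instrumentor entirely: no crash, but no OpenAI spans or metrics either
Traceloop.init(
    app_name="test-app",
    block_instruments={Instruments.OPENAI},
)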

Proposed Fix

In opentelemetry/instrumentation/openai/shared/chat_wrappers.py, sanitize attributes before metric recording:

def _shared_attributes(self):
    """Get attributes for metrics - sanitize unhashable types."""
    # Note: requires `import json` at module level.
    attrs = {
        # ... existing attributes
    }
    
    # Sanitize for metric recording
    sanitized = {}
    for key, value in attrs.items():
        if isinstance(value, (list, dict)):
            # Convert to JSON string for hashability
            try:
                sanitized[key] = json.dumps(value)
            except (TypeError, ValueError):
                sanitized[key] = str(value)
        else:
            sanitized[key] = value
    
    return sanitized

OR wrap metric recording in try/except:

def _process_item(self, chunk):
    try:
        self._streaming_time_to_first_token.record(
            self._time_of_first_token - self._start_time,
            attributes=self._shared_attributes(),
        )
    except (TypeError, ValueError) as e:
        # Log but don't crash user code
        # (assumes a module-level logger, e.g. logger = logging.getLogger(__name__))
        logger.warning(f"Failed to record metric: {e}")
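
Of the two, sanitizing the attributes fixes the root cause (list-valued attributes never reach the metrics API), while the try/except is a cheaper defensive guard that would also cover any other unhashable value that slips through in the future.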

This bug makes Traceloop unusable in production for any agent using OpenAI tools.

👟 Reproduction steps

Minimal Reproduction

from traceloop.sdk import Traceloop
from openai import OpenAI

# Initialize Traceloop (any configuration)
Traceloop.init(
    app_name="test-app",
    should_enrich_metrics=False,  # Even with this disabled, still crashes!
)

# Setup OpenAI client
client = OpenAI(api_key="your-api-key")

# Make streaming call with tools - THIS CRASHES
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "What's the weather?"}],
    tools=[
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": {"type": "string"}
                    },
                    "required": ["location"]  # ← LIST causes crash
                }
            }
        }
    ],
    tool_choice="auto",
    stream=True,  # Only crashes with streaming
)

# Crash happens during iteration
for chunk in response:  # TypeError on first chunk with tool_calls
    if chunk.choices:
        print(chunk.choices[0].delta)

👍 Expected behavior

Instrumentation should:

  1. Never crash user code - fail gracefully or skip problematic metrics
  2. Sanitize attributes before recording - convert unhashable types to strings
  3. Handle errors defensively - log warning and continue

👎 Actual Behavior with Screenshots

Complete Stack Trace

File "my_app.py", line 25, in main
    for chunk in response:
                 ^^^^^^^^
File "/venv/lib/python3.13/site-packages/opentelemetry/instrumentation/openai/shared/chat_wrappers.py", line 693, in __next__
    self._process_item(chunk)
    ~~~~~~~~~~~~~~~~~~^^^^^^^
File "/venv/lib/python3.13/site-packages/opentelemetry/instrumentation/openai/shared/chat_wrappers.py", line 718, in _process_item
    self._streaming_time_to_first_token.record(
        self._time_of_first_token - self._start_time,
        attributes=self._shared_attributes(),  # ← Problem: contains unhashable lists
    )
File "/venv/lib/python3.13/site-packages/opentelemetry/sdk/metrics/_internal/instrument.py", line 428, in record
    self._real_instrument.record(amount, attributes, context)
File "/venv/lib/python3.13/site-packages/opentelemetry/sdk/metrics/_internal/instrument.py", line 264, in record
    self._measurement_consumer.consume_measurement(...)
File "/venv/lib/python3.13/site-packages/opentelemetry/sdk/metrics/_internal/_view_instrument_match.py", line 105, in consume_measurement
    aggr_key = frozenset(attributes.items())  # ← Crash: can't hash lists
TypeError: unhashable type: 'list'

🤖 Python Version

3.13

📃 Provide any additional context for the Bug.

  • Non-streaming calls work fine (different code path)
  • Calls without tools work fine
  • Error occurs even with should_enrich_metrics=False
  • Only solution is block_instruments={Instruments.OPENAI}

👀 Have you spent some time to check if this bug has been raised before?

  • [x] I checked and didn't find similar issue

Are you willing to submit PR?

Yes I am willing to submit a PR!

zwanzigg · Oct 29 '25, 15:10