
🐛 Bug Report: [Critical Bug] OpenAI streaming instrumentation crashes production with "unhashable type: 'list'" when using tool calls

Open · zwanzigg opened this issue 2 months ago • 0 comments

Which component is this bug for?

OpenAI Instrumentation

📜 Description

Severity: CRITICAL

The OpenAI instrumentation causes production crashes when using streaming chat completions with tool definitions. The error occurs during metric recording with TypeError: unhashable type: 'list', causing the entire request to fail with a 500 error.

Environment

  • traceloop-sdk version: 0.47.4, 0.47.5 (bug present in both)
  • Python version: 3.13
  • OpenAI SDK version: 1.66.5
  • OpenTelemetry SDK version: 1.38.0
  • Framework: LangGraph with direct OpenAI client calls

Root Cause Analysis

  1. OpenAI tool definitions contain lists/arrays (e.g., "required": ["param1", "param2"])
  2. Instrumentation captures these in _shared_attributes() for metric recording
  3. OpenTelemetry metrics require hashable attributes (for aggregation keys via frozenset())
  4. Lists are not hashable → the resulting TypeError crashes the streaming iterator
  5. The error propagates to user code → the entire request fails with a 500 (see the minimal demonstration below)
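
A minimal sketch of the failure mode (the attribute names here are illustrative, not the exact keys the instrumentation emits; the frozenset() call mirrors the one in the stack trace below):

# Attribute dict of the kind _shared_attributes() can produce when tool
# definitions are captured (hypothetical keys, for illustration only)
attributes = {
    "gen_ai.request.model": "gpt-4",
    "llm.request.functions.0.parameters.required": ["location"],  # list value from the tool schema
}

# The metrics SDK builds aggregation keys roughly like this
# (see _view_instrument_match.py in the stack trace below)
aggr_key = frozenset(attributes.items())
# TypeError: unhashable type: 'list'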

Why This is Critical

Production Impact

  • ✅ User's code is 100% correct
  • ❌ Observability library crashes production
  • ❌ No way to catch the error (happens in sync iterator)
  • ❌ Results in 500 errors for end users

Failed Workarounds

  • should_enrich_metrics=False - Still crashes
  • span_postprocess_callback - Too late, error happens during streaming
  • block_instruments={Instruments.OPENAI} - the only working solution, but it loses all OpenAI telemetry (see the sketch below)
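
A minimal sketch of that last workaround, assuming the Instruments enum is importable from traceloop.sdk.instruments (the exact import path and parameter names may differ between SDK versions):

from traceloop.sdk import Traceloop
from traceloop.sdk.instruments import Instruments

# Disables the OpenAI instrumentor entirely: no crash, but no OpenAI spans or metrics either
Traceloop.init(
    app_name="test-app",
    block_instruments={Instruments.OPENAI},
)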

Proposed Fix

In opentelemetry/instrumentation/openai/shared/chat_wrappers.py, sanitize attributes before metric recording:

def _shared_attributes(self):
    """Get attributes for metrics - sanitize unhashable types."""
    # Note: requires `import json` at module level.
    attrs = {
        # ... existing attributes
    }
    
    # Sanitize for metric recording
    sanitized = {}
    for key, value in attrs.items():
        if isinstance(value, (list, dict)):
            # Convert to JSON string for hashability
            try:
                sanitized[key] = json.dumps(value)
            except (TypeError, ValueError):
                sanitized[key] = str(value)
        else:
            sanitized[key] = value
    
    return sanitized

OR wrap metric recording in try/except:

def _process_item(self, chunk):
    try:
        self._streaming_time_to_first_token.record(
            self._time_of_first_token - self._start_time,
            attributes=self._shared_attributes(),
        )
    except (TypeError, ValueError) as e:
        # Log but don't crash user code
        # (assumes a module-level logger, e.g. logger = logging.getLogger(__name__))
        logger.warning(f"Failed to record metric: {e}")
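
Of the two, sanitizing the attributes fixes the root cause (list-valued attributes never reach the metrics API), while the try/except is a cheaper defensive guard that would also cover any other unhashable value that slips through in the future.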

This bug makes Traceloop unusable in production for any agent using OpenAI tools.

👟 Reproduction steps

Minimal Reproduction

from traceloop.sdk import Traceloop
from openai import OpenAI

# Initialize Traceloop (any configuration)
Traceloop.init(
    app_name="test-app",
    should_enrich_metrics=False,  # Even with this disabled, still crashes!
)

# Setup OpenAI client
client = OpenAI(api_key="your-api-key")

# Make streaming call with tools - THIS CRASHES
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "What's the weather?"}],
    tools=[
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": {"type": "string"}
                    },
                    "required": ["location"]  # ← LIST causes crash
                }
            }
        }
    ],
    tool_choice="auto",
    stream=True,  # Only crashes with streaming
)

# Crash happens during iteration
for chunk in response:  # TypeError on first chunk with tool_calls
    if chunk.choices:
        print(chunk.choices[0].delta)

👍 Expected behavior

Instrumentation should:

  1. Never crash user code - fail gracefully or skip problematic metrics
  2. Sanitize attributes before recording - convert unhashable types to strings
  3. Handle errors defensively - log warning and continue

👎 Actual Behavior with Screenshots

Complete Stack Trace

File "my_app.py", line 25, in main
    for chunk in response:
                 ^^^^^^^^
File "/venv/lib/python3.13/site-packages/opentelemetry/instrumentation/openai/shared/chat_wrappers.py", line 693, in __next__
    self._process_item(chunk)
    ~~~~~~~~~~~~~~~~~~^^^^^^^
File "/venv/lib/python3.13/site-packages/opentelemetry/instrumentation/openai/shared/chat_wrappers.py", line 718, in _process_item
    self._streaming_time_to_first_token.record(
        self._time_of_first_token - self._start_time,
        attributes=self._shared_attributes(),  # ← Problem: contains unhashable lists
    )
File "/venv/lib/python3.13/site-packages/opentelemetry/sdk/metrics/_internal/instrument.py", line 428, in record
    self._real_instrument.record(amount, attributes, context)
File "/venv/lib/python3.13/site-packages/opentelemetry/sdk/metrics/_internal/instrument.py", line 264, in record
    self._measurement_consumer.consume_measurement(...)
File "/venv/lib/python3.13/site-packages/opentelemetry/sdk/metrics/_internal/_view_instrument_match.py", line 105, in consume_measurement
    aggr_key = frozenset(attributes.items())  # ← Crash: can't hash lists
TypeError: unhashable type: 'list'

🤖 Python Version

3.13

📃 Provide any additional context for the Bug.

  • Non-streaming calls work fine (different code path)
  • Calls without tools work fine
  • Error occurs even with should_enrich_metrics=False
  • Only solution is block_instruments={Instruments.OPENAI}

👀 Have you spent some time to check if this bug has been raised before?

  • [x] I checked and didn't find similar issue

Are you willing to submit PR?

Yes I am willing to submit a PR!

zwanzigg · Oct 29 '25, 15:10