dd-trace-py

[BUG]: Kafka instrumentation causes indefinite blocking due to list_topics() call without timeout

Open loftiskg opened this issue 5 months ago • 0 comments

Tracer Version(s)

2.21.6

Python Version(s)

3.13.2

Pip Version(s)

pip 25.1.1

Bug Report

Summary

The ddtrace Kafka instrumentation in ddtrace.contrib.internal.kafka.patch._get_cluster_id() calls list_topics() without a timeout parameter, which can cause the main thread to block indefinitely when the Kafka cluster becomes unresponsive.

Environment

  • ddtrace version: 2.21.6
  • Python version: 3.13.2
  • Kafka client: confluent-kafka-python
  • Installation method: pip

Expected Behavior

Kafka instrumentation should not block the main thread or event loop indefinitely. Metadata calls such as list_topics() should use a bounded timeout so that an unresponsive cluster cannot hang the application.
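As a rough illustration of the expected behavior, the cluster ID lookup could pass an explicit timeout to list_topics() and fall back to None on failure. This is only a sketch, not the actual ddtrace code: the function name get_cluster_id_with_timeout and the 1.0 second default are hypothetical. It relies on confluent-kafka's list_topics() accepting a timeout argument in seconds, where the default of -1 means wait forever.

```python
from typing import Any, Optional

# Hypothetical default chosen for illustration; not a value taken from ddtrace.
DEFAULT_METADATA_TIMEOUT_S = 1.0


def get_cluster_id_with_timeout(
    instance: Any,
    topic: str,
    timeout: float = DEFAULT_METADATA_TIMEOUT_S,
) -> Optional[str]:
    """Return the cluster ID, or None if metadata cannot be fetched in time."""
    list_topics = getattr(instance, "list_topics", None)
    if list_topics is None:
        # The client object does not expose metadata lookups.
        return None
    try:
        # A positive timeout bounds the wait; the client default of -1 waits forever.
        cluster_metadata = list_topics(topic=topic, timeout=timeout)
    except Exception:
        # confluent-kafka raises KafkaException on timeout. It is caught broadly
        # here so that the sketch does not need to import the client library.
        return None
    return getattr(cluster_metadata, "cluster_id", None)
```

With this shape, a failed lookup only costs the tracing metadata (the cluster ID tag is missing) instead of stalling produce().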

Actual Behavior

When the Kafka cluster becomes unresponsive, the _get_cluster_id() function blocks indefinitely on the instance.list_topics(topic=topic) call at line 332 in /ddtrace/contrib/internal/kafka/patch.py.

This occurs during every produce() operation when ddtrace tries to collect the cluster ID for tracing metadata.

Stack Trace

# Application producer code
producer.produce(topic=topic, value=value, headers=headers, key=key)
# ddtrace/contrib/internal/kafka/patch.py:174 (traced_produce)
cluster_id = _get_cluster_id(instance, topic)
# ddtrace/contrib/internal/kafka/patch.py:332 (_get_cluster_id)
cluster_metadata = instance.list_topics(topic=topic)  # <- BLOCKS HERE

Reproduction Steps

  1. Set up a Kafka producer with ddtrace instrumentation enabled
  2. Make the Kafka cluster unresponsive (stop broker, introduce network issues, or misconfigure connection)
  3. Attempt to produce a message using the instrumented producer
  4. Application hangs indefinitely on the list_topics() call
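The hang in step 4 can be approximated without a real broker. The sketch below is a simulation, not a reproduction against Kafka: UnresponsiveProducer and call_blocks are hypothetical stand-ins, and the fake list_topics() only mimics the timeout semantics described above (-1 waits forever, a positive value bounds the wait).

```python
import threading


class UnresponsiveProducer:
    """Hypothetical stand-in for a producer whose broker never answers."""

    def __init__(self):
        self._reply = threading.Event()  # never set unless release() is called

    def list_topics(self, topic=None, timeout=-1):
        # A negative timeout waits forever; a positive one bounds the wait.
        wait = None if timeout is None or timeout < 0 else timeout
        if not self._reply.wait(wait):
            raise TimeoutError("metadata request timed out")

    def release(self):
        """Unblock any pending callers so worker threads can exit."""
        self._reply.set()


def call_blocks(fn, grace=0.2):
    """Run fn in a daemon thread and report whether it is still running after grace seconds."""
    worker = threading.Thread(target=fn, daemon=True)
    worker.start()
    worker.join(grace)
    return worker.is_alive()


def bounded_call(producer):
    """Call list_topics() with a short timeout and swallow the timeout error."""
    try:
        producer.list_topics(topic="example-topic", timeout=0.05)
    except TimeoutError:
        pass


if __name__ == "__main__":
    producer = UnresponsiveProducer()
    # Without a timeout the call never returns; with one it fails fast.
    print("no timeout blocks:", call_blocks(lambda: producer.list_topics(topic="example-topic")))
    print("with timeout blocks:", call_blocks(lambda: bounded_call(producer)))
    producer.release()
```

Running the call in a worker thread is only a way to observe the hang safely; in the reported scenario the blocking call runs on the thread that calls produce().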

loftiskg • Jun 19 '25 00:06