librdkafka
Consumer initialization hangs
Description
We have `ulimit -u` (nproc) set for all users, which disallows creation of new threads once the limit has been reached. In such cases, confluent-kafka seems to hang when creating a new consumer.
How to reproduce
1. Create a python script `test.py` with the following contents:

```python
from confluent_kafka import Consumer

bootstrap_servers = "0.0.0.0:9092"
consumer_config = {
    'bootstrap.servers': bootstrap_servers,
    'group.id': 'NONE'
}
print("creating consumer")
consumer = Consumer(consumer_config)
print("consumer created")
```

I'm using `kafka.Consumer()` from confluent-kafka-python to initialize the consumer (see the back-trace below indicating the exact C-level method called).

2. Set the ulimit to a lower value, and run the script:

```sh
bash -c 'N=$(ps -Tu $USER --no-header | wc -l); ulimit -u $((N-2)); python test.py'
```
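Since librdkafka spawns several threads (main, broker, background) that all count against the user's `ulimit -u`, a rough pre-flight headroom check can flag the problem before the consumer is created. This is a hypothetical, Linux-only sketch (`thread_headroom` is not part of any library), and note that `RLIMIT_NPROC` counts all of the user's threads across processes, not just this process's:

```python
import os
import resource

def thread_headroom():
    # Rough pre-flight check: how many more threads could this user
    # plausibly create before hitting RLIMIT_NPROC?
    soft, _hard = resource.getrlimit(resource.RLIMIT_NPROC)
    if soft == resource.RLIM_INFINITY:
        return None  # no limit configured
    # Count only this process's threads via /proc (Linux-only).
    # A per-user total would need something like `ps -Tu $USER`.
    nthreads = len(os.listdir(f"/proc/{os.getpid()}/task"))
    return soft - nthreads
```

A small positive headroom is still no guarantee, since other processes owned by the same user consume the same limit.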
Observations
- Process is stuck at `kafka.Consumer()`. Here's the back-trace from gdb:
```
gdb -p 1330247
[truncated]
(gdb) info threads
  Id  Target Id            Frame
* 1   LWP 1330247 "python"       0x00007f1f0ed23c96 in __futex_abstimed_wait_common () from /nix/store/wrnlihsx7xq5pladg7yibzsi07jyi3vk-glibc-2.38-27/lib/libc.so.6
  2   LWP 1330251 "rdk:broker-1" 0x00007f1f0ed23c96 in __futex_abstimed_wait_common () from /nix/store/wrnlihsx7xq5pladg7yibzsi07jyi3vk-glibc-2.38-27/lib/libc.so.6
(gdb) bt 7
#0  0x00007f1f0ed23c96 in __futex_abstimed_wait_common () from /nix/store/wrnlihsx7xq5pladg7yibzsi07jyi3vk-glibc-2.38-27/lib/libc.so.6
#1  0x00007f1f0ed28b23 in __pthread_clockjoin_ex () from /nix/store/wrnlihsx7xq5pladg7yibzsi07jyi3vk-glibc-2.38-27/lib/libc.so.6
#2  0x00007f1f0ed2faa4 in thrd_join@GLIBC_2.28 () from /nix/store/wrnlihsx7xq5pladg7yibzsi07jyi3vk-glibc-2.38-27/lib/libc.so.6
#3  0x00007f1f0105793f in rd_kafka_destroy_internal () from /nix/store/72kq9hkfz8nfjhwv2k1cd4nhwzrkxvvj-rdkafka-2.3.0/lib/librdkafka.so.1
#4  0x00007f1f0105993d in rd_kafka_new () from /nix/store/72kq9hkfz8nfjhwv2k1cd4nhwzrkxvvj-rdkafka-2.3.0/lib/librdkafka.so.1
#5  0x00007f1f0127e008 in Consumer_init () from /nix/store/vr0y9jrjzxmdl8j8c7i2vqq3x0zaza8p-python3.11-confluent-kafka-2.3.0/lib/python3.11/site-packages/confluent_kafka/cimpl.cpython-311-x86_64-linux-gnu.so
#6  0x00007f1f0f13baa7 in type_call () from /nix/store/25nrdsg4lfzmvkwicm9186xadpff113f-python3-3.11.6/lib/libpython3.11.so.1.0
(More stack frames follow...)
(gdb) thread 2
[Switching to thread 2 (LWP 1330251)]
#0  0x00007f1f0ed23c96 in __futex_abstimed_wait_common () from /nix/store/wrnlihsx7xq5pladg7yibzsi07jyi3vk-glibc-2.38-27/lib/libc.so.6
(gdb) bt 7
#0  0x00007f1f0ed23c96 in __futex_abstimed_wait_common () from /nix/store/wrnlihsx7xq5pladg7yibzsi07jyi3vk-glibc-2.38-27/lib/libc.so.6
#1  0x00007f1f0ed2676c in pthread_cond_timedwait@@GLIBC_2.3.2 () from /nix/store/wrnlihsx7xq5pladg7yibzsi07jyi3vk-glibc-2.38-27/lib/libc.so.6
#2  0x00007f1f0ed2f69d in cnd_timedwait@GLIBC_2.28 () from /nix/store/wrnlihsx7xq5pladg7yibzsi07jyi3vk-glibc-2.38-27/lib/libc.so.6
#3  0x00007f1f01087021 in rd_kafka_q_pop_serve[localalias] () from /nix/store/72kq9hkfz8nfjhwv2k1cd4nhwzrkxvvj-rdkafka-2.3.0/lib/librdkafka.so.1
#4  0x00007f1f0106a8f8 in rd_kafka_broker_ops_io_serve () from /nix/store/72kq9hkfz8nfjhwv2k1cd4nhwzrkxvvj-rdkafka-2.3.0/lib/librdkafka.so.1
#5  0x00007f1f0106af39 in rd_kafka_broker_consumer_serve () from /nix/store/72kq9hkfz8nfjhwv2k1cd4nhwzrkxvvj-rdkafka-2.3.0/lib/librdkafka.so.1
#6  0x00007f1f0106b749 in rd_kafka_broker_serve () from /nix/store/72kq9hkfz8nfjhwv2k1cd4nhwzrkxvvj-rdkafka-2.3.0/lib/librdkafka.so.1
(More stack frames follow...)
```
Looking at the stack trace, this seems very similar to #3954.
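From Python's side the only symptom of this back-trace (`rd_kafka_new` blocked in `rd_kafka_destroy_internal` joining a broker thread) is a constructor that never returns. One way to at least detect the hang instead of blocking forever is to run construction on a worker thread with a deadline. This is a sketch, not library API: `make_consumer` and the timeout value are assumptions, and `factory` is a placeholder callable standing in for `confluent_kafka.Consumer`. Note the hung worker thread can only be abandoned, not killed:

```python
import concurrent.futures

def make_consumer(factory, config, timeout=30.0):
    # Run factory(config) on a worker thread and give up after `timeout`
    # seconds. `factory` stands in for confluent_kafka.Consumer here.
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(factory, config)
    try:
        result = future.result(timeout=timeout)
    except concurrent.futures.TimeoutError:
        # Don't wait for the stuck worker; it is abandoned, not killed.
        pool.shutdown(wait=False)
        raise RuntimeError(
            f"consumer creation exceeded {timeout:.0f}s; possibly blocked "
            "joining a librdkafka thread (check ulimit -u)"
        )
    pool.shutdown(wait=False)
    return result
```

This only turns a silent hang into a loud error; the real fix is making sure the nproc limit leaves enough headroom for librdkafka's threads.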
Checklist
IMPORTANT: We will close issues where the checklist has not been completed.
Please provide the following information:
- [x] librdkafka version (release number or git tag): v2.6.0
- [x] Apache Kafka version: 3.0.0
- [x] librdkafka client configuration: `auto.offset.reset=earliest, enable.auto.commit=false, debug=all`
- [x] Operating system: Red Hat Enterprise Linux 8.9
- [x] Provide logs (with `debug=..` as necessary) from librdkafka
- [ ] Provide broker log excerpts
- [x] Critical issue
Any update here?