
Consumer initialization hangs

Open Mrigank11 opened this issue 1 year ago • 1 comment

Description

We have ulimit nproc set for all users, which prevents new threads from being created once the limit has been reached. When that happens, confluent-kafka hangs while creating a new consumer.

How to reproduce

  • Create a Python script, test.py, with the following contents:

    from confluent_kafka import Consumer
    
    bootstrap_servers = "0.0.0.0:9092"
    consumer_config = {
        'bootstrap.servers': bootstrap_servers,
        'group.id': 'NONE'
    }
    
    print("creating consumer")
    consumer = Consumer(consumer_config)
    print("consumer created")
    

    I'm using Consumer() from confluent-kafka-python to initialize the consumer (the stack trace below shows the exact C-level function that is called).

  • Set the ulimit to a lower value, and run the script:

    bash -c 'N=$(ps -Tu $USER --no-header | wc -l);ulimit -u $((N-2));python test.py'
    

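The thread count that the `ps -Tu $USER | wc -l` pipeline produces can also be approximated from Python, which makes it easier to see how close a user is to the nproc limit before a consumer is created. The following is only a sketch: `count_user_threads` and `nproc_headroom` are hypothetical helper names, the /proc walk is Linux-specific, and the count may differ slightly from what the kernel charges against RLIMIT_NPROC.

```python
import os
import resource
from typing import Optional


def count_user_threads(uid: int) -> int:
    """Approximate the number of threads (LWPs) owned by uid by walking /proc."""
    total = 0
    try:
        entries = os.listdir("/proc")
    except OSError:
        return 0  # no /proc available (not Linux, or restricted sandbox)
    for entry in entries:
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            if os.stat(f"/proc/{entry}").st_uid != uid:
                continue
            total += len(os.listdir(f"/proc/{entry}/task"))
        except OSError:
            continue  # process exited while scanning, or is not readable
    return total


def nproc_headroom(soft_limit: int, in_use: int) -> Optional[int]:
    """Return how many more threads fit under the limit, or None if unlimited."""
    if soft_limit == resource.RLIM_INFINITY:
        return None
    return soft_limit - in_use


if __name__ == "__main__":
    soft, _hard = resource.getrlimit(resource.RLIMIT_NPROC)
    in_use = count_user_threads(os.getuid())
    print("threads in use:", in_use)
    print("headroom under RLIMIT_NPROC:", nproc_headroom(soft, in_use))
```

A headroom of zero or less corresponds to the situation this reproduction sets up. Note that the kernel does not enforce RLIMIT_NPROC for privileged processes, so the hang may not reproduce when the script runs as root.
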
Observations

  • The process is stuck in Consumer(). Here is the backtrace from gdb:
gdb -p 1330247
[truncated]
(gdb) info threads
  Id   Target Id                  Frame
* 1    LWP 1330247 "python"       0x00007f1f0ed23c96 in __futex_abstimed_wait_common () from /nix/store/wrnlihsx7xq5pladg7yibzsi07jyi3vk-glibc-2.38-27/lib/libc.so.6
  2    LWP 1330251 "rdk:broker-1" 0x00007f1f0ed23c96 in __futex_abstimed_wait_common () from /nix/store/wrnlihsx7xq5pladg7yibzsi07jyi3vk-glibc-2.38-27/lib/libc.so.6
(gdb) bt 7
#0  0x00007f1f0ed23c96 in __futex_abstimed_wait_common () from /nix/store/wrnlihsx7xq5pladg7yibzsi07jyi3vk-glibc-2.38-27/lib/libc.so.6
#1  0x00007f1f0ed28b23 in __pthread_clockjoin_ex () from /nix/store/wrnlihsx7xq5pladg7yibzsi07jyi3vk-glibc-2.38-27/lib/libc.so.6
#2  0x00007f1f0ed2faa4 in thrd_join@GLIBC_2.28 () from /nix/store/wrnlihsx7xq5pladg7yibzsi07jyi3vk-glibc-2.38-27/lib/libc.so.6
#3  0x00007f1f0105793f in rd_kafka_destroy_internal () from /nix/store/72kq9hkfz8nfjhwv2k1cd4nhwzrkxvvj-rdkafka-2.3.0/lib/librdkafka.so.1
#4  0x00007f1f0105993d in rd_kafka_new () from /nix/store/72kq9hkfz8nfjhwv2k1cd4nhwzrkxvvj-rdkafka-2.3.0/lib/librdkafka.so.1
#5  0x00007f1f0127e008 in Consumer_init () from /nix/store/vr0y9jrjzxmdl8j8c7i2vqq3x0zaza8p-python3.11-confluent-kafka-2.3.0/lib/python3.11/site-packages/confluent_kafka/cimpl.cpython-311-x86_64-linux-gnu.so
#6  0x00007f1f0f13baa7 in type_call () from /nix/store/25nrdsg4lfzmvkwicm9186xadpff113f-python3-3.11.6/lib/libpython3.11.so.1.0
(More stack frames follow...)
(gdb) thread 2
[Switching to thread 2 (LWP 1330251)]
#0  0x00007f1f0ed23c96 in __futex_abstimed_wait_common () from /nix/store/wrnlihsx7xq5pladg7yibzsi07jyi3vk-glibc-2.38-27/lib/libc.so.6
(gdb) bt 7
#0  0x00007f1f0ed23c96 in __futex_abstimed_wait_common () from /nix/store/wrnlihsx7xq5pladg7yibzsi07jyi3vk-glibc-2.38-27/lib/libc.so.6
#1  0x00007f1f0ed2676c in pthread_cond_timedwait@@GLIBC_2.3.2 () from /nix/store/wrnlihsx7xq5pladg7yibzsi07jyi3vk-glibc-2.38-27/lib/libc.so.6
#2  0x00007f1f0ed2f69d in cnd_timedwait@GLIBC_2.28 () from /nix/store/wrnlihsx7xq5pladg7yibzsi07jyi3vk-glibc-2.38-27/lib/libc.so.6
#3  0x00007f1f01087021 in rd_kafka_q_pop_serve[localalias] () from /nix/store/72kq9hkfz8nfjhwv2k1cd4nhwzrkxvvj-rdkafka-2.3.0/lib/librdkafka.so.1
#4  0x00007f1f0106a8f8 in rd_kafka_broker_ops_io_serve () from /nix/store/72kq9hkfz8nfjhwv2k1cd4nhwzrkxvvj-rdkafka-2.3.0/lib/librdkafka.so.1
#5  0x00007f1f0106af39 in rd_kafka_broker_consumer_serve () from /nix/store/72kq9hkfz8nfjhwv2k1cd4nhwzrkxvvj-rdkafka-2.3.0/lib/librdkafka.so.1
#6  0x00007f1f0106b749 in rd_kafka_broker_serve () from /nix/store/72kq9hkfz8nfjhwv2k1cd4nhwzrkxvvj-rdkafka-2.3.0/lib/librdkafka.so.1
(More stack frames follow...)

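In the trace, the main thread is blocked in thrd_join() inside rd_kafka_destroy_internal(), called from rd_kafka_new(), while the only other thread (rdk:broker-1) is waiting in rd_kafka_q_pop_serve(). This looks very similar to #3954.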
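
As a stopgap on the application side, the hang can at least be turned into a visible failure. The sketch below arms `signal.alarm` with the default SIGALRM disposition, so the kernel terminates the process when the deadline passes even though the main thread is blocked inside C code and would never run a Python-level handler. `hard_deadline` is a hypothetical helper name, not part of confluent-kafka; it is POSIX-only, must be used from the main thread, and does not fix the underlying problem.

```python
import signal
from contextlib import contextmanager


@contextmanager
def hard_deadline(seconds: int):
    """Kill the process if the body of the with-block runs longer than `seconds`."""
    # Force the default action (terminate) so no Python handler is involved.
    previous = signal.signal(signal.SIGALRM, signal.SIG_DFL)
    signal.alarm(seconds)
    try:
        yield
    finally:
        signal.alarm(0)  # disarm the timer once the body has finished
        if previous is not None:
            signal.signal(signal.SIGALRM, previous)


# Intended use with the reproduction script (left commented out because it
# needs confluent_kafka and a thread-limited environment):
#
#     with hard_deadline(10):
#         consumer = Consumer(consumer_config)
```

With this in place the script from the reproduction would be terminated by SIGALRM after ten seconds instead of blocking indefinitely, which is easier for a supervisor or CI job to detect.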

Checklist

IMPORTANT: We will close issues where the checklist has not been completed.

Please provide the following information:

  • [x] librdkafka version (release number or git tag): v2.6.0
  • [x] Apache Kafka version: 3.0.0
  • [x] librdkafka client configuration: auto.offset.reset=earliest, enable.auto.commit=false, debug=all
  • [x] Operating system: Red Hat Enterprise Linux 8.9
  • [x] Provide logs (with debug=.. as necessary) from librdkafka
  • [ ] Provide broker log excerpts
  • [x] Critical issue

Mrigank11, Jan 22 '24 12:01

Any update here?

Mrigank11, Apr 19 '24 14:04