Version 2.1.0 segfaults when subscribing to non-existent topic

Open ffissore opened this issue 2 years ago • 14 comments

Description

Since confluent-kafka 2.1.0, subscribing to a non-existent topic causes Python to segfault.

How to reproduce

Using the files in this gist, start the services with docker compose, then run the script test.py:

  1. first, with no modifications
  2. then, with the admin client part commented out, so that the topic is never created

The first run will be successful. The second run will log Segmentation fault (core dumped).

With confluent-kafka 2.0.2, poll returns a message with value b'Subscribed topic not available: test-cf67de5e-e79e-48f7-9300-763fb5b8bc05: Broker: Unknown topic or partition'
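
For reference, the subscribe-then-poll flow described above can be sketched as follows. This is not the code from the gist: `make_consumer`, `first_event`, the `localhost:9092` address and the `segfault-repro` group id are illustrative assumptions. The consumer is passed in as a parameter so the logic can be tried without a broker.

```python
def make_consumer(bootstrap_servers="localhost:9092"):
    """Build a consumer. Broker address and group id are assumptions."""
    # Imported lazily so first_event() can be exercised without the package.
    from confluent_kafka import Consumer

    return Consumer({
        "bootstrap.servers": bootstrap_servers,
        "group.id": "segfault-repro",
        "auto.offset.reset": "earliest",
    })


def first_event(consumer, topic, timeout=10.0):
    """Subscribe to `topic`, poll once, and classify what came back."""
    consumer.subscribe([topic])
    msg = consumer.poll(timeout)
    if msg is None:
        # Nothing arrived within the timeout.
        return ("timeout", None)
    if msg.error() is not None:
        # Error events (such as an unknown topic) are delivered as messages.
        return ("error", msg.error())
    return ("message", msg.value())
```

With a broker running, `first_event(make_consumer(), "some-missing-topic")` would be expected to return an `("error", ...)` tuple on 2.0.2, going by the report above; on 2.1.0 the reported outcome is a segfault instead.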

Checklist

Please provide the following information:

  • [x] confluent-kafka-python and librdkafka version (confluent_kafka.version() and confluent_kafka.libversion()): ('2.1.0', 33619968) ('2.1.0', 33620223)
  • [x] Apache Kafka broker version: 2.1.1 (confluent 5.1.2)
  • [x] Client configuration: {...}: in the gist
  • [x] Operating system: ubuntu 20.04
  • [ ] Provide client logs (with 'debug': '..' as necessary)
  • [ ] Provide broker log excerpts
  • [ ] Critical issue
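
The integers in the version tuples above are hex-encoded version numbers. Assuming the usual librdkafka layout of 0xMMmmrrxx (major, minor, revision, and a last byte that librdkafka documents as a pre-release id, with 0xff meaning a final release), they can be decoded like this. `decode_version` is a hypothetical helper name, not part of confluent-kafka.

```python
def decode_version(hex_version):
    """Split a 0xMMmmrrxx version integer into its four bytes."""
    return (
        (hex_version >> 24) & 0xFF,  # major
        (hex_version >> 16) & 0xFF,  # minor
        (hex_version >> 8) & 0xFF,   # revision
        hex_version & 0xFF,          # pre-release id (0xff = final, per librdkafka)
    )


# The two integers quoted in the checklist above.
print(decode_version(33619968))  # (2, 1, 0, 0)
print(decode_version(33620223))  # (2, 1, 0, 255)
```

Both values decode to 2.1.0, which matches the version strings reported next to them.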

ffissore avatar Apr 06 '23 12:04 ffissore

Using gdb I got this far in debugging (not much actually)

Thread 1 "python" received signal SIGSEGV, Segmentation fault.
0x00007ffff62f7115 in rd_kafka_message_leader_epoch () from /home/federico/.pyenv/versions/3.11.2-debug/envs/test-env/lib/python3.11/site-packages/confluent_kafka/../confluent_kafka.libs/librdkafka-e24e6ccd.so.1
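
To complement the native frame from gdb, Python's standard `faulthandler` module can dump the Python-level traceback when the process receives SIGSEGV, which may help show which client call was in progress. A minimal sketch; the helper name and the log path are assumptions.

```python
import faulthandler


def enable_crash_traceback(path="segfault-traceback.log"):
    """Write Python tracebacks for all threads to `path` on a fatal signal."""
    log_file = open(path, "w")
    faulthandler.enable(file=log_file, all_threads=True)
    # The file must stay open while faulthandler is enabled,
    # so hand it back to the caller to keep a reference.
    return log_file
```

Calling this at the top of test.py, before the consumer is created, should be enough; running the script with `python -X faulthandler test.py` is an alternative that needs no code change.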

ffissore avatar Apr 06 '23 12:04 ffissore

Hello Federico, thanks. I'm aware of the issue. I've already fixed it in the .NET client, and I'm going to fix it in librdkafka too.

emasab avatar Apr 06 '23 15:04 emasab

Same issue here, upgrading to 2.1.0 from 1.6.0. Any update on the fix? Thank you

Richetto avatar Apr 23 '23 13:04 Richetto

Yes, we've merged the fix here and are planning a maintenance release soon.

emasab avatar Apr 23 '23 21:04 emasab

This is still broken in confluent-kafka-python 2.1.1.

lpsinger avatar May 04 '23 18:05 lpsinger

@lpsinger are you sure? I've just tried with a non-existent topic and it gives:

Traceback (most recent call last):
  File "consumer.py", line 98, in <module>
    raise KafkaException(msg.error())
cimpl.KafkaException: KafkaError{code=UNKNOWN_TOPIC_OR_PART,val=3,str="Subscribed topic not available: this_topic_doesnt_exist: Broker: Unknown topic or partition"}

emasab avatar May 04 '23 19:05 emasab

We have a lightweight client library that wraps confluent-kafka-python and adds the configuration presets for our Kafka cluster: https://github.com/nasa-gcn/gcn-kafka-python

It's segfaulting with unknown topics. So it might be something with confluent-kafka-python 2.1.1 plus unknown topics plus OpenID Connect.

lpsinger avatar May 04 '23 19:05 lpsinger

To test, go to https://gcn.nasa.gov, click "Start streaming GCN Notices", and follow the instructions.

lpsinger avatar May 04 '23 19:05 lpsinger

@lpsinger Thanks for that, I've installed it and reproduced the crash. The error happens in a different place than the initial one, because of a fix we made to the consume batch in 2.1.0. With non-existent topics, it happens with the Consumer.consume method but not with Consumer.poll.
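
Since the crash is reported with `Consumer.consume` but not with `Consumer.poll`, one possible stopgap on affected versions is to assemble batches from repeated `poll` calls. This is a sketch under that assumption, not a fix suggested by the maintainers; `poll_batch` is a hypothetical helper.

```python
import time


def poll_batch(consumer, num_messages, timeout):
    """Collect up to `num_messages` messages using poll() only."""
    deadline = time.monotonic() + timeout
    batch = []
    while len(batch) < num_messages:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break  # overall time budget used up
        msg = consumer.poll(remaining)
        if msg is None:
            break  # poll timed out with nothing to deliver
        batch.append(msg)
    return batch
```

Error events are returned in the batch alongside regular messages, so callers still need to check `msg.error()` on each item.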

emasab avatar May 05 '23 07:05 emasab

I confirm that our code is working with version 2.1.1: our tests are green. As I'm not sure whether you'd prefer a separate issue to track the bug with Consumer.consume, I'll leave closing this issue to you.

ffissore avatar May 05 '23 09:05 ffissore

When I run the test suite on version 2.1.1, one of the tests seems to segfault (see gist). The same happens when I upgrade both librdkafka and this package to 2.2.0.

It seems like the same issue.

milibopp avatar Aug 02 '23 09:08 milibopp

@milibopp It doesn't seem to be the same issue: that test, test_oauth_cb_principal_sasl_extensions, doesn't subscribe to any topics. I couldn't reproduce it by running that test. Could you provide some hints on how to reproduce it?

emasab avatar Aug 02 '23 18:08 emasab

I am facing a similar issue with the current package version, v2.2.0, on Python 3.10.

ksajan avatar Sep 07 '23 12:09 ksajan

I still see a similar issue in versions >= 2.0.2 when using SSL. More details are in this ticket: https://github.com/confluentinc/confluent-kafka-python/issues/1690

Vikash08Mishra avatar Dec 12 '23 12:12 Vikash08Mishra