
Segmentation fault when creating a producer after a consumer


Description

When I create a producer after creating a consumer, I get a segmentation fault. This happens only when I configure the bootstrap servers as hostnames; if I use IP addresses, everything works normally. It also happens only when I use static linking.

How to reproduce

source code:

package main

import (
	"github.com/confluentinc/confluent-kafka-go/kafka"
)

func main() {
	consumer, err := kafka.NewConsumer(&kafka.ConfigMap{
		"bootstrap.servers":  "example1.com:9092",
		"group.id":           "test",
		"auto.offset.reset":  "earliest",
		"enable.auto.commit": true,
		"debug" : "all",
	})
	if err != nil {
		panic(err)
	}

	println("before creating producer")
	producer, err := kafka.NewProducer(&kafka.ConfigMap{
		"bootstrap.servers": "example2.com:9092",
		"acks":              "all",
		"debug":             "all",
	})
	if err != nil {
		panic(err)
	}
	println("after creating producer")
	println(producer.String()) // use the variables so they don't trigger "declared and not used" compile errors
	println(consumer.String())
}

build command: go build -a -ldflags "-linkmode external -extldflags -static" main.go

output:

%7|1651503922.240|MEMBERID|rdkafka#consumer-1| [thrd:app]: Group "test": updating member id "(not-set)" -> ""
%7|1651503922.240|WAKEUPFD|rdkafka#consumer-1| [thrd:app]: GroupCoordinator: Enabled low-latency ops queue wake-ups
%7|1651503922.240|BROKER|rdkafka#consumer-1| [thrd:app]: GroupCoordinator: Added new broker with NodeId -1
%7|1651503922.240|BRKMAIN|rdkafka#consumer-1| [thrd:GroupCoordinator]: GroupCoordinator: Enter main broker thread
%7|1651503922.240|WAKEUPFD|rdkafka#consumer-1| [thrd:app]: example1.com:9092/bootstrap: Enabled low-latency ops queue wake-ups
%7|1651503922.240|BRKMAIN|rdkafka#consumer-1| [thrd::0/internal]: :0/internal: Enter main broker thread
%7|1651503922.240|BROKER|rdkafka#consumer-1| [thrd:app]: example1.com:9092/bootstrap: Added new broker with NodeId -1
%7|1651503922.240|CGRPSTATE|rdkafka#consumer-1| [thrd:main]: Group "test" changed state init -> query-coord (join-state init)
%7|1651503922.240|INIT|rdkafka#consumer-1| [thrd:app]: librdkafka v1.8.2 (0x10802ff) rdkafka#consumer-1 initialized (builtin.features gzip,snappy,ssl,sasl,regex,lz4,sasl_plain,sasl_scram,plugins,zstd,sasl_oauthbearer, STRIP STATIC_LINKING CC GXX PKGCONFIG INSTALL GNULD LDS LIBDL PLUGINS STATIC_LIB_zlib ZLIB STATIC_LIB_libcrypto STATIC_LIB_libssl SSL STATIC_LIB_libzstd ZSTD HDRHISTOGRAM SYSLOG SNAPPY SOCKEM SASL_SCRAM SASL_OAUTHBEARER CRC32C_HW, debug 0xfffff)
%7|1651503922.240|BROADCAST|rdkafka#consumer-1| [thrd:main]: Broadcasting state change
%7|1651503922.240|CONNECT|rdkafka#consumer-1| [thrd:main]: example1.com:9092/bootstrap: Selected for cluster connection: coordinator query (broker has 0 connection attempt(s))
%7|1651503922.240|CGRPQUERY|rdkafka#consumer-1| [thrd:main]: Group "test": no broker available for coordinator query: intervaled in state query-coord
%7|1651503922.240|CONF|rdkafka#consumer-1| [thrd:app]: Client configuration:
%7|1651503922.240|CONF|rdkafka#consumer-1| [thrd:app]:   client.software.name = confluent-kafka-go
%7|1651503922.240|CONF|rdkafka#consumer-1| [thrd:app]:   client.software.version = 1.8.2
%7|1651503922.240|CONF|rdkafka#consumer-1| [thrd:app]:   metadata.broker.list = example1.com:9092
%7|1651503922.240|CONF|rdkafka#consumer-1| [thrd:app]:   debug = generic,broker,topic,metadata,feature,queue,msg,protocol,cgrp,security,fetch,interceptor,plugin,consumer,admin,eos,mock,assignor,conf,all
%7|1651503922.240|CONF|rdkafka#consumer-1| [thrd:app]:   enabled_events = 376
%7|1651503922.240|CONF|rdkafka#consumer-1| [thrd:app]:   default_topic_conf = 0x12d85c0
%7|1651503922.240|CONF|rdkafka#consumer-1| [thrd:app]:   group.id = test
%7|1651503922.240|CONF|rdkafka#consumer-1| [thrd:app]:   enable.auto.commit = true
%7|1651503922.240|CONF|rdkafka#consumer-1| [thrd:app]: Default topic configuration:
%7|1651503922.240|CONF|rdkafka#consumer-1| [thrd:app]:   auto.offset.reset = smallest
%7|1651503922.240|BRKMAIN|rdkafka#consumer-1| [thrd:example1.com:9092/bootstrap]: example1.com:9092/bootstrap: Enter main broker thread
before creating producer
%7|1651503922.240|CONNECT|rdkafka#consumer-1| [thrd:example1.com:9092/bootstrap]: example1.com:9092/bootstrap: Received CONNECT op
%7|1651503922.240|STATE|rdkafka#consumer-1| [thrd:example1.com:9092/bootstrap]: example1.com:9092/bootstrap: Broker changed state INIT -> TRY_CONNECT
%7|1651503922.240|BROADCAST|rdkafka#consumer-1| [thrd:example1.com:9092/bootstrap]: Broadcasting state change
%7|1651503922.240|CONNECT|rdkafka#consumer-1| [thrd:example1.com:9092/bootstrap]: example1.com:9092/bootstrap: broker in state TRY_CONNECT connecting
%7|1651503922.240|STATE|rdkafka#consumer-1| [thrd:example1.com:9092/bootstrap]: example1.com:9092/bootstrap: Broker changed state TRY_CONNECT -> CONNECT
%7|1651503922.240|BROADCAST|rdkafka#consumer-1| [thrd:example1.com:9092/bootstrap]: Broadcasting state change
%7|1651503922.240|WAKEUPFD|rdkafka#producer-2| [thrd:app]: example2.com:9092/bootstrap: Enabled low-latency ops queue wake-ups
%7|1651503922.240|BRKMAIN|rdkafka#producer-2| [thrd::0/internal]: :0/internal: Enter main broker thread
%7|1651503922.240|BROKER|rdkafka#producer-2| [thrd:app]: example2.com:9092/bootstrap: Added new broker with NodeId -1
%7|1651503922.240|CONNECT|rdkafka#producer-2| [thrd:app]: example2.com:9092/bootstrap: Selected for cluster connection: bootstrap servers added (broker has 0 connection attempt(s))
%7|1651503922.240|INIT|rdkafka#producer-2| [thrd:app]: librdkafka v1.8.2 (0x10802ff) rdkafka#producer-2 initialized (builtin.features gzip,snappy,ssl,sasl,regex,lz4,sasl_plain,sasl_scram,plugins,zstd,sasl_oauthbearer, STRIP STATIC_LINKING CC GXX PKGCONFIG INSTALL GNULD LDS LIBDL PLUGINS STATIC_LIB_zlib ZLIB STATIC_LIB_libcrypto STATIC_LIB_libssl SSL STATIC_LIB_libzstd ZSTD HDRHISTOGRAM SYSLOG SNAPPY SOCKEM SASL_SCRAM SASL_OAUTHBEARER CRC32C_HW, debug 0xfffff)
%7|1651503922.240|CONF|rdkafka#producer-2| [thrd:app]: Client configuration:
%7|1651503922.240|CONF|rdkafka#producer-2| [thrd:app]:   client.software.name = confluent-kafka-go
%7|1651503922.240|CONF|rdkafka#producer-2| [thrd:app]:   client.software.version = 1.8.2
%7|1651503922.240|CONF|rdkafka#producer-2| [thrd:app]:   metadata.broker.list = example2.com:9092
%7|1651503922.240|CONF|rdkafka#producer-2| [thrd:app]:   debug = generic,broker,topic,metadata,feature,queue,msg,protocol,cgrp,security,fetch,interceptor,plugin,consumer,admin,eos,mock,assignor,conf,all
%7|1651503922.240|CONF|rdkafka#producer-2| [thrd:app]:   enabled_events = 329
%7|1651503922.240|CONF|rdkafka#producer-2| [thrd:app]:   default_topic_conf = 0x12e3d70
%7|1651503922.240|BRKMAIN|rdkafka#producer-2| [thrd:example2.com:9092/bootstrap]: example2.com:9092/bootstrap: Enter main broker thread
%7|1651503922.240|CONF|rdkafka#producer-2| [thrd:app]: Default topic configuration:
%7|1651503922.240|CONF|rdkafka#producer-2| [thrd:app]:   request.required.acks = -1
%7|1651503922.240|CONNECT|rdkafka#producer-2| [thrd:example2.com:9092/bootstrap]: example2.com:9092/bootstrap: Received CONNECT op
%7|1651503922.240|STATE|rdkafka#producer-2| [thrd:example2.com:9092/bootstrap]: example2.com:9092/bootstrap: Broker changed state INIT -> TRY_CONNECT
%7|1651503922.240|BROADCAST|rdkafka#producer-2| [thrd:example2.com:9092/bootstrap]: Broadcasting state change
%7|1651503922.240|CONNECT|rdkafka#producer-2| [thrd:example2.com:9092/bootstrap]: example2.com:9092/bootstrap: broker in state TRY_CONNECT connecting
%7|1651503922.240|STATE|rdkafka#producer-2| [thrd:example2.com:9092/bootstrap]: example2.com:9092/bootstrap: Broker changed state TRY_CONNECT -> CONNECT
%7|1651503922.240|BROADCAST|rdkafka#producer-2| [thrd:example2.com:9092/bootstrap]: Broadcasting state change
Segmentation fault (core dumped)

As you can see, "after creating producer" is never printed.

Checklist

Please provide the following information:

  • [x] confluent-kafka-go and librdkafka version (LibraryVersion()): 1.8.2
  • [ ] Apache Kafka broker version:
  • [x] Client configuration: ConfigMap{...}
  • [x] Operating system: Ubuntu 20.04.2 LTS
  • [x] Provide client logs (with "debug": ".." as necessary)
  • [ ] Provide broker log excerpts
  • [ ] Critical issue

somagh avatar May 02 '22 15:05 somagh

Yeah, this seems to be related to the resolver:

==1695972== Thread 14 rdk:broker-1:
==1695972== Invalid read of size 1
==1695972==    at 0x2FD215CA: internal_getent (files-XXX.c:173)
==1695972==    by 0x2FD229F3: _nss_files_gethostbyname4_r (files-hosts.c:400)
==1695972==    by 0x9841EE: gaih_inet.constprop.0 (in /home/maglun/gocode/src/testar/issue-774/main)
==1695972==    by 0x985D38: getaddrinfo (in /home/maglun/gocode/src/testar/issue-774/main)
==1695972==    by 0x59BE33: rd_getaddrinfo (in /home/maglun/gocode/src/testar/issue-774/main)
==1695972==    by 0x54C81D: rd_kafka_broker_thread_main (in /home/maglun/gocode/src/testar/issue-774/main)
==1695972==    by 0x4F2475: _thrd_wrapper_function (in /home/maglun/gocode/src/testar/issue-774/main)
==1695972==    by 0x8EB6F8: start_thread (pthread_create.c:477)
==1695972==    by 0x98B392: clone (in /home/maglun/gocode/src/testar/issue-774/main)
==1695972==  Address 0x63 is not stack'd, malloc'd or (recently) free'd
==1695972==
==1695972==
==1695972== Process terminating with default action of signal 11 (SIGSEGV): dumping core
==1695972==  Access not within mapped region at address 0x63
==1695972==    at 0x2FD215CA: internal_getent (files-XXX.c:173)
==1695972==    by 0x2FD229F3: _nss_files_gethostbyname4_r (files-hosts.c:400)
==1695972==    by 0x9841EE: gaih_inet.constprop.0 (in /home/maglun/gocode/src/testar/issue-774/main)
==1695972==    by 0x985D38: getaddrinfo (in /home/maglun/gocode/src/testar/issue-774/main)
==1695972==    by 0x59BE33: rd_getaddrinfo (in /home/maglun/gocode/src/testar/issue-774/main)
==1695972==    by 0x54C81D: rd_kafka_broker_thread_main (in /home/maglun/gocode/src/testar/issue-774/main)
==1695972==    by 0x4F2475: _thrd_wrapper_function (in /home/maglun/gocode/src/testar/issue-774/main)
==1695972==    by 0x8EB6F8: start_thread (pthread_create.c:477)
==1695972==    by 0x98B392: clone (in /home/maglun/gocode/src/testar/issue-774/main)

It seems that with your extra build flags the binary is no longer dynamically linked to libc et al.
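
For reference, the trace above is Valgrind output from running the statically built reproducer, roughly:

	valgrind ./main

(with main being the binary produced by the static build command above).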

edenhill avatar May 03 '22 07:05 edenhill

Standard go build:

$ ldd issue-774
	linux-vdso.so.1 (0x00007ffde157b000)
	libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f4a69f17000)
	libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f4a69f11000)
	libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f4a69eee000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f4a69cfc000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f4a6a0ab000)

With go build -a -ldflags "-linkmode external -extldflags -static" main.go:

$ ldd main
	not a dynamic executable

edenhill avatar May 03 '22 07:05 edenhill

Thank you for your attention. So you mean that for this use case, the binary should be dynamically linked against the system libraries?

somagh avatar May 08 '22 06:05 somagh

Yep, if you want to use glibc, you have to use dynamic linking: functions like getaddrinfo go through glibc's NSS machinery (note the _nss_files_gethostbyname4_r frame in the trace above), which does not work reliably when glibc is linked statically.

If you are okay with using musl instead, you can achieve a fully static build.

The details are here: https://github.com/confluentinc/confluent-kafka-go#static-builds-on-linux
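
If I remember that section correctly, on a musl-based distribution such as Alpine the build is roughly the same command with the musl build tag added, e.g.:

	go build -tags musl -ldflags "-linkmode external -extldflags -static" main.go

(the musl tag selects the musl variant of the bundled librdkafka).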

milindl avatar Feb 28 '23 04:02 milindl