RdKafka::Producer segfaults while resolving an AWS Kafka broker address a few minutes after the cluster information is retrieved

Open yxuechao007 opened this issue 3 years ago • 4 comments

librdkafka crashes while resolving the AWS Kafka broker address, a few minutes after the cluster information has been retrieved. I pulled the latest version of librdkafka from GitHub and the same error occurs. Is this a bug in getaddrinfo? How can I fix it?

  1. Example Code:

         RdKafka::Producer *producer = RdKafka::Producer::create(producer_conf, errstr);
         RdKafka::ErrorCode resp = producer->produce(topic, client->partition(),
             RdKafka::Producer::RK_MSG_COPY, xxxxxxxxx, spans.ByteSize(), NULL, NULL);

  2. CoreDump:

         Program terminated with signal 11, Segmentation fault.
         #0 0x00007f57a2bee5ed in internal_getent () from /lib64/libnss_files.so.2
         Missing separate debuginfos, use: debuginfo-install glibc-2.17-323.el7_9.x86_64
         (gdb) where
         #0 0x00007f57a2bee5ed in internal_getent () from /lib64/libnss_files.so.2
         #1 0x00007f57a2bef7e3 in _nss_files_gethostbyname4_r () from /lib64/libnss_files.so.2
         #2 0x0000000000833015 in gaih_inet ()
         #3 0x0000000000835f3f in getaddrinfo ()
         #4 0x000000000051b280 in rd_getaddrinfo (nodesvc=, defsvc=0x7f57a8839aec "9092", flags=32, family=, socktype=, protocol=, errstr=0x7f57a8838078) at rdaddr.c:168
         #5 0x00000000004dc32a in rd_kafka_broker_resolve (nodename=0x7f57a8837f70 "b-2.xxxxxxxxx.c3.kafka.ap-southeast-2.amazonaws.com:9092", rkb=0x7f579c001870) at rdkafka_broker.c:844
         #6 rd_kafka_broker_connect (rkb=0x7f579c001870) at rdkafka_broker.c:1831
         #7 rd_kafka_broker_thread_main (arg=0x7f579c001870) at rdkafka_broker.c:4344
         #8 0x000000000051c857 in _thrd_wrapper_function (aArg=) at tinycthread.c:576
         #9 0x00000000007ae8e4 in start_thread ()
         #10 0x000000000083ab19 in clone ()

  3. Environment: AWS EC2, gcc version 4.8.5 20150623 (Red Hat 4.8.5-39) (GCC), glibc 2.17

yxuechao007 avatar Nov 26 '21 09:11 yxuechao007

I ran into the same problem.

ucanme avatar May 16 '22 15:05 ucanme

This looks to be a problem with librdkafka's interaction with the system resolver.

How are you installing librdkafka? By building from source or through a package manager?

edenhill avatar May 16 '22 15:05 edenhill

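I ran into the same problem and, after struggling with it for a long time, solved it. It appears to be a glibc problem: getaddrinfo is not safe for concurrent use when the binary is statically linked. The fix that worked for me was to build against musl instead of glibc: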

# On the host, from the project directory (-w makes /workspace the working directory):
docker run -it -v $(pwd):/workspace -w /workspace golang:alpine3.14 /bin/sh
# Inside the container:
apk update
apk add git alpine-sdk
go build -ldflags "-linkmode external -extldflags '-static'" -tags musl ./cmd/main.go

Give it a try.
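
To check whether a particular build is affected, a small stress test along the following lines may help. This is only a sketch under some assumptions: `resolve_concurrently` is a made-up helper, not part of librdkafka, and the idea is simply to call getaddrinfo from several threads at once, roughly the way librdkafka's per-broker threads do in the backtrace above. In that backtrace, getaddrinfo sits inside the main binary while libnss_files.so.2 is loaded from /lib64, which is what a statically linked glibc looks like, so the interesting comparison is the same program built with and without `-static`.

```cpp
#include <netdb.h>
#include <sys/socket.h>
#include <sys/types.h>

#include <atomic>
#include <thread>
#include <vector>

// Hypothetical helper (not a librdkafka API): resolve host:port from
// num_threads threads at once, iterations times per thread, and return
// the number of lookups that succeeded.
int resolve_concurrently(const char *host, const char *port, int flags,
                         int num_threads, int iterations) {
    std::atomic<int> succeeded{0};
    std::vector<std::thread> workers;

    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&succeeded, host, port, flags, iterations]() {
            for (int i = 0; i < iterations; ++i) {
                struct addrinfo hints = {};
                hints.ai_family = AF_UNSPEC;
                hints.ai_socktype = SOCK_STREAM;
                hints.ai_flags = flags;

                struct addrinfo *res = nullptr;
                if (getaddrinfo(host, port, &hints, &res) == 0) {
                    ++succeeded;
                    freeaddrinfo(res);  // only free on success
                }
            }
        });
    }

    for (auto &w : workers) {
        w.join();
    }
    return succeeded.load();
}
```

Calling it with a broker hostname, port "9092" and `AI_ADDRCONFIG` (the `flags=32` seen in the backtrace on Linux) from a small `main` should exercise the same resolver path. If the statically linked build crashes and the dynamically linked one does not, that would support the diagnosis above. A clean run does not prove the build is safe.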

ucanme avatar May 17 '22 04:05 ucanme