
[Bug] SampleAsyncProducer causes core dump

Open xiaoliu1019 opened this issue 1 year ago • 12 comments

Search before asking

  • [X] I searched in the issues and found nothing similar.

Version

pulsar cpp client 3.5.1

Minimal reproduce step

Run the example SampleAsyncProducer.cc.

What did you expect to see?

The messages are produced.

What did you see instead?

The program crashes with a core dump.

Anything else?

No response

Are you willing to submit a PR?

  • [ ] I'm willing to submit a PR!

xiaoliu1019 avatar Jul 18 '24 03:07 xiaoliu1019

Please provide more info:

  • Your OS and compiler
  • How did you use the library? Installed from the official pre-built libraries or built from source? If built from source, please provide the detailed steps.

BewareMyPower avatar Jul 18 '24 07:07 BewareMyPower

The OS is TencentOS, which is based on CentOS. I use the official pre-built libraries, and my compiler is g++ 8.5.0. To compile successfully, I must add the compilation option --copt=-D_GLIBCXX_USE_CXX11_ABI=0, and this then results in a core dump when the sendAsync callback is called.

xiaoliu1019 avatar Jul 18 '24 08:07 xiaoliu1019
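
The -D_GLIBCXX_USE_CXX11_ABI=0 option mentioned above selects the older libstdc++ layout for std::string and std::list. Mixing objects compiled with different values of this macro is a common cause of link errors or crashes when a std::string crosses a library boundary. As a minimal sketch (the names glibcxxAbiSetting and describeAbi are hypothetical helpers, not part of the Pulsar client), the following reports which setting a translation unit was compiled with:

```cpp
#include <string>  // any libstdc++ header defines _GLIBCXX_USE_CXX11_ABI

// Returns the value of _GLIBCXX_USE_CXX11_ABI seen by this translation unit,
// or -1 if the macro is not defined (for example, with a non-libstdc++ library).
int glibcxxAbiSetting() {
#ifdef _GLIBCXX_USE_CXX11_ABI
    return _GLIBCXX_USE_CXX11_ABI;
#else
    return -1;
#endif
}

// Turns the numeric setting into a short human-readable description.
std::string describeAbi(int setting) {
    switch (setting) {
        case 1:
            return "new ABI (std::__cxx11::basic_string)";
        case 0:
            return "old ABI (pre-C++11 std::string layout)";
        default:
            return "macro not defined (probably not libstdc++)";
    }
}
```

Calling describeAbi(glibcxxAbiSetting()) from the application, built with the same flags as the rest of the project, shows which ABI the application code expects. Whether the pre-built library uses the same one can be judged from its exported symbols: names containing __cxx11 indicate the new ABI.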

Oh, this issue can be reproduced. I'll assign it to myself first.

(gdb) bt
#0  std::_Function_handler<void (pulsar::Result, pulsar::MessageId const&), void (*)(pulsar::Result, pulsar::MessageId const&)>::_M_invoke(std::_Any_data const&, pulsar::Result&&, pulsar::MessageId const&) (__functor=..., 
    __args#0=<error reading variable>, __args#1=...) at /usr/include/c++/8/bits/std_function.h:297
#1  0x0000ffff820a89a8 in std::function<void (pulsar::Result, pulsar::MessageId const&)>::operator()(pulsar::Result, pulsar::MessageId const&) const (this=<optimized out>, __args#0=<optimized out>, __args#1=...)
    at /usr/include/c++/4.8.2/functional:2471
#2  0x0000ffff82079c04 in std::function<void (pulsar::Result, pulsar::MessageId const&)>::operator()(pulsar::Result, pulsar::MessageId const&) const (__args#1=..., __args#0=pulsar::ResultOk, this=0xffff7c006760)
    at /usr/include/c++/4.8.2/functional:2471
#3  pulsar::completeSendCallbacks (id=..., result=pulsar::ResultOk, callbacks=std::vector of length 1, capacity 1 = {...}) at /usr/src/debug/apache-pulsar-client-cpp-3.5.1/lib/MessageAndCallbackBatch.cc:94
#4  pulsar::MessageAndCallbackBatch::__lambda5::operator() (id=..., result=pulsar::ResultOk, __closure=0xffff7c004f10) at /usr/src/debug/apache-pulsar-client-cpp-3.5.1/lib/MessageAndCallbackBatch.cc:100

BewareMyPower avatar Jul 18 '24 12:07 BewareMyPower

It seems to be the libstdc++ incompatibility in GCC 4.8.

It should already be fixed by https://github.com/apache/pulsar-client-cpp/pull/428. Could you try the RPM packages in https://github.com/BewareMyPower/pulsar-client-cpp/actions/runs/9535942883?

BewareMyPower avatar Jul 18 '24 13:07 BewareMyPower

Oh, I used the new pre-built libraries you gave (https://github.com/BewareMyPower/pulsar-client-cpp/actions/runs/9535942883). When I run this program, it core dumps:

2024-07-19 15:42:50.879 INFO  [140737353004672] ClientConnection:187 | [<none> -> pulsar://21.6.118.142:6650] Create ClientConnection, timeout=2000
2024-07-19 15:42:50.879 INFO  [140737353004672] ConnectionPool:124 | Created connection for pulsar://21.6.118.142:6650-pulsar://21.6.118.142:6650-0
[New Thread 0x7fffdc8ca700 (LWP 58501)]
2024-07-19 15:42:50.881 INFO  [140736901986048] ClientConnection:403 | [21.6.92.133:51892 -> 21.6.118.142:6650] Connected to broker
Missing separate debuginfos, use: dnf debuginfo-install bash-4.4.20-4.tl3.tencentos.x86_64 brotli-1.0.6-3.tl3.x86_64 cyrus-sasl-lib-2.1.27-6.tl3.x86_64 glibc-2.28-225.tl3.6.x86_64 keyutils-libs-1.5.10-9.tl3.x86_64 krb5-libs-1.18.2-22.tl3.x86_64 libcom_err-1.45.6-5.tl3.x86_64 libcurl-7.61.1-33.tl3.x86_64 libgcc-8.5.0-18.tl3.x86_64 libidn2-2.2.0-1.tl3.x86_64 libnghttp2-1.33.0-5.tl3.x86_64 libpsl-0.20.2-6.tl3.x86_64 libselinux-2.9-8.tl3.x86_64 libssh-0.9.6-10.tl3.x86_64 libstdc++-8.5.0-18.tl3.x86_64 libxcrypt-4.1.1-6.tl3.x86_64 pcre2-10.32-3.tl3.x86_64 zlib-1.2.11-21.tl3.x86_64
--Type <RET> for more, q to quit, c to continue without paging--

Thread 61 "Pulsar_producer" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffdd0cb700 (LWP 58500)]
0x0000000000000000 in ?? ()
(gdb) bt
#0  0x0000000000000000 in ?? ()
#1  0x0000000000aa8c48 in google::protobuf::MessageLite::SerializePartialToArray (this=0x7fffdd0acd60, data=0x7fffcc001fb8, size=41)
    at external/protobuf_archive/src/google/protobuf/message_lite.cc:489
#2  0x0000000000aa97ba in google::protobuf::MessageLite::SerializeToArray (this=0x7fffdd0acd60, data=0x7fffcc001fb8, size=41)
    at external/protobuf_archive/src/google/protobuf/message_lite.cc:481
#3  0x00007ffff6f37305 in pulsar::Commands::writeMessageWithSize(pulsar::proto::BaseCommand const&) () from /lib/libpulsar.so
#4  0x00007ffff6f37f3a in pulsar::Commands::newConnect(std::shared_ptr<pulsar::Authentication> const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, pulsar::Result&) () from /lib/libpulsar.so
#5  0x00007ffff6eec8b8 in pulsar::ClientConnection::handleHandshake(std::error_code const&) () from /lib/libpulsar.so
#6  0x00007ffff6eef0ab in pulsar::ClientConnection::handleTcpConnected(std::error_code const&, asio::ip::basic_resolver_iterator<asio::ip::tcp>) () from /lib/libpulsar.so
#7  0x00007ffff6ef059b in pulsar::ClientConnection::handleResolve(std::error_code const&, asio::ip::basic_resolver_iterator<asio::ip::tcp>)::{lambda(std::error_code const&)#2}::operator()(std::error_code const&) const () from /lib/libpulsar.so
#8  0x00007ffff6ef0a04 in asio::detail::reactive_socket_connect_op<pulsar::ClientConnection::handleResolve(std::error_code const&, asio::ip::basic_resolver_iterator<asio::ip::tcp>)::{lambda(std::error_code const&)#2}, asio::any_io_executor>::do_complete(void*, asio::detail::scheduler_operation*, std::error_code const&, unsigned long) ()
   from /lib/libpulsar.so
#9  0x00007ffff6f024f6 in asio::detail::epoll_reactor::descriptor_state::do_complete(void*, asio::detail::scheduler_operation*, std::error_code const&, unsigned long) ()
   from /lib/libpulsar.so
#10 0x00007ffff6f01e32 in asio::detail::scheduler::run(std::error_code&) () from /lib/libpulsar.so
#11 0x00007ffff6f7a37a in pulsar::ExecutorService::start()::{lambda()#1}::operator()() const [clone .isra.240] () from /lib/libpulsar.so
#12 0x00007ffff762cf43 in execute_native_thread_routine () from /lib/libpulsar.so
#13 0x00007ffff68e91ca in start_thread () from /lib64/libpthread.so.0
#14 0x00007ffff5a22e73 in clone () from /lib64/libc.so.6

Here are my compilation options:

build --compilation_mode=dbg
build --cxxopt="--std=c++17"
build --copt=-O2
test --cache_test_results=no --test_output=errors

xiaoliu1019 avatar Jul 19 '24 07:07 xiaoliu1019

The path external/protobuf_archive/src/google/protobuf/message_lite.cc looks suspicious. The library was built via vcpkg, and I cannot find any directory named protobuf_archive. Could you check the link paths of the dynamic library and the executable via ldd?

BTW, could you try building in release mode?

BewareMyPower avatar Jul 19 '24 12:07 BewareMyPower

I can build in release mode, but the same error occurs when it runs. Here is the ldd output:

 ldd ./bazel-bin/example/Pulsar_producer_client
        linux-vdso.so.1 (0x00007ffc4fbbc000)
        /$LIB/libonion.so => /lib64/libonion.so (0x00007f04a54d0000)
        libcurl.so.4 => /lib64/libcurl.so.4 (0x00007f04a501e000)
        libpulsar.so => /lib/libpulsar.so (0x00007f04a3fe0000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f04a3dc0000)
        libm.so.6 => /lib64/libm.so.6 (0x00007f04a3a3e000)
        libdl.so.2 => /lib64/libdl.so.2 (0x00007f04a383a000)
        libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007f04a34a5000)
        libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f04a328d000)
        libc.so.6 => /lib64/libc.so.6 (0x00007f04a2ec8000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f04a52ad000)
        libnghttp2.so.14 => /lib64/libnghttp2.so.14 (0x00007f04a2ca1000)
        libidn2.so.0 => /lib64/libidn2.so.0 (0x00007f04a2a83000)
        libssh.so.4 => /lib64/libssh.so.4 (0x00007f04a2813000)
        libpsl.so.5 => /lib64/libpsl.so.5 (0x00007f04a2602000)
        libssl.so.1.1 => /lib64/libssl.so.1.1 (0x00007f04a2366000)
        libcrypto.so.1.1 => /lib64/libcrypto.so.1.1 (0x00007f04a1e74000)
        libgssapi_krb5.so.2 => /lib64/libgssapi_krb5.so.2 (0x00007f04a1c1f000)
        libkrb5.so.3 => /lib64/libkrb5.so.3 (0x00007f04a1935000)
        libk5crypto.so.3 => /lib64/libk5crypto.so.3 (0x00007f04a171e000)
        libcom_err.so.2 => /lib64/libcom_err.so.2 (0x00007f04a151a000)
        libldap-2.4.so.2 => /lib64/libldap-2.4.so.2 (0x00007f04a5470000)
        liblber-2.4.so.2 => /lib64/liblber-2.4.so.2 (0x00007f04a545e000)
        libbrotlidec.so.1 => /lib64/libbrotlidec.so.1 (0x00007f04a544f000)
        libz.so.1 => /lib64/libz.so.1 (0x00007f04a1302000)
        libunistring.so.2 => /lib64/libunistring.so.2 (0x00007f04a0f81000)
        librt.so.1 => /lib64/librt.so.1 (0x00007f04a0d79000)
        libkrb5support.so.0 => /lib64/libkrb5support.so.0 (0x00007f04a0b68000)
        libkeyutils.so.1 => /lib64/libkeyutils.so.1 (0x00007f04a5446000)
        libresolv.so.2 => /lib64/libresolv.so.2 (0x00007f04a0950000)
        libsasl2.so.3 => /lib64/libsasl2.so.3 (0x00007f04a5424000)
        libbrotlicommon.so.1 => /lib64/libbrotlicommon.so.1 (0x00007f04a5401000)
        libselinux.so.1 => /lib64/libselinux.so.1 (0x00007f04a0725000)
        libcrypt.so.1 => /lib64/libcrypt.so.1 (0x00007f04a53d6000)
        libpcre2-8.so.0 => /lib64/libpcre2-8.so.0 (0x00007f04a04a1000)

It looks relatively normal.

xiaoliu1019 avatar Jul 26 '24 07:07 xiaoliu1019

This binary links many unrelated dynamic libraries. Most of them come from libcurl and OpenSSL.

Which library did you use? Currently, it would be better to link libpulsar.so or libpulsarwithdeps.a. It seems that you're using libpulsar.a and linking to 3rd party dependencies from your system.

Besides, what is your compiler toolchain? Generally, if you're building directly via g++ like the guide here, it should work.

BewareMyPower avatar Jul 26 '24 08:07 BewareMyPower

I use libpulsar.so from https://github.com/BewareMyPower/pulsar-client-cpp/actions/runs/9535942883. It does not work when I just switch the version of libpulsar.so from 3.5.1 to 3.6.0. Also, it works when I build directly via g++, but it does not work when I use Bazel (with the default compiler toolchain). Could it be that the Bazel external dependency and libpulsar.so use different versions of external/protobuf_archive/src/google/protobuf/message_lite.cc?

xiaoliu1019 avatar Jul 26 '24 08:07 xiaoliu1019

Could it be that the bazel external dependency and libpulsar.so use different versions of external/protobuf_archive/src/google/protobuf/message_lite.cc?

Yeah, that's right. So I believe something is wrong with your Bazel project. I had some experience with Bazel a few years ago. Could you share a minimal reproducible Bazel project?

BewareMyPower avatar Jul 27 '24 11:07 BewareMyPower
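
In the second backtrace above, frames #1 and #2 come from external/protobuf_archive/... while the frames from #3 on are reported as being from /lib/libpulsar.so. That would be consistent with two copies of protobuf in the process and libpulsar.so calling into the copy built by Bazel. One way to check this at run time, sketched here under the assumption of a glibc-based Linux system, is to ask the dynamic loader which loaded object a symbol resolves to. The function name objectDefining is a hypothetical helper, not part of any library:

```cpp
#include <dlfcn.h>  // dlsym, dladdr (g++ on Linux defines _GNU_SOURCE by default)
#include <string>

// Looks up `mangledName` using the default global symbol search order and
// returns the path of the loaded object (executable or shared library) that
// provides it. Returns an empty string if the symbol cannot be found.
std::string objectDefining(const char* mangledName) {
    void* addr = dlsym(RTLD_DEFAULT, mangledName);
    if (addr == nullptr) {
        return "";
    }
    Dl_info info;
    if (dladdr(addr, &info) == 0 || info.dli_fname == nullptr) {
        return "";
    }
    return info.dli_fname;
}
```

Passing the mangled name of a protobuf function (it can be listed with nm -D on the library) and printing the result from the Bazel-built executable would show whether the symbol resolves to the executable itself or to libpulsar.so. With glibc older than 2.34, the program also needs to be linked with -ldl.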

Why does lib/Commands.cc include <pulsar/Version.h>? There is no such file, so I can't compile this source code.

xiaoliu1019 avatar Aug 01 '24 09:08 xiaoliu1019

This header was generated by CMake: https://github.com/apache/pulsar-client-cpp/blob/2a6916819b2a80a532f827dc96026b8fdc0b15ed/CMakeLists.txt#L56

BewareMyPower avatar Aug 01 '24 12:08 BewareMyPower