
cannot run dragonfly and test failed

Open trippleflux opened this issue 1 year ago • 6 comments

Describe the bug: When running my self-compiled dragonfly binary from the main branch, I get the following:

./dragonfly
Illegal instruction (core dumped)

From gdb :

Program received signal SIGILL, Illegal instruction.
__static_initialization_and_destruction_0 (__initialize_p=1, __priority=65535) at ../src/server/cluster/cluster_family.cc:434
434     Mutex set_config_mu;

One of the tests also failed to build:

[392/693] Building CXX object helio/util/fibers/CMakeFiles/fibers_test.dir/fibers_test.cc.o
FAILED: helio/util/fibers/CMakeFiles/fibers_test.dir/fibers_test.cc.o
/opt/rh/gcc-toolset-12/root/usr/bin/g++ -DBENCHMARK_STATIC_DEFINE -DBOOST_ASIO_SEPARATE_COMPILATION -DBOOST_BEAST_SEPARATE_COMPILATION -DBOOST_CONTEXT_DYN_LINK -DBOOST_CONTEXT_NO_LIB -DGLOG_CUSTOM_PREFIX_SUPPORT -DUSE_FB2 -D_TEST_BASE_FILE_=\"fibers_test.cc\" -I../ -I../genfiles -I../src -I../helio -I_deps/glog-build -I_deps/glog-src/src -I_deps/abseil_cpp-src -I_deps/benchmark-src/include -isystem _deps/gtest-src/googlemock/include -isystem _deps/gtest-src/googlemock -isystem _deps/gtest-src/include -isystem _deps/gtest-src -isystem _deps/gtest-src/googletest/include -isystem _deps/gtest-src/googletest -isystem third_party/libs/xxhash/include -isystem third_party/libs/gperf/include -isystem third_party/libs/uring/include -isystem third_party/libs/cares/include -Wall -Wextra -g -fPIC -fno-builtin-malloc -fno-builtin-calloc -fno-builtin-realloc -fno-builtin-free -fno-omit-frame-pointer -Wno-unused-parameter -march=sandybridge -mtune=skylake -Wno-use-after-free  -std=c++20 -DHAS_RAWMEMCHR -fdiagnostics-color=always  -O3 -DNDEBUG  -std=gnu++17 -MD -MT helio/util/fibers/CMakeFiles/fibers_test.dir/fibers_test.cc.o -MF helio/util/fibers/CMakeFiles/fibers_test.dir/fibers_test.cc.o.d -o helio/util/fibers/CMakeFiles/fibers_test.dir/fibers_test.cc.o -c ../helio/util/fibers/fibers_test.cc
../helio/util/fibers/fibers_test.cc: In lambda function:
../helio/util/fibers/fibers_test.cc:523:55: error: ‘gettid’ was not declared in this scope; did you mean ‘getuid’?
  523 |   pid_t dest_tid = pth.get()->AwaitBrief([&] { return gettid(); });
      |                                                       ^~~~~~
      |                                                       getuid
../helio/util/fibers/fibers_test.cc: In member function ‘virtual void util::fb2::ProactorTest_Migrate_Test::TestBody()’:
../helio/util/fibers/fibers_test.cc:523:41: error: no matching function for call to ‘util::fb2::ProactorBase::AwaitBrief(util::fb2::ProactorTest_Migrate_Test::TestBody()::<lambda()>)’
  523 |   pid_t dest_tid = pth.get()->AwaitBrief([&] { return gettid(); });
      |                    ~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from ../helio/util/fibers/epoll_proactor.h:7,
                 from ../helio/util/fibers/fibers_test.cc:15:
../helio/util/fibers/proactor_base.h:328:31: note: candidate: ‘template<class Func> decltype (f()) util::fb2::ProactorBase::AwaitBrief(Func&&)’
  328 | template <typename Func> auto ProactorBase::AwaitBrief(Func&& f) -> decltype(f()) {
      |                               ^~~~~~~~~~~~
../helio/util/fibers/proactor_base.h:328:31: note:   template argument deduction/substitution failed:
../helio/util/fibers/fibers_test.cc: In lambda function:
../helio/util/fibers/fibers_test.cc:529:18: error: ‘gettid’ was not declared in this scope; did you mean ‘getuid’?
  529 |     pid_t tid1 = gettid();
      |                  ^~~~~~
      |                  getuid

To Reproduce: Build dragonfly from source on Rocky Linux 8 with self-compiled boost 1.82 and bison 3.8.2, plus the latest libpfm.

gcc version

Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/opt/rh/gcc-toolset-12/root/usr/libexec/gcc/x86_64-redhat-linux/12/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-redhat-linux
Configured with: ../configure --enable-bootstrap --enable-languages=c,c++,fortran,lto --prefix=/opt/rh/gcc-toolset-12/root/usr --mandir=/opt/rh/gcc-toolset-12/root/usr/share/man --infodir=/opt/rh/gcc-toolset-12/root/usr/share/info --with-bugurl=https://bugs.rockylinux.org/ --enable-shared --enable-threads=posix --enable-checking=release --enable-multilib --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --with-gcc-major-version-only --enable-libstdcxx-backtrace --with-linker-hash-style=gnu --enable-plugin --enable-initfini-array --with-isl=/builddir/build/BUILD/gcc-12.2.1-20221121/obj-x86_64-redhat-linux/isl-install --enable-offload-targets=nvptx-none --without-cuda-driver --enable-offload-defaulted --enable-gnu-indirect-function --enable-cet --with-tune=generic --with-arch_32=x86-64 --build=x86_64-redhat-linux
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 12.2.1 20221121 (Red Hat 12.2.1-7) (GCC)

Expected behavior: The dragonfly binary should run successfully without crashing.

Environment (please complete the following information):

  • OS: Rocky Linux 8
  • Kernel: # Linux host 6.4.1-1.el8.elrepo.x86_64 #1 SMP PREEMPT_DYNAMIC Sat Jul 1 14:36:25 EDT 2023 x86_64 x86_64 x86_64 GNU/Linux
  • Containerized?: KVM
  • Dragonfly Version: main branch

trippleflux avatar Jul 05 '23 08:07 trippleflux

Version 1.5.0 also fails to link against libcares:

[199/199] Linking CXX executable dragonfly
FAILED: dragonfly
: && /opt/rh/gcc-toolset-12/root/usr/bin/g++ -Wall -Wextra -g -fPIC -fno-builtin-malloc -fno-builtin-calloc -fno-builtin-realloc -fno-builtin-free -fno-omit-frame-pointer -Wno-unused-parameter -march=sandybridge -mtune=skylake -Wno-use-after-free  -std=c++20 -DHAS_RAWMEMCHR -fdiagnostics-color=always  -O3 -DNDEBUG  src/server/CMakeFiles/dragonfly.dir/dfly_main.cc.o -o dragonfly  lib/libbase.a  lib/libdragonfly_lib.a  lib/libdfly_transaction.a  lib/libdfly_core.a  lib/libquery_parser.a  third_party/libs/reflex/lib/libreflex.a  lib/liblua_modules.a  third_party/libs/lua/lib/liblua.a  lib/libdfly_facade.a  lib/libhttp_server_lib.a  lib/libmetrics.a  third_party/libs/gperf/lib/libprofiler.a  -lunwind  third_party/libs/dconv/lib/libdouble-conversion.a  lib/libredis_lib.a  third_party/libs/mimalloc/lib/libmimalloc.a  lib/libaws_lib.a  /usr/lib64/libxml2.so  lib/libstrings_lib.a  lib/libhtml_lib.a  lib/libhttp_client_lib.a  lib/libtls_lib.a  lib/libfibers2.a  lib/libio.a  third_party/libs/uring/lib/liburing.a  /usr/lib/libboost_context.so.1.82.0  third_party/libs/cares/lib/libcares.a  /usr/lib64/libssl.so  /usr/lib64/libcrypto.so  lib/libhttp_utils.a  lib/libbase.a  _deps/glog-build/libglog.a  /usr/lib64/libunwind.so  -pthread  _deps/abseil_cpp-build/absl/flags/libabsl_flags_parse.a  _deps/abseil_cpp-build/absl/flags/libabsl_flags_usage.a  _deps/abseil_cpp-build/absl/flags/libabsl_flags_usage_internal.a  _deps/abseil_cpp-build/absl/flags/libabsl_flags.a  _deps/abseil_cpp-build/absl/flags/libabsl_flags_internal.a  _deps/abseil_cpp-build/absl/flags/libabsl_flags_marshalling.a  _deps/abseil_cpp-build/absl/strings/libabsl_str_format_internal.a  _deps/abseil_cpp-build/absl/flags/libabsl_flags_reflection.a  _deps/abseil_cpp-build/absl/flags/libabsl_flags_config.a  _deps/abseil_cpp-build/absl/flags/libabsl_flags_program_name.a  _deps/abseil_cpp-build/absl/flags/libabsl_flags_private_handle_accessor.a  _deps/abseil_cpp-build/absl/flags/libabsl_flags_commandlineflag.a  
_deps/abseil_cpp-build/absl/flags/libabsl_flags_commandlineflag_internal.a  _deps/abseil_cpp-build/absl/strings/libabsl_cord.a  _deps/abseil_cpp-build/absl/strings/libabsl_cordz_info.a  _deps/abseil_cpp-build/absl/strings/libabsl_cord_internal.a  _deps/abseil_cpp-build/absl/strings/libabsl_cordz_functions.a  _deps/abseil_cpp-build/absl/strings/libabsl_cordz_handle.a  _deps/abseil_cpp-build/absl/crc/libabsl_crc_cord_state.a  _deps/abseil_cpp-build/absl/crc/libabsl_crc32c.a  _deps/abseil_cpp-build/absl/crc/libabsl_crc_internal.a  _deps/abseil_cpp-build/absl/crc/libabsl_crc_cpu_detect.a  _deps/abseil_cpp-build/absl/hash/libabsl_hash.a  _deps/abseil_cpp-build/absl/hash/libabsl_city.a  _deps/abseil_cpp-build/absl/types/libabsl_bad_variant_access.a  _deps/abseil_cpp-build/absl/hash/libabsl_low_level_hash.a  _deps/abseil_cpp-build/absl/container/libabsl_raw_hash_set.a  _deps/abseil_cpp-build/absl/container/libabsl_hashtablez_sampler.a  _deps/abseil_cpp-build/absl/profiling/libabsl_exponential_biased.a  _deps/abseil_cpp-build/absl/synchronization/libabsl_synchronization.a  _deps/abseil_cpp-build/absl/synchronization/libabsl_graphcycles_internal.a  -lrt  _deps/abseil_cpp-build/absl/time/libabsl_time.a  _deps/abseil_cpp-build/absl/time/libabsl_civil_time.a  _deps/abseil_cpp-build/absl/time/libabsl_time_zone.a  _deps/abseil_cpp-build/absl/debugging/libabsl_failure_signal_handler.a  _deps/abseil_cpp-build/absl/debugging/libabsl_examine_stack.a  _deps/abseil_cpp-build/absl/debugging/libabsl_symbolize.a  _deps/abseil_cpp-build/absl/debugging/libabsl_demangle_internal.a  _deps/abseil_cpp-build/absl/base/libabsl_malloc_internal.a  _deps/abseil_cpp-build/absl/debugging/libabsl_stacktrace.a  _deps/abseil_cpp-build/absl/debugging/libabsl_debugging_internal.a  third_party/libs/xxhash/lib/libxxhash.a  lib/libhttp_beast_prebuilt.a  /usr/lib/libboost_system.so.1.82.0  _deps/abseil_cpp-build/absl/random/libabsl_random_distributions.a  
_deps/abseil_cpp-build/absl/random/libabsl_random_seed_sequences.a  _deps/abseil_cpp-build/absl/random/libabsl_random_internal_pool_urbg.a  _deps/abseil_cpp-build/absl/random/libabsl_random_internal_randen.a  _deps/abseil_cpp-build/absl/random/libabsl_random_internal_randen_hwaes.a  _deps/abseil_cpp-build/absl/random/libabsl_random_internal_randen_hwaes_impl.a  _deps/abseil_cpp-build/absl/random/libabsl_random_internal_randen_slow.a  _deps/abseil_cpp-build/absl/random/libabsl_random_internal_platform.a  _deps/abseil_cpp-build/absl/random/libabsl_random_internal_seed_material.a  _deps/abseil_cpp-build/absl/strings/libabsl_strings.a  _deps/abseil_cpp-build/absl/strings/libabsl_strings_internal.a  _deps/abseil_cpp-build/absl/base/libabsl_base.a  -lpthread  _deps/abseil_cpp-build/absl/base/libabsl_spinlock_wait.a  -lrt  _deps/abseil_cpp-build/absl/numeric/libabsl_int128.a  _deps/abseil_cpp-build/absl/types/libabsl_bad_optional_access.a  _deps/abseil_cpp-build/absl/base/libabsl_throw_delegate.a  _deps/abseil_cpp-build/absl/base/libabsl_raw_logging_internal.a  _deps/abseil_cpp-build/absl/base/libabsl_log_severity.a  _deps/abseil_cpp-build/absl/random/libabsl_random_seed_gen_exception.a  -lzstd  third_party/libs/lz4/lib/liblz4.a && :
/opt/rh/gcc-toolset-12/root/usr/libexec/gcc/x86_64-redhat-linux/12/ld: cannot find third_party/libs/cares/lib/libcares.a: No such file or directory
collect2: error: ld returned 1 exit status
ninja: build stopped: subcommand failed.

libcares cannot be found. Based on the main branch, the following entry needs to be added:

add_third_party(
  cares
  URL https://c-ares.org/download/c-ares-1.19.0.tar.gz
  CMAKE_PASS_FLAGS "-DCARES_SHARED:BOOL=OFF -DCARES_STATIC:BOOL=ON -DCARES_STATIC_PIC:BOOL=ON -DCMAKE_INSTALL_LIBDIR=lib"
)

to the ./helio/cmake/third_party.cmake file.

Also, after dragonfly 1.5.0 compiles successfully, it still fails to launch:

Program received signal SIGILL, Illegal instruction.
0x0000000000557b41 in std::_Vector_base<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >::_Vector_impl_data::_Vector_impl_data (__x=..., this=<optimized out>, this=<optimized out>, __x=...) at /opt/rh/gcc-toolset-12/root/usr/include/c++/12/bits/stl_vector.h:106
106             : _M_start(__x._M_start), _M_finish(__x._M_finish),

trippleflux avatar Jul 05 '23 09:07 trippleflux

Could you please post your CPU model and the result of lscpu on your system?

royjacobson avatar Jul 05 '23 10:07 royjacobson

Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              10
On-line CPU(s) list: 0-9
Thread(s) per core:  1
Core(s) per socket:  1
Socket(s):           10
NUMA node(s):        1
Vendor ID:           GenuineIntel
BIOS Vendor ID:      Red Hat
CPU family:          6
Model:               13
Model name:          QEMU Virtual CPU version 2.5+
BIOS Model name:     RHEL 7.6.0 PC (i440FX + PIIX, 1996)
Stepping:            3
CPU MHz:             2199.998
BogoMIPS:            4399.99
Hypervisor vendor:   KVM
Virtualization type: full
L1d cache:           32K
L1i cache:           32K
L2 cache:            4096K
L3 cache:            16384K
NUMA node0 CPU(s):   0-9
Flags:               fpu de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pse36 clflush mmx fxsr sse sse2 syscall nx lm rep_good nopl xtopology cpuid tsc_known_freq pni cx16 x2apic hypervisor lahf_lm pti

During compilation and linking I can see that gcc or the linker is passed, if I am not mistaken, -march=sandybridge -mtune=skylake.

This gcc/g++ 12 compiler was installed from a repository. Compiling other open source projects is unaffected; this is the first time I have run into this, and only with the dragonfly project, possibly due to binary incompatibilities.

From dragonfly cmake log :

CXX_FLAGS -Wall -Wextra -g -fPIC -fno-builtin-malloc -fno-builtin-calloc -fno-builtin-realloc -fno-builtin-free -fno-omit-frame-pointer -Wno-unused-parameter -march=sandybridge -mtune=skylake -Wno-use-after-free -flto -std=c++20 -DHAS_RAWMEMCHR -fdiagnostics-color=always  -O3 -DNDEBUG

Found the possible issue :

if (NOT MARCH_OPT)
  if (CMAKE_SYSTEM_PROCESSOR STREQUAL "aarch64")
    set(MARCH_OPT "-march=armv8.2-a+fp16+rcpc+dotprod+crypto")
  elseif(CMAKE_SYSTEM_PROCESSOR STREQUAL "x86_64" OR CMAKE_SYSTEM_PROCESSOR STREQUAL "amd64")
    # FreeBSD uses amd64.
    # Github actions use DSv2 that may use haswell cpus.
    # We will make it friendly towards older architectures so that will run on developers laptops.
    # However, we will tune it towards intel skylakes that are common in public clouds.
    set(MARCH_OPT "-march=sandybridge -mtune=skylake")
  elseif(CMAKE_SYSTEM_PROCESSOR STREQUAL "arm64")
    # MacOS on arm64 - TBD.
  else()
    MESSAGE(FATAL_ERROR "Unsupported architecture ${CMAKE_SYSTEM_PROCESSOR}")
  endif()
endif()

I think changing it to:

    set(MARCH_OPT "-march=native")

would be safer for x86_64.

trippleflux avatar Jul 05 '23 10:07 trippleflux

-march=native is good for developers, but it's a bad default because it makes the resulting binary depend on the CPU of the machine that compiled it. This mostly causes chaos when people try to deploy the compiled binary :)

Instead, dragonfly has minimum CPU requirements (SSE4 and, with -march=sandybridge, AVX) that a processor must support (and your CPU doesn't). I agree it might be nice to make this configurable in the CMake script, but it's not a default that we'll change.

royjacobson avatar Jul 05 '23 11:07 royjacobson

Yes, I had forgotten about binary portability.

Well, a dedicated server whose CPU supports such modern instruction sets is still expensive where I come from.

I am stuck with a VPS.

trippleflux avatar Jul 05 '23 11:07 trippleflux

In CMake we can pass a custom variable that the script picks up; that is perhaps a possible solution.

I also found the same flag in the Lua dependency patch.

Possible temporary solution (TESTED, the binary is now able to run!):

sed -i 's/-march=sandybridge -mtune=skylake/-mtune=generic -O3 -pipe -fomit-frame-pointer/g' helio/cmake/internal.cmake
sed -i 's/-march=sandybridge/-mtune=generic -O3 -pipe -fomit-frame-pointer/g' patches/lua-v5.4.4.patch
sed -i 's/-march=core2/-mtune=generic/g' src/server/CMakeLists.txt

It is also worth mentioning in the documentation that the recommended Boost version is 1.76.

[EDITED]

trippleflux avatar Jul 05 '23 11:07 trippleflux

closing

romange avatar Nov 01 '23 15:11 romange