Segfault on sample code on init cache - $100 bug fix bounty

Open FreeTrade opened this issue 2 years ago • 35 comments

Hoping for some pointers on what I might be doing wrong here -

Got the following sample code -

    const char myKey[] = "RandomX example key";
    const char myInput[] = "RandomX example input";
    char hash[RANDOMX_HASH_SIZE];
    std::cout << "Get flags" << std::endl;
    randomx_flags flags = randomx_get_flags();
    std::cout << flags << std::endl;
    std::cout << "Allocate cache" << std::endl;
    randomx_cache *myCache = randomx_alloc_cache(flags);
    if (myCache == nullptr) {
        std::cout << "Cache allocation failed" << std::endl;
        return SerializeHash(*this);
    }
    std::cout << "Init cache" << std::endl;
    randomx_init_cache(myCache, &myKey, sizeof myKey);
    std::cout << "MyMachine" << std::endl;

with output -

Get flags
106
Allocate cache
Init cache
Segmentation fault (core dumped)

free -h is

                    total        used        free      shared  buff/cache   available
Mem:           3.8Gi       262Mi       847Mi       1.0Mi       2.7Gi       3.3Gi
Swap:          8.0Gi        40Mi       8.0Gi

Running Ubuntu 23.04. Tried compiling both with -DARCH=native and without -DARCH.

FreeTrade avatar Jul 29 '23 07:07 FreeTrade

Which CPU are you using?

Can you provide a stack trace? Run a debug build with gdb and when it crashes, use the bt command.

tevador avatar Jul 29 '23 13:07 tevador

Thanks for the help, here's that info

Get flags
0
Allocate cache

Program received signal SIGSEGV, Segmentation fault.
0x0000555555a81a37 in randomx::generateSuperscalar(randomx::SuperscalarProgram&, randomx::Blake2Generator&) ()
(gdb) bt
#0  0x0000555555a81a37 in randomx::generateSuperscalar(randomx::SuperscalarProgram&, randomx::Blake2Generator&) ()
#1  0x0000555555a7f72d in randomx::initCache(randomx_cache*, void const*, unsigned long) ()
#2  0x0000555555a75067 in randomx_init_cache ()
#3  0x00005555559df520 in CBlockHeader::GetHash (this=this@entry=0x555555dc2098 <mainParams+1848>)
    at primitives/block.cpp:38
#4  0x00005555558e002d in CMainParams::CMainParams (this=0x555555dc1960 <mainParams>) at chainparams.cpp:174
#5  0x0000555555600cbc in __static_initialization_and_destruction_0 (__initialize_p=1, __priority=65535)
    at /usr/include/c++/11/bits/std_thread.h:86
#6  _GLOBAL__sub_I_nMiningForkTime () at chainparams.cpp:1129
#7  0x00007ffff72d6ebb in call_init (env=<optimized out>, argv=0x7fffffffe4f8, argc=1) at ../csu/libc-start.c:145
#8  __libc_start_main_impl (main=0x5555555ea940 <main(int, char**)>, argc=1, argv=0x7fffffffe4f8, init=<optimized out>,
    fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffe4e8) at ../csu/libc-start.c:379
#9  0x0000555555607955 in _start ()

cpu info (host describes it as an Intel Xeon)

Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         40 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  2
  On-line CPU(s) list:   0,1
Vendor ID:               GenuineIntel
  Model name:            Intel Core Processor (Skylake, IBRS)
    CPU family:          6
    Model:               94
    Thread(s) per core:  2
    Core(s) per socket:  1
    Socket(s):           1
    Stepping:            3
    BogoMIPS:            7391.99
    Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2
                         ht syscall nx rdtscp lm constant_tsc rep_good nopl xtopology cpuid tsc_known_freq pni pclmulqdq
                         ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand
                         hypervisor lahf_lm abm cpuid_fault invpcid_single pti ssbd ibrs ibpb fsgsbase bmi1 avx2 smep bmi2
                         erms invpcid xsaveopt arat
Virtualization features:
  Hypervisor vendor:     Microsoft
  Virtualization type:   full
Caches (sum of all):
  L1d:                   64 KiB (2 instances)
  L1i:                   64 KiB (2 instances)
  L2:                    4 MiB (1 instance)
  L3:                    16 MiB (1 instance)
NUMA:
  NUMA node(s):          1
  NUMA node0 CPU(s):     0,1

FreeTrade avatar Jul 30 '23 01:07 FreeTrade

  1. Your stack trace doesn't include line numbers, which means you are not using a debug build of librandomx.a. Can you please repeat it with a debug build?
  2. Have you made any changes to the randomx code? Especially in files configuration.h or superscalar.cpp?
  3. Can you enable the trace output? You'll have to edit CMakeLists.txt and add add_definitions(-DTRACE) somewhere near the top and rebuild librandomx.a.

tevador avatar Jul 30 '23 08:07 tevador

  1. Ok.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Get flags
0
Allocate cache

Program received signal SIGSEGV, Segmentation fault.
0x0000555555a9e7b0 in randomx::SuperscalarInstructionInfo::getType (this=0x0) at /root/RandomX/src/superscalar.cpp:172                             return type_;
(gdb) bt
#0  0x0000555555a9e7b0 in randomx::SuperscalarInstructionInfo::getType (this=0x0) at /root/RandomX/src/superscalar.cpp:172
#1  0x0000555555a9f44f in randomx::SuperscalarInstruction::getType (this=0x7fffffffd400) at /root/RandomX/src/superscalar.cpp:539
#2  0x0000555555a9c920 in randomx::generateSuperscalar (prog=..., gen=...) at /root/RandomX/src/superscalar.cpp:681
#3  0x0000555555a98849 in randomx::initCache (cache=0x555555f1f840, key=0x7fffffffdf10, keySize=20) at /root/RandomX/src/dataset.cpp:130
#4  0x0000555555a744c5 in randomx_init_cache (cache=0x555555f1f840, key=0x7fffffffdf10, keySize=20) at /root/RandomX/src/randomx.cpp:130
#5  0x00005555559de5c0 in CBlockHeader::GetHash (this=this@entry=0x555555de60d8 <mainParams+1848>) at primitives/block.cpp:38
#6  0x00005555558df0cd in CMainParams::CMainParams (this=0x555555de59a0 <mainParams>) at chainparams.cpp:174
#7  0x00005555556008bc in __static_initialization_and_destruction_0 (__initialize_p=1, __priority=65535)
    at /usr/include/c++/11/bits/std_thread.h:86
#8  _GLOBAL__sub_I_nMiningForkTime () at chainparams.cpp:1129
#9  0x00007ffff72d6ebb in call_init (env=<optimized out>, argv=0x7fffffffe4d8, argc=1) at ../csu/libc-start.c:145
#10 __libc_start_main_impl (main=0x5555555ea540 <main(int, char**)>, argc=1, argv=0x7fffffffe4d8, init=<optimized out>, fini=<optimized out>,
    rtld_fini=<optimized out>, stack_end=0x7fffffffe4c8) at ../csu/libc-start.c:379
#11 0x00005555556069f5 in _start ()
  2. No changes yet, just trying to get the default running before experimenting.
  3. Will try next.

FreeTrade avatar Jul 31 '23 01:07 FreeTrade

> You'll have to edit CMakeLists.txt and add add_definitions(-DTRACE) somewhere near the top and rebuild librandomx.a.

Did this but doesn't seem to have made any difference to the output. Sorry, I'm unfamiliar with these debugging tools.

FreeTrade avatar Jul 31 '23 01:07 FreeTrade

This stacktrace is a bit more helpful, but I still don't see why it crashes.

The crash happens on the very first iteration here:

https://github.com/tevador/RandomX/blob/901f8ef765e7c274852dcb4d477247fd6747a5b8/src/superscalar.cpp#L681

currentInstruction is initialized to SuperscalarInstruction::Null, which is initialized with a pointer to SuperscalarInstructionInfo::NOP, but in the call to SuperscalarInstructionInfo::getType, your this pointer is null, which shouldn't happen.
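To make the failure mode concrete, here is a minimal sketch of the delegation seen in frames #1 and #0 of the backtrace. The names `null_this_demo`, `InstructionInfo`, `Instruction`, `hasInfo` and `nopInfo` are simplified stand-ins chosen for illustration, not the actual RandomX definitions. It only shows why a wrapper whose info pointer is still zero ends up calling the inner getType with a null this.

```cpp
#include <cassert>

namespace null_this_demo {

// Simplified stand-in for the info class named in the backtrace
// (hypothetical layout; the real class in superscalar.cpp is more involved).
struct InstructionInfo {
    int type;
    int getType() const { return type; }  // reads through `this`
};

// Simplified stand-in for the instruction wrapper (hypothetical).
struct Instruction {
    const InstructionInfo* info;  // expected to point at a static info object

    bool hasInfo() const { return info != nullptr; }

    // Same delegation shape as frames #1 -> #0: if `info` is null, the inner
    // call runs with this == nullptr and the read of `type` would fault.
    int getType() const {
        assert(hasInfo() && "info pointer was never initialized");
        return info->getType();
    }
};

// Plays the role of the static NOP info object.
const InstructionInfo nopInfo{0};

}  // namespace null_this_demo
```

A zero-filled `Instruction` (what a global looks like before its constructor has run) reports `hasInfo() == false`, so an unchecked `getType()` on it would dereference null. Once `info` points at `nopInfo`, `getType()` returns 0.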

Can you try to compile and run just the example code from here without the bitcoin wrapper you are using? https://github.com/tevador/RandomX/blob/master/src/tests/api-example1.c

Which compiler version are you using?

tevador avatar Jul 31 '23 17:07 tevador

Compiling with gcc api-example1.c -L/root/RandomX/build -lrandomx -lstdc++ -lm -lc succeeds and the program runs correctly. The benchmark and tests also run correctly.

gcc -v

root@bchx:~/RandomX# gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/11/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none:amdgcn-amdhsa
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 11.3.0-1ubuntu1~22.04.1' --with-bugurl=file:///usr/share/doc/gcc-11/README.Bugs --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --prefix=/usr --with-gcc-major-version-only --program-suffix=-11 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-bootstrap --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-plugin --enable-default-pie --with-system-zlib --enable-libphobos-checking=release --with-target-system-zlib=auto --enable-objc-gc=auto --enable-multiarch --disable-werror --enable-cet --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-offload-targets=nvptx-none=/build/gcc-11-aYxV0E/gcc-11-11.3.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-11-aYxV0E/gcc-11-11.3.0/debian/tmp-gcn/usr --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu --with-build-config=bootstrap-lto-lean --enable-link-serialization=2
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 11.3.0 (Ubuntu 11.3.0-1ubuntu1~22.04.1)

I'm compiling in the context of the Bitcoin Unlimited makefile + links to the compiled librandomx.a -

https://gitlab.com/bitcoinunlimited/BCHUnlimited/-/blob/dev/src/Makefile.am?ref_type=heads

maybe it's adding some incompatible compiler option?

FreeTrade avatar Aug 01 '23 06:08 FreeTrade

It appears that the constructors of static globals are not being called in your case. It's probably a problem with the linker. Can you try linking with the --whole-archive option? Otherwise I'm out of ideas.
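One way a global can be observed before its constructor has run is when other code executes during static initialization. Both backtraces above reach randomx_init_cache from CMainParams::CMainParams inside __static_initialization_and_destruction_0, so this could be relevant here, though nothing in the thread confirms it. The sketch below illustrates the general C++ behaviour under that assumption. All names in it (`init_order_demo`, `lookupNop`, `EarlyCaller`, `nopOnFirstUse`, `tableEnabled`) are hypothetical, and none of it is RandomX or BU code.

```cpp
#include <cassert>

namespace init_order_demo {

struct InstructionInfo {
    int type;
};

// Volatile so the compiler cannot fold the lookup below into a compile-time
// constant; the initializer has to run as dynamic initialization at startup.
volatile bool tableEnabled = true;

// Constant-initialized storage for the "NOP" entry.
const InstructionInfo nopStorage{0};

const InstructionInfo* lookupNop() {
    return tableEnabled ? &nopStorage : nullptr;
}

// Defined further down; zero-initialized (null) until its initializer runs.
extern const InstructionInfo* nopInfo;

// "Construct on first use": the function-local static is initialized the
// first time the function is called, no matter how early that call happens.
const InstructionInfo* nopOnFirstUse() {
    static const InstructionInfo* info = lookupNop();
    return info;
}

// A global whose constructor runs during static initialization, like the
// static-initialization frames in the backtraces.
struct EarlyCaller {
    bool sawNull;      // what the plain global looked like at that moment
    bool safeSawNull;  // what the on-first-use accessor returned
    EarlyCaller()
        : sawNull(nopInfo == nullptr),
          safeSawNull(nopOnFirstUse() == nullptr) {}
};

EarlyCaller early;                            // defined first: runs first
const InstructionInfo* nopInfo = lookupNop(); // initialized after `early`

}  // namespace init_order_demo
```

Within one translation unit, globals are initialized in definition order, so `early` sees `nopInfo` as null while `nopOnFirstUse()` already returns a valid pointer. Across translation units the order is not specified, which is why routing access through a function-local static (or deferring the first call until after main starts) is the usual remedy.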

tevador avatar Aug 01 '23 21:08 tevador

OK, Thanks for the debugging help and the suggestion. Unfortunately --whole-archive hasn't made any difference :(

FreeTrade avatar Aug 02 '23 03:08 FreeTrade

@sickpig - any idea why calling randomx in the context of a bitcoin-unlimited makefile build would cause a segfault? Any options I could tweak to avoid it?

FreeTrade avatar Aug 05 '23 04:08 FreeTrade

Just an update - tried with a clean Ubuntu 20.04 install - same error.

Adding a small $50 bounty for a fix, or suggestion that leads to a fix.

FreeTrade avatar Aug 10 '23 07:08 FreeTrade

> @sickpig - any idea why calling randomx in the context of a bitcoin-unlimited makefile build would cause a segfault? Any options I could tweak to avoid it?

sorry but what do you mean in the context of bitcoin-unlimited? or even better how BU is related to randomX?

what are you trying to do?

sickpig avatar Aug 10 '23 13:08 sickpig

> > @sickpig - any idea why calling randomx in the context of a bitcoin-unlimited makefile build would cause a segfault? Any options I could tweak to avoid it?
>
> sorry but what do you mean in the context of bitcoin-unlimited? or even better how BU is related to randomX?
>
> what are you trying to do?

Hey, thanks for stopping by - I'm working on a project that uses the BU codebase (and build/makefile), but I'm using the RandomX library as a PoW function. When I call the RandomX functions, I get a segmentation fault. I thought you might have a knowledge of anything special or unusual going on with the BU build process that might cause the error?

FreeTrade avatar Aug 10 '23 14:08 FreeTrade

@FreeTrade I have found this from Upwork.

It compiles and runs fine for me, no seg fault:

Get flags
106
Allocate cache
Init cache
MyMachine

I am using Ubuntu.

This is how I have built the binary:

g++ segFault.cpp -o segFault -lrandomx

Full code:

#include <iostream>
#include <randomx.h>

int main()
{
    const char myKey[] = "RandomX example key";
    const char myInput[] = "RandomX example input";
    char hash[RANDOMX_HASH_SIZE];
    std::cout << "Get flags" << std::endl;
    randomx_flags flags = randomx_get_flags();
    std::cout << flags << std::endl;
    std::cout << "Allocate cache" << std::endl;
    randomx_cache *myCache = randomx_alloc_cache(flags);
    if (myCache == nullptr) {
        std::cout << "Cache allocation failed" << std::endl;
        //return SerializeHash(*this);
        return 1;
    }
    std::cout << "Init cache" << std::endl;
    randomx_init_cache(myCache, &myKey, sizeof myKey);
    std::cout << "MyMachine" << std::endl;

    return 0;
}

Note: I have also built the RandomX library using the provided cmake instructions:

sudo make install
[ 79%] Built target randomx
[ 88%] Built target randomx-benchmark
[ 94%] Built target randomx-codegen
[100%] Built target randomx-tests
Install the project...
-- Install configuration: "Release"
-- Installing: /usr/local/lib/librandomx.a
-- Installing: /usr/local/include/randomx.h

avrdan avatar Aug 10 '23 19:08 avrdan

@avrdan, thanks, yes the code runs if compiled standalone - the challenge seems to be getting it to run in the context of a Bitcoin Unlimited build.

FreeTrade avatar Aug 11 '23 03:08 FreeTrade

Did you edit the Bitcoin Unlimited makefile to include the librandomx.a library? If yes, we need to add another entry like the following:

librandomx_a_CPPFLAGS = $(AM_CPPFLAGS) $(BITCOIN_INCLUDES)
librandomx_a_CXXFLAGS = $(AM_CXXFLAGS) $(PIE_FLAGS)
librandomx_a_SOURCES = \
  randomx.h

I am assuming you have already done this? In any case, including this library should work the same way, regardless of the main project.

avrdan avatar Aug 11 '23 04:08 avrdan

Yes, thanks, I'm going to make a repo with my changes so it is clearer what the problem is.

FreeTrade avatar Aug 11 '23 05:08 FreeTrade

Ok, here's the full repo -

https://gitlab.com/FreeTrade68/bchrx

The changes to include randomx library

https://gitlab.com/FreeTrade68/bchrx/-/compare/dev...dev?from_project_id=19725714

(Quite possible I've done something silly in the makefile that causes the problem. Configuring libs is not my strong suit)

To build

Probably need these

sudo apt-get install build-essential libtool autotools-dev autoconf automake pkg-config libssl-dev libevent-dev bsdmainutils git
sudo apt-get install libboost-all-dev
sudo apt-get install libminiupnpc-dev
sudo apt-get install libzmq3-dev
sudo apt install libdb5.3++ libdb5.3++-dev

then

git clone --single-branch https://gitlab.com/FreeTrade68/bchrx
cd bchrx/
./autogen.sh
./configure  --disable-tests --with-incompatible-bdb --enable-upnp-default --with-gui=no
make

Error

root@bchx:~/bchrx# ./src/bitcoind
Get flags
0
Allocate cache
Segmentation fault (core dumped)

FreeTrade avatar Aug 11 '23 07:08 FreeTrade

Are you sure /bchrx/src/randomx is the path to the installation of your RandomX? Meaning that you have the include and library files in there?

I used the default install folders, so the lib goes under /usr/local/lib and the include under /usr/local/include. This way, the library is installed under /usr/local. This is cleaner anyway, as you shouldn't have libraries in a src folder, but this may just be a side note.

avrdan avatar Aug 11 '23 07:08 avrdan

Pretty sure it's not a matter of wrong paths. It compiles and some of the functions are successfully called. No doubt there are ways to clean up the folder location - but focused on just getting it running first.

FreeTrade avatar Aug 11 '23 08:08 FreeTrade

This must be more difficult than I had thought. Increasing bounty to $100.

FreeTrade avatar Aug 13 '23 05:08 FreeTrade

Good Job.

msp01 avatar Aug 13 '23 15:08 msp01

@FreeTrade check your messages on upwork from me.

Regards Anshul Mittal

anshulmttl avatar Aug 13 '23 16:08 anshulmttl

@anshulmttl Thanks Anshul - this is now a public bounty so I can't assign it as a project. First solution posted here wins the bounty.

FreeTrade avatar Aug 14 '23 02:08 FreeTrade

@FreeTrade Since you contacted me on Upwork, you will have to assign the project to me on Upwork only.

anshulmttl avatar Aug 14 '23 02:08 anshulmttl

Ok, the bounty is suspended while I confirm Anshul has found a solution.

FreeTrade avatar Aug 14 '23 03:08 FreeTrade

Is this issue solved?

hanaa12G avatar Aug 14 '23 12:08 hanaa12G

@FreeTrade you can also try to build the BCHUnlimited code with the --disable-hardening configure flag. These "hardening" flags might mess up linking with the RandomX library. I suspect it's some incompatibility between the compiler/linker flags you use for RandomX and for BCHUnlimited.

SChernykh avatar Aug 14 '23 12:08 SChernykh

Thanks for the suggestion - actually yes, I did try --disable-hardening, but alas it didn't resolve it for me. Will likely move forward with @anshulmttl's resolution, although I accept your reservations about it.

FreeTrade avatar Aug 14 '23 12:08 FreeTrade

Accepted @anshulmttl's resolution for the problem and paid the $100 bounty. https://github.com/tevador/RandomX/pull/272

If anyone else runs into the same issue - be aware of @SChernykh's note that this is probably an issue with the build/compile setup, so this may be a workaround to a different problem rather than a fix.

FreeTrade avatar Aug 15 '23 03:08 FreeTrade