silkworm icon indicating copy to clipboard operation
silkworm copied to clipboard

Crash at public_key_to_address() (Senders stage)

Open chfast opened this issue 3 years ago • 2 comments

  INFO [09-30|21:24:48.410 UTC] [1/8 Senders]                      phase=2/3 blocks=10193021 current=20133 transactions=4458589 workers=16                                                                                        
  INFO [09-30|21:25:18.410 UTC] [1/8 Senders]                      phase=2/3 blocks=10193021 current=39108 transactions=8654934 workers=16                                                                                        
  INFO [09-30|21:25:48.410 UTC] [1/8 Senders]                      phase=2/3 blocks=10193021 current=67112 transactions=14686352 workers=16                                                                                       
  INFO [09-30|21:26:08.721 UTC] Collector flushed file             path=/blockchain/erigon-stable/etl-temp/137438013954776-0.bin size=256.00 MB in=395.503ms                                                                      
  INFO [09-30|21:26:18.410 UTC] [1/8 Senders]                      phase=2/3 blocks=10193021 current=100813 transactions=20717666 workers=15                                                                                      
                                                                                                                                                                                                                                  
Thread 20 "silkworm" received signal SIGSEGV, Segmentation fault.                                                                                                                                                                 
[Switching to Thread 0x7cffbcffa640 (LWP 1320644)]                                                                                                                                                                                
0x0000555555807e07 in public_key_to_address () at /home/ipsilon/silkworm/third_party/silkpre/lib/silkpre/ecdsa.c:54                                                                                                               
54          memcpy(out, &key_hash.bytes[12], 20);                                                                                                                                                                                 
(gdb) bt                                                                                                                                                                                                                          
#0  0x0000555555807e07 in public_key_to_address () at /home/ipsilon/silkworm/third_party/silkpre/lib/silkpre/ecdsa.c:54                                                                                                           
#1  silkpre_recover_address () at /home/ipsilon/silkworm/third_party/silkpre/lib/silkpre/ecdsa.c:64                                                                                                                               
#2  0x00005555557a971e in work () at /home/ipsilon/silkworm/node/silkworm/stagedsync/stage_senders/recovery_worker.cpp:65                                                                                                         
#3  0x00005555556aed16 in operator() () at /home/ipsilon/silkworm/node/silkworm/concurrency/worker.cpp:43                                                                                                                         
#4  __invoke<(lambda at /home/ipsilon/silkworm/node/silkworm/concurrency/worker.cpp:36:45)> () at /usr/lib/llvm-14/bin/../include/c++/v1/type_traits:3640                                                                         
#5  __thread_execute<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, (lambda at /home/ipsilon/silkworm/node/silkworm/concurrency/worker.cpp:36:45)> ()                     
    at /usr/lib/llvm-14/bin/../include/c++/v1/thread:282                                                                                                                                                                          
#6  __thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, (lambda at /home/ipsilon/silkworm/node/silkworm/concurrency/worker.cpp:36:45)> >(void)  
    () at /usr/lib/llvm-14/bin/../include/c++/v1/thread:293                                                                                                                                                                       
#7  0x00007ffff7bc7b43 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442                                                                                                                                       
#8  0x00007ffff7c59a00 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81                                                                                                                                              

chfast avatar Sep 30 '22 21:09 chfast

@chfast Thanks for reporting this! I have some questions:

  • is this happening during your test using a database primed by Erigon and starting the staged sync loop from Senders stage (by commenting out first 3 stages as suggested by @mriccobene )?
  • what hardware platform and Silkworm commit did you use for running this test?
  • did you experiment this once or repeatedly?

canepat avatar Oct 05 '22 15:10 canepat

This happens after some minutes of running Senders state on Erigon database. This is heap-use-after-free bug.

=================================================================                                                                                                       
==3478483==ERROR: AddressSanitizer: heap-use-after-free on address 0x7c295934bc80 at pc 0x55c9a54bd990 bp 0x7c2994dba190 sp 0x7c2994dba188                              
READ of size 8 at 0x7c295934bc80 thread T17                                                                                                                             
    #0 0x55c9a54bd98f in silkworm::stagedsync::recovery::RecoveryWorker::work() /home/ipsilon/silkworm/node/silkworm/stagedsync/stage_senders/recovery_worker.cpp:61:38 
    #1 0x55c9a511499e in operator() /home/ipsilon/silkworm/node/silkworm/concurrency/worker.cpp:43:17                                                                   
    #2 0x55c9a511499e in __invoke<(lambda at /home/ipsilon/silkworm/node/silkworm/concurrency/worker.cpp:36:45)> /usr/lib/llvm-14/bin/../include/c++/v1/type_traits:3640
:23                                                                                                                                                                     
    #3 0x55c9a511499e in __thread_execute<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, (lambda at /home/ipsilo
n/silkworm/node/silkworm/concurrency/worker.cpp:36:45)> /usr/lib/llvm-14/bin/../include/c++/v1/thread:282:5                                                             
    #4 0x55c9a511499e in void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_stru
ct> >, silkworm::Worker::start(bool)::$_0> >(void*) /usr/lib/llvm-14/bin/../include/c++/v1/thread:293:5                                                                 
    #5 0x7f29af5d0b42 in start_thread nptl/./nptl/pthread_create.c:442:8                                                                                                
    #6 0x7f29af6629ff  misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:81                                                                                               
                                                                                                                                                                        
0x7c295934bc80 is located 853120 bytes inside of 67108864-byte region [0x7c295927b800,0x7c295d27b800)                                                                   
freed by thread T8 here:                                                                                                                                                
    #0 0x55c9a4f8e5ad in operator delete(void*) (/home/ipsilon/silkworm/build/asan/cmd/silkworm+0x2305ad) (BuildId: a5bf77ccfd350aa48011e40f73f9cc35c0a5777e)           
    #1 0x55c9a549f7d9 in __libcpp_operator_delete<void *> /usr/lib/llvm-14/bin/../include/c++/v1/new:255:3                                                              
    #2 0x55c9a549f7d9 in __do_deallocate_handle_size<> /usr/lib/llvm-14/bin/../include/c++/v1/new:279:10                                                                
    #3 0x55c9a549f7d9 in __libcpp_deallocate /usr/lib/llvm-14/bin/../include/c++/v1/new:295:14                                                                          
    #4 0x55c9a549f7d9 in deallocate /usr/lib/llvm-14/bin/../include/c++/v1/__memory/allocator.h:114:13                                                                  
    #5 0x55c9a549f7d9 in deallocate /usr/lib/llvm-14/bin/../include/c++/v1/__memory/allocator_traits.h:282:13                                                           
    #6 0x55c9a549f7d9 in ~vector /usr/lib/llvm-14/bin/../include/c++/v1/vector:402:9                                                                                    
    #7 0x55c9a549f7d9 in silkworm::stagedsync::recovery::RecoveryFarm::collect_workers_results() /home/ipsilon/silkworm/node/silkworm/stagedsync/stage_senders/recovery_
farm.cpp:249:5                                                                                                                                                          
    #8 0x55c9a549651d in silkworm::stagedsync::recovery::RecoveryFarm::dispatch_batch() /home/ipsilon/silkworm/node/silkworm/stagedsync/stage_senders/recovery_farm.cpp:
336:30                                                                                                                                                                  
    #9 0x55c9a54945b5 in silkworm::stagedsync::recovery::RecoveryFarm::transform_and_fill_batch(unsigned long, std::__1::vector<silkworm::Transaction, std::__1::allocat
or<silkworm::Transaction> > const&) /home/ipsilon/silkworm/node/silkworm/stagedsync/stage_senders/recovery_farm.cpp:325:14                                              
    #10 0x55c9a548bcc6 in silkworm::stagedsync::recovery::RecoveryFarm::recover() /home/ipsilon/silkworm/node/silkworm/stagedsync/stage_senders/recovery_farm.cpp:117:34
    #11 0x55c9a54815a1 in silkworm::stagedsync::Senders::forward(silkworm::db::RWTxn&) /home/ipsilon/silkworm/node/silkworm/stagedsync/stage_senders.cpp:33:27          
                                                                                                                                                                        
previously allocated by thread T8 here:                                                                                                                                 
    #0 0x55c9a4f8dd4d in operator new(unsigned long) (/home/ipsilon/silkworm/build/asan/cmd/silkworm+0x22fd4d) (BuildId: a5bf77ccfd350aa48011e40f73f9cc35c0a5777e)      
    #1 0x55c9a54a9fba in __libcpp_operator_new<unsigned long> /usr/lib/llvm-14/bin/../include/c++/v1/new:245:10                                                         
    #2 0x55c9a54a9fba in __libcpp_allocate /usr/lib/llvm-14/bin/../include/c++/v1/new:271:10                                                                            
    #3 0x55c9a54a9fba in allocate /usr/lib/llvm-14/bin/../include/c++/v1/__memory/allocator.h:105:38                                                                    
    #4 0x55c9a54a9fba in allocate /usr/lib/llvm-14/bin/../include/c++/v1/__memory/allocator_traits.h:262:20
    #5 0x55c9a54a9fba in __split_buffer /usr/lib/llvm-14/bin/../include/c++/v1/__split_buffer:306:29
    #6 0x55c9a54a9fba in void std::__1::vector<silkworm::stagedsync::recovery::RecoveryPackage, std::__1::allocator<silkworm::stagedsync::recovery::RecoveryPackage> >::
__push_back_slow_path<silkworm::stagedsync::recovery::RecoveryPackage>(silkworm::stagedsync::recovery::RecoveryPackage&&) /usr/lib/llvm-14/bin/../include/c++/v1/vector:
1517:49
    #7 0x55c9a5493f82 in push_back /usr/lib/llvm-14/bin/../include/c++/v1/vector:1549:9
    #8 0x55c9a5493f82 in silkworm::stagedsync::recovery::RecoveryFarm::transform_and_fill_batch(unsigned long, std::__1::vector<silkworm::Transaction, std::__1::allocat
or<silkworm::Transaction> > const&) /home/ipsilon/silkworm/node/silkworm/stagedsync/stage_senders/recovery_farm.cpp:314:16
    #9 0x55c9a548bcc6 in silkworm::stagedsync::recovery::RecoveryFarm::recover() /home/ipsilon/silkworm/node/silkworm/stagedsync/stage_senders/recovery_farm.cpp:117:34
    #10 0x55c9a54815a1 in silkworm::stagedsync::Senders::forward(silkworm::db::RWTxn&) /home/ipsilon/silkworm/node/silkworm/stagedsync/stage_senders.cpp:33:27

Thread T17 created by T8 here:
    #0 0x55c9a4f3c3fc in __interceptor_pthread_create (/home/ipsilon/silkworm/build/asan/cmd/silkworm+0x1de3fc) (BuildId: a5bf77ccfd350aa48011e40f73f9cc35c0a5777e)
    #1 0x55c9a511341b in __libcpp_thread_create /usr/lib/llvm-14/bin/../include/c++/v1/__threading_support:375:10
    #2 0x55c9a511341b in thread<(lambda at /home/ipsilon/silkworm/node/silkworm/concurrency/worker.cpp:36:45), void> /usr/lib/llvm-14/bin/../include/c++/v1/thread:309:1
6
    #3 0x55c9a511341b in make_unique<std::__1::thread, (lambda at /home/ipsilon/silkworm/node/silkworm/concurrency/worker.cpp:36:45)> /usr/lib/llvm-14/bin/../include/c+
+/v1/__memory/unique_ptr.h:725:32
    #4 0x55c9a511341b in silkworm::Worker::start(bool) /home/ipsilon/silkworm/node/silkworm/concurrency/worker.cpp:36:15
    #5 0x55c9a54a51c1 in silkworm::stagedsync::recovery::RecoveryFarm::initialize_new_worker() /home/ipsilon/silkworm/node/silkworm/stagedsync/stage_senders/recovery_fa
rm.cpp:390:26
    #6 0x55c9a5496651 in silkworm::stagedsync::recovery::RecoveryFarm::dispatch_batch() /home/ipsilon/silkworm/node/silkworm/stagedsync/stage_senders/recovery_farm.cpp:
355:17
    #7 0x55c9a54945b5 in silkworm::stagedsync::recovery::RecoveryFarm::transform_and_fill_batch(unsigned long, std::__1::vector<silkworm::Transaction, std::__1::allocat
or<silkworm::Transaction> > const&) /home/ipsilon/silkworm/node/silkworm/stagedsync/stage_senders/recovery_farm.cpp:325:14
    #8 0x55c9a548bcc6 in silkworm::stagedsync::recovery::RecoveryFarm::recover() /home/ipsilon/silkworm/node/silkworm/stagedsync/stage_senders/recovery_farm.cpp:117:34
    #9 0x55c9a54815a1 in silkworm::stagedsync::Senders::forward(silkworm::db::RWTxn&) /home/ipsilon/silkworm/node/silkworm/stagedsync/stage_senders.cpp:33:27

Thread T8 created by T0 here:
    #0 0x55c9a4f3c3fc in __interceptor_pthread_create (/home/ipsilon/silkworm/build/asan/cmd/silkworm+0x1de3fc) (BuildId: a5bf77ccfd350aa48011e40f73f9cc35c0a5777e)     
    #1 0x55c9a511341b in __libcpp_thread_create /usr/lib/llvm-14/bin/../include/c++/v1/__threading_support:375:10
    #2 0x55c9a511341b in thread<(lambda at /home/ipsilon/silkworm/node/silkworm/concurrency/worker.cpp:36:45), void> /usr/lib/llvm-14/bin/../include/c++/v1/thread:309:1
6
    #3 0x55c9a511341b in make_unique<std::__1::thread, (lambda at /home/ipsilon/silkworm/node/silkworm/concurrency/worker.cpp:36:45)> /usr/lib/llvm-14/bin/../include/c+
+/v1/__memory/unique_ptr.h:725:32
    #4 0x55c9a511341b in silkworm::Worker::start(bool) /home/ipsilon/silkworm/node/silkworm/concurrency/worker.cpp:36:15
    #5 0x55c9a4f96252 in main /home/ipsilon/silkworm/cmd/silkworm.cpp:143:19
    #6 0x7f29af565d8f in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16

SUMMARY: AddressSanitizer: heap-use-after-free /home/ipsilon/silkworm/node/silkworm/stagedsync/stage_senders/recovery_worker.cpp:61:38 in silkworm::stagedsync::recovery
::RecoveryWorker::work()
Shadow bytes around the buggy address:
  0x0f85ab261740: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0f85ab261750: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0f85ab261760: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0f85ab261770: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0f85ab261780: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
=>0x0f85ab261790:[fd]fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0f85ab2617a0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0f85ab2617b0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0f85ab2617c0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0f85ab2617d0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0f85ab2617e0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==3478483==ABORTING

chfast avatar Oct 21 '22 08:10 chfast

Recently I have sporadically worked at refactoring the Senders stage to simplify the code and overcome the root cause of this problem (or at least prepare the ground for easier debugging).

@chfast can you please take a look if https://github.com/torquem-ch/silkworm/commit/b30d780121d07e88bfbc4ff1cd2694c911eabed1 solves the issue on your machine?

canepat avatar Dec 07 '22 11:12 canepat

Closed by b30d780

canepat avatar Jan 19 '23 12:01 canepat