hnsd

Heap corruption during authoritative resolution

Open chjj opened this issue 7 years ago • 3 comments

Spotted this today when clicking an obfuscated twitter link. The daemon had been running locally on my laptop for a few days.

rs: query
rs:   id=27745
rs:   labels=2
rs:   name=t.co.
rs:   type=1
rs:   class=1
rs:   edns=0
rs:   dnssec=0
rs:   tld=co
rs:   addr=127.0.0.1:51557
rs: udp nodata
ns: query
ns:   id=14389
ns:   labels=2
ns:   name=t.co.
ns:   type=1
ns:   class=1
ns:   edns=1
ns:   dnssec=1
ns:   tld=co
ns:   addr=127.0.0.1:52340
ns: udp nodata
corrupted double-linked list (not small)
Aborted

I've been unable to reproduce it, so there's no chance of using valgrind to track this down. The "ns: udp nodata" log line implies that hsk_ns_onrecv() executed successfully; otherwise I would suspect an issue with the authoritative cache that was just added. It's also not sending a message in response, so there's no cache hit there.

My best guess is something funky happened in the P2P pool which corrupted the heap. I'm starting to add more debug logs so we can narrow this down when it happens again.

chjj • Apr 14 '18 08:04

Update: Finally a lead after so many months of head scratching...

When implementing the unbound node module, I noticed a similar heap corruption in node.js. It only presented itself when the unbound context was set to async mode. It was consistent and reproducible (it seemed to cause unrecoverable memory corruption maybe 1 out of every 20 times the bns test suite was run). I'm going to guess the same issue is affecting hnsd.

I hesitate to call this a bug in libunbound. It's possible that libuv and libevent don't play well with each other for some reason (?).

I think the solution for now would be to call unbound's resolver synchronously in the uv thread pool (the same fix used in the unbound node module).
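
A minimal sketch of the "resolve synchronously on a worker thread" idea, not hnsd's actual code. In hnsd the work would presumably be queued with libuv's uv_queue_work() and the worker would call libunbound's blocking ub_resolve(). To keep the sketch self-contained, a plain pthread stands in for the uv thread pool, and resolve_req_t, fake_resolve(), and resolve_offloaded() are hypothetical names invented for illustration.

```c
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical request record. It is touched by exactly one thread at a
 * time: the worker while resolving, the calling thread afterwards. */
typedef struct {
  char name[256];  /* input: name to resolve */
  int rcode;       /* output: 0 on success, -1 on failure */
  char answer[64]; /* output: textual answer */
} resolve_req_t;

/* Stand-in for a blocking resolver call such as ub_resolve().
 * Assumption: returns a fixed answer instead of doing network I/O. */
static int fake_resolve(const char *name, char *out, size_t out_len) {
  if (name[0] == '\0')
    return -1;
  snprintf(out, out_len, "192.0.2.1");
  return 0;
}

/* Runs on the worker thread: the only place the blocking call is made. */
static void *resolve_work(void *arg) {
  resolve_req_t *req = arg;
  req->rcode = fake_resolve(req->name, req->answer, sizeof(req->answer));
  return NULL;
}

/* Runs on the calling thread after the worker has finished. */
static void resolve_done(const resolve_req_t *req) {
  if (req->rcode == 0)
    printf("%s -> %s\n", req->name, req->answer);
  else
    printf("%s -> failed\n", req->name);
}

/* Offload one lookup and return its rcode. The join here blocks the
 * caller; with libuv the loop would keep running and the after-work
 * callback would take the place of the join. */
static int resolve_offloaded(resolve_req_t *req, const char *name) {
  pthread_t worker;

  memset(req, 0, sizeof(*req));
  snprintf(req->name, sizeof(req->name), "%s", name);

  if (pthread_create(&worker, NULL, resolve_work, req) != 0)
    return -1;
  if (pthread_join(worker, NULL) != 0)
    return -1;

  resolve_done(req);
  return req->rcode;
}
```

Calling resolve_offloaded(&req, "t.co.") prints "t.co. -> 192.0.2.1". The point of the shape is that the resolver never registers its own events on the main loop: it only ever runs inside the worker, and the result is read back on the calling thread once the worker is done. Older toolchains may need -pthread to build this.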

We can leave this open to investigate the causes of this more thoroughly in the future.

chjj • Feb 07 '19 06:02

This has come up again in a branch where hnsd can discover peers and open more connections: https://github.com/handshake-org/hnsd/pull/38#issuecomment-681883190

pinheadmz • Sep 01 '20 15:09

Got a stack trace of this:

peer 714 (64.227.15.172:12038): sending verack
peer 714 (64.227.15.172:12038): sending sendheaders
peer 714 (64.227.15.172:12038): sending getaddr
peer 714 (64.227.15.172:12038): sending getheaders
corrupted double-linked list (not small)

Thread 1 "hnsd" received signal SIGABRT, Aborted.
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007ffff7cd5859 in __GI_abort () at abort.c:79
#2  0x00007ffff7d403ee in __libc_message (action=action@entry=do_abort,
    fmt=fmt@entry=0x7ffff7e6a285 "%s\n") at ../sysdeps/posix/libc_fatal.c:155
#3  0x00007ffff7d4847c in malloc_printerr (
    str=str@entry=0x7ffff7e6c248 "corrupted double-linked list (not small)") at malloc.c:5347
#4  0x00007ffff7d48af7 in unlink_chunk (p=p@entry=0x5555563441c0, av=0x7ffff7e9bb80 <main_arena>)
    at malloc.c:1468
#5  0x00007ffff7d4b773 in _int_malloc (av=av@entry=0x7ffff7e9bb80 <main_arena>,
    bytes=bytes@entry=8224) at malloc.c:4041
#6  0x00007ffff7d4d419 in __GI___libc_malloc (bytes=8224) at malloc.c:3066
#7  0x00005555555732a0 in hsk_dns_msg_alloc () at src/dns.c:71
#8  0x000055555557618c in hsk_dns_msg_decode (data=<optimized out>,
    data@entry=0x5555557812f1 "\244\306\001 ", data_len=<optimized out>, data_len@entry=50,
    msg=msg@entry=0x7fffffffa6c0) at src/dns.c:86
#9  0x0000555555582bbb in hsk_dns_req_create (data=0x5555557812f1 "\244\306\001 ", data_len=50,
    addr=0x7fffffffa7a0) at src/req.c:72
#10 0x000055555556ee78 in after_recv ()
#11 0x00005555555b13c5 in uv__udp_recvmsg (handle=0x5555559e48f0) at src/unix/udp.c:205
#12 uv__udp_io (loop=<optimized out>, w=0x5555559e4970, revents=1) at src/unix/udp.c:142
#13 0x00005555555b3238 in uv__io_poll (loop=loop@entry=0x55555563f420 <default_loop_struct>, 
    timeout=2979) at src/unix/linux-core.c:400
#14 0x00005555555a855c in uv_run (loop=0x55555563f420 <default_loop_struct>, mode=UV_RUN_DEFAULT)
    at src/unix/core.c:368
#15 0x000055555556bef0 in main () at src/unix/core.c:820
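
A note on reading this trace, with a small sketch. Frame #4 (unlink_chunk) is glibc checking its free-list links while serving the malloc() in hsk_dns_msg_alloc (frame #7), so that allocation is most likely only where the damage was noticed, not where it was done. The code below is a simplified model of that kind of back-pointer check, not glibc's implementation: chunk_t, ring_init(), checked_unlink(), and demo_late_detection() are hypothetical names, and real chunks also carry size fields. As far as I can tell, the "(not small)" variant is the same style of check applied to the extra size-ordered links that large free chunks have.

```c
#include <stddef.h>
#include <stdio.h>

/* Simplified stand-in for a free chunk in a doubly linked bin. */
typedef struct chunk {
  struct chunk *fd; /* next free chunk */
  struct chunk *bk; /* previous free chunk */
} chunk_t;

/* Link three nodes into a ring: a <-> b <-> c <-> a. */
static void ring_init(chunk_t *a, chunk_t *b, chunk_t *c) {
  a->fd = b; b->fd = c; c->fd = a;
  a->bk = c; b->bk = a; c->bk = b;
}

/* Unlink p only if both neighbours still point back at it.
 * Returns 0 on success, -1 where an allocator would abort with
 * a "corrupted double-linked list" style message. */
static int checked_unlink(chunk_t *p) {
  chunk_t *fd = p->fd;
  chunk_t *bk = p->bk;

  if (fd->bk != p || bk->fd != p)
    return -1;

  fd->bk = bk;
  bk->fd = fd;
  return 0;
}

/* Shows the gap between cause and symptom: the stray write happens
 * first and goes unnoticed, the failure is reported by a later,
 * unrelated unlink. */
static int demo_late_detection(void) {
  chunk_t a, b, c;

  ring_init(&a, &b, &c);
  a.fd = &c;                 /* the bug: a's link is clobbered here */
  return checked_unlink(&b); /* the report: b's neighbour check fails */
}
```

In this model demo_late_detection() returns -1 even though nothing is wrong with the unlink call itself, which is why the frames above the allocator (hsk_dns_msg_decode, hsk_dns_req_create) are not necessarily suspects.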

pinheadmz • Sep 02 '20 23:09