LibAFL SIGSEGV when using asan in aarch64 qemu mode

IMPORTANT

Verified that the issue is present in the current main branch: Yes

$ git log | head -n 1
commit 453d733a3562dcea290265dafec1908832f97658

Describe the bug

I first encountered this issue while reproducing the results of the android fuzzer in libafl_qemu_artifact. When I added --features asan to the fuzzer's build, it crashed and the log showed:

qemu: QEMU internal SIGSEGV {code=MAPERR, addr=0x1555d554de02}
Segmentation fault(core dumped)

I debugged this issue with gdb-multiarch and found that it is caused by a failed dereference of a shadow memory address:

   0x5555557307b5 <libafl_qemu::modules::usermode::asan::AsanGiovese::fake_syscall+341>    lea    rax, [rip + 0x8ac3e4]              RAX => 0x555555fdcba0 (guest_base) ◂— 0
   0x5555557307bc <libafl_qemu::modules::usermode::asan::AsanGiovese::fake_syscall+348>    mov    rcx, qword ptr [rax]               RCX, [guest_base] => 0
   0x5555557307bf <libafl_qemu::modules::usermode::asan::AsanGiovese::fake_syscall+351>    xor    eax, eax                           EAX => 0
   0x5555557307c1 <libafl_qemu::modules::usermode::asan::AsanGiovese::fake_syscall+353>    nop    word ptr cs:[rax + rax]
   0x5555557307d0 <libafl_qemu::modules::usermode::asan::AsanGiovese::fake_syscall+368>    lea    rdx, [rcx + rbx]                   RDX => 0xaaaaaaaaf010 ◂— 0
   0x5555557307d4 <libafl_qemu::modules::usermode::asan::AsanGiovese::fake_syscall+372>    sar    rdx, 3
  ►  0x5555557307d8 <libafl_qemu::modules::usermode::asan::AsanGiovese::fake_syscall+376>    mov    byte ptr [rdx + 0x7fff8000], 0     <Cannot dereference [0x1555d554de02]>
   0x5555557307df <libafl_qemu::modules::usermode::asan::AsanGiovese::fake_syscall+383>    add    rbx, 8

This is in the function "libafl_qemu::modules::usermode::asan::AsanGiovese::unpoison", which is in libafl_qemu/src/modules/usermode/asan.rs:

pub fn unpoison(qemu: Qemu, addr: GuestAddr, n: usize) -> bool {
        unsafe {
            let n = n as isize;
            let mut start = addr;
            let end = start.wrapping_add(n as GuestAddr);

            while start < end {
                let h = qemu.g2h::<*const c_void>(start) as isize;
                let shadow_addr = ((h >> 3) as *mut i8).offset(SHADOW_OFFSET);
  ►            *shadow_addr = 0;
                start = (start).wrapping_add(8);
            }
            true
        }
    }

In my case, the original start addr is 0xaaaaaaaaf010, n is 0x158, and the end addr is 0xaaaaaaaaf168. When it executes (h >> 3), 0xaaaaaaaaf010 becomes 0x155555555e02. SHADOW_OFFSET is 0x7fff8000, so shadow_addr is 0x1555d554de02. Neither 0x155555555e02 nor 0x1555d554de02 is addressable:

pwndbg>x/x 0x155555555e02
0x155555555e02:
Cannot access memory at address 0x155555555e02
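The arithmetic above can be replayed with plain integers. The following is only a sketch, not the LibAFL code: it assumes guest_base is 0 (as the disassembly shows), so that g2h(addr) == addr, and the helper names shadow_addr and shadow_bytes_touched are made up for illustration.

```rust
// Mirrors the address arithmetic of `unpoison` with plain integers.
// Assumption: guest_base is 0, so the host address equals the guest address.
const SHADOW_OFFSET: u64 = 0x7fff_8000;

/// Host address -> address of its shadow byte: (h >> 3) + SHADOW_OFFSET.
fn shadow_addr(host_addr: u64) -> u64 {
    (host_addr >> 3) + SHADOW_OFFSET
}

/// Shadow bytes the loop would write for [addr, addr + n),
/// stepping 8 guest bytes per iteration like the `while start < end` loop.
fn shadow_bytes_touched(addr: u64, n: u64) -> Vec<u64> {
    (addr..addr + n).step_by(8).map(shadow_addr).collect()
}

fn main() {
    let (addr, n) = (0xaaaa_aaaa_f010_u64, 0x158_u64);
    let touched = shadow_bytes_touched(addr, n);
    println!("first shadow byte: {:#x}", touched[0]);
    println!("last shadow byte:  {:#x}", touched[touched.len() - 1]);
    println!("iterations:        {}", touched.len());
}
```

The first shadow byte matches the faulting address in the log (0x1555d554de02), so the crash happens on the very first iteration of the loop.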

This happens in libafl-0.11.2. I also tried 0.13.2, and the issue still exists.

-------------------------------------8<----------------------------------

I saw the similar issue 2579, so I tried the example fuzzer qemu_launcher on the latest main branch (the commit shown at the beginning). With the features x86_64 and asan, it works well:

pwndbg> p/x end
$3= 0x7ffff5b004a8
pwndbg> p/x start
$4= 0x7ffff5b002a0
pwndbg> p n
$5 =<optimized out>
pwndbg> p/x end-start
$6 = 0x208
==============
0x7ffff5b002a0 >> 3 = 0xffffeb60054
==============
pwndbg> x/x 0xffffeb60054
0xffffeb60054:  0x00000000

The start addr is 0x7ffff5b002a0 (0x7ffff5b004a8 is the end addr). After the right shift it becomes 0xffffeb60054, and this address is addressable.

But with the features aarch64 and asan, it crashes for the same reason, in a different code area:

pwndbg> set args "--input" "./corpus" "--output" "/home/LibAFL/fuzzers/binary_only/qemu_launcher/target/aarch64/output/" "--cores" "0-7" "--asan-cores" "0-3" "--cmplog-cores" "2-5" "--verbose" "--" "/home/LibAFL/fuzzers/binary_only/qemu_launcher/target/aarch64/libpng-harness-aarch64"
pwndbg> r
Starting program: /home/LibAFL/fuzzers/binary_only/qemu_launcher/target/aarch64/release/qemu_launcher-aarch64 "--input" "./corpus" "--output" "/home/LibAFL/fuzzers/binary_only/qemu_launcher/target/aarch64/output/" "--cores" "0-7" "--asan-cores" "0-3" "--cmplog-cores" "2-5" "--verbose" "--" "/home/LibAFL/fuzzers/binary_only/qemu_launcher/target/aarch64/libpng-harness-aarch64"
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff7800640 (LWP 3793330)]

Thread 1 "qemu_launcher-a" received signal SIGSEGV, Segmentation fault.
libafl_qemu::modules::usermode::asan::AsanModule::read_8 (self=0x55555923c428, pc=<optimized out>, addr=<optimized out>) at /home/LibAFL/libafl_qemu/src/modules/usermode/asan.rs:868
868             if self.enabled() && AsanGiovese::is_invalid_access_8(qemu, addr) {
Warning: the current language does not match this frame.
LEGEND: STACK | HEAP | CODE | DATA | WX | RODATA
───────────────────────────────[ REGISTERS / show-flags off / show-compact-regs off ]───────────────────────────────
 RAX  0x55555923c400 ◂— 0
 RBX  0xaaaaaab0ff60 —▸ 0x7ffff79fdb58 ◂— 0xadb771622a56ae00
 RCX  0xaaaaaab0ff60 —▸ 0x7ffff79fdb58 ◂— 0xadb771622a56ae00
 RDX  0xaaaaaaaa1788 ◂— 0xf9400001f947b000
 RDI  0x55555923ded0 ◂— 0
 RSI  0x155555561fec
 R8   0xaaaaaab0ff60 —▸ 0x7ffff79fdb58 ◂— 0xadb771622a56ae00
 R9   0x55555564efb0 (libafl_qemu::modules::usermode::asan::trace_read8_asan) ◂— mov rax, qword ptr [rdi + 0x120]
 R10  0
 R11  0xffffedbffcd ◂— 0
 R12  0x7ffff79fdb58 ◂— 0xadb771622a56ae00
 R13  0xaaaaaaaa1fa0 ◂— 0x2a0003e152800000
 R14  0x7fffe8000100 (code_gen_buffer+211) ◂— mov ebx, dword ptr [rbp - 0x10] /* 0xce8c0fdb85f05d8b */
 R15  0x7fffe8000040 (code_gen_buffer+19) —▸ 0xaaaaaaaa176c ◂— 0xa9017bfdd10303ff
 RBP  0x5555591df880 ◂— 0
 RSP  0x7fffffffa3b8 —▸ 0x7fffe8000273 (code_gen_buffer+582) ◂— mov rbx, qword ptr [rbp + 0x40] /* 0x49e38b4c405d8b48 */
 RIP  0x55555564efce (libafl_qemu::modules::usermode::asan::trace_read8_asan+30) ◂— cmp byte ptr [rsi + 0x7fff8000], 0
────────────────────────────────────────[ DISASM / x86-64 / set emulate on ]────────────────────────────────────────
 ► 0x55555564efce <libafl_qemu::modules::usermode::asan::trace_read8_asan+30>    cmp    byte ptr [rsi + 0x7fff8000], 0
   0x55555564efd5 <libafl_qemu::modules::usermode::asan::trace_read8_asan+37>    je     libafl_qemu::modules::usermode::asan::trace_read8_asan+90 <libafl_qemu::modules::usermode::asan::trace_read8_asan+90>
 
   0x55555564efd7 <libafl_qemu::modules::usermode::asan::trace_read8_asan+39>    sub    rsp, 0x28
   0x55555564efdb <libafl_qemu::modules::usermode::asan::trace_read8_asan+43>    mov    rdi, qword ptr [rax + 0x48]
   0x55555564efdf <libafl_qemu::modules::usermode::asan::trace_read8_asan+47>    mov    qword ptr [rsp + 0x10], rcx
   0x55555564efe4 <libafl_qemu::modules::usermode::asan::trace_read8_asan+52>    mov    qword ptr [rsp + 0x18], 8
   0x55555564efed <libafl_qemu::modules::usermode::asan::trace_read8_asan+61>    mov    qword ptr [rsp + 8], 2
   0x55555564eff6 <libafl_qemu::modules::usermode::asan::trace_read8_asan+70>    lea    rax, [rsp + 8]
   0x55555564effb <libafl_qemu::modules::usermode::asan::trace_read8_asan+75>    mov    rsi, rdx
   0x55555564effe <libafl_qemu::modules::usermode::asan::trace_read8_asan+78>    mov    rdx, rax
   0x55555564f001 <libafl_qemu::modules::usermode::asan::trace_read8_asan+81>    call   libafl_qemu::modules::usermode::asan::AsanGiovese::report_or_crash <libafl_qemu::modules::usermode::asan::AsanGiovese::report_or_crash>
─────────────────────────────────────────────────[ SOURCE (CODE) ]──────────────────────────────────────────────────
In file: /home/LibAFL/libafl_qemu/src/modules/usermode/asan.rs:868
   863             self.rt.report_or_crash(qemu, pc, AsanError::Read(addr, 4));
   864         }
   865     }
   866 
   867     pub fn read_8(&mut self, qemu: Qemu, pc: GuestAddr, addr: GuestAddr) {
 ► 868         if self.enabled() && AsanGiovese::is_invalid_access_8(qemu, addr) {
   869             self.rt.report_or_crash(qemu, pc, AsanError::Read(addr, 8));
   870         }
   871     }
   872 
   873     pub fn read_n(&mut self, qemu: Qemu, pc: GuestAddr, addr: GuestAddr, size: usize) {
─────────────────────────────────────────────────────[ STACK ]──────────────────────────────────────────────────────
00:0000│ rsp 0x7fffffffa3b8 —▸ 0x7fffe8000273 (code_gen_buffer+582) ◂— mov rbx, qword ptr [rbp + 0x40] /* 0x49e38b4c405d8b48 */
01:0008│     0x7fffffffa3c0 —▸ 0x5555591b9218 (tcg_init_ctx+2008) —▸ 0x6201010203 ◂— 0
02:0010│     0x7fffffffa3c8 ◂— 0
03:0018│     0x7fffffffa3d0 —▸ 0x5555591ba210 (tcg_init_ctx+6096) —▸ 0x800101000c ◂— 0
04:0020│     0x7fffffffa3d8 ◂— 0x1530
05:0028│     0x7fffffffa3e0 ◂— 0x5030 /* '0P' */
06:0030│     0x7fffffffa3e8 ◂— 0
07:0038│     0x7fffffffa3f0 ◂— 7
───────────────────────────────────────────────────[ BACKTRACE ]────────────────────────────────────────────────────
 ► 0   0x55555564efce libafl_qemu::modules::usermode::asan::trace_read8_asan+30
   1   0x55555564efce libafl_qemu::modules::usermode::asan::trace_read8_asan+30
   2   0x7fffe8000273 code_gen_buffer+582
   3   0x555555b92470 cpu_tb_exec+80
   4   0x555555b93055 cpu_exec_loop.constprop+805
   5   0x555555b93055 cpu_exec_loop.constprop+805
   6   0x555555b93639 cpu_exec_setjmp.isra+41
   7   0x555555b936cb cpu_exec+107
───────────────────────────────────────────────[ THREADS (2 TOTAL) ]────────────────────────────────────────────────
  ► 1   "qemu_launcher-a" stopped: 0x55555564efce <libafl_qemu::modules::usermode::asan::trace_read8_asan+30> 
    2   "qemu_launcher-a" stopped: 0x7ffff7b1e88d <syscall+29> 
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

Here, it crashes when accessing [rsi + 0x7fff8000], which is the same failure as in the android fuzzer case above.

pwndbg> x/x (0x155555561fec+0x7fff8000)
0x1555d5559fec: Cannot access memory at address 0x1555d5559fec

=====================================================
   0x00005555556445fd <+13>:    lea    rsi,[rip+0x2b44bc4]        # 0x5555581891c8 <guest_base>
   0x0000555555644604 <+20>:    mov    rsi,QWORD PTR [rsi]
   0x0000555555644607 <+23>:    add    rsi,rcx
   0x000055555564460a <+26>:    sar    rsi,0x3
=> 0x000055555564460e <+30>:    cmp    BYTE PTR [rsi+0x7fff8000],0x0

I am new to qasan, so I am still trying to figure out why this happens. Can you offer some help with this issue? Thank you very much!

To Reproduce

Steps to reproduce the android fuzzer behavior:

I followed the instructions in libafl_qemu_artifact exactly.

Steps to reproduce the qemu_launcher behavior:

git clone https://github.com/AFLplusplus/LibAFL.git
cd LibAFL/fuzzers/binary_only/qemu_launcher
export LLVM_CONFIG="llvm-config-15"
export QEMU_LD_PREFIX=/path/to/aarch64-linux-gnu/
cargo make aarch64

I modified the Makefile.toml to add the feature simplemgr for clarity.

Steps to debug qemu_launcher:

gdb-multiarch target/aarch64/release/qemu_launcher-aarch64
pwndbg> set args --input ./corpus/ --output target/aarch64/output/ --cores 0-1 --asan-cores 0 --cmplog-cores 1 -- target/aarch64/libpng-harness-aarch64

My environment info:

lsb_release -a && \
    arch && \
    llvm-config --version && \
    rustup toolchain list && \
    rustc -V
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 22.04.2 LTS
Release:        22.04
Codename:       jammy
x86_64
14.0.0 (I export LLVM_CONFIG=llvm-config-15 when building the fuzzer's project)
stable-x86_64-unknown-linux-gnu (default)
nightly-x86_64-unknown-linux-gnu
rustc 1.80.1 (3f5fd8dd4 2024-08-06)

Expected behavior

The fuzzer works as well as it does on the x86_64 architecture.

Oct 15 '24 09:10 RongxiYe

About the android fuzzer, I found something strange: my harnessDecode is at harnessDecode @ 0xaaaaaaaabb70. Another developer was able to run the android fuzzer with asan successfully, and his harnessDecode address starts with 0x7fff: harnessDecode @ 0x7ffff7fb4b68. I think this might be the key to the matter, because in the x86_64 qemu_launcher case the addresses are laid out similarly.

Using pmap:

# his process space
 pmap 89530 |grep harness
00007ffff7fb3000      4K r---- harness
00007ffff7fb4000      8K r---- harness
00007ffff7fb6000      4K r---- harness
00007ffff7fb7000      4K rw--- harness
=============================
# my process space
pmap 1809467 |grep harness
0000aaaaaaaab000      4K r---- harness
0000aaaaaaaad000      4K r---- harness

I am very confused about this...

Oct 15 '24 10:10 RongxiYe

thank you for the detailed report. i just saw you closed the issue, is it because your problem is solved?

Oct 15 '24 11:10 rmalmain

> thank you for the detailed report. i just saw you closed the issue, is it because your problem is solved?

No, I haven't fully solved it. I closed the issue because my colleague was able to run the android fuzzer with the aarch64 architecture and asan. So I guess that my SIGSEGV is caused by something wrong in my setup, not by a bug in the project. But I'd be grateful if you can help.

So far, the only difference I have noticed is the address mapping of the harness on my colleague's machine and on mine:

# his process space
 pmap 89530 |grep harness
00007ffff7fb3000      4K r---- harness
00007ffff7fb4000      8K r---- harness
00007ffff7fb6000      4K r---- harness
00007ffff7fb7000      4K rw--- harness
=============================
# my process space
pmap 1809467 |grep harness
0000aaaaaaaab000      4K r---- harness
0000aaaaaaaad000      4K r---- harness

Oct 15 '24 11:10 RongxiYe

Sorry for closing and opening the issue again, as this is my first time submitting an issue :)

Oct 15 '24 11:10 RongxiYe

I debugged further and found that on my colleague's machine the mmap syscall returns an address starting with 0x7fff, but on mine it returns 0xaaaaaaaab000. The requested address is the same in both cases, 0xaaaaaaaab000, because this is ELF_ET_DYN_BASE defined in elf.h.

__GI___mmap64 (addr=addr@entry=0xaaaaaaaab000, len=len@entry=16384, prot=prot@entry=0, flags=flags@entry=16418, fd=fd@entry=-1, offset=0) at ../sysdeps/unix/sysv/linux/mmap64.c:47
---------------------------------------8<-----------------------------------
__GI___mmap64 (addr=0xaaaaaaaab000, len=len@entry=20480, prot=prot@entry=0, flags=flags@entry=16418, fd=fd@entry=-1, offset=offset@entry=0) at ../sysdeps/unix/sysv/linux/mmap64.c:47

This mmap happens in init_qemu_with_asan:

#0  mmap_h_eq_g (offset=<optimized out>, fd=-1, page_flags=8, flags=16418, host_prot=0, len=16384, start=187649984475136) at ../linux-user/mmap.c:566
#1  target_mmap__locked (offset=<optimized out>, fd=-1, page_flags=8, flags=16418, target_prot=0, len=16384, start=187649984475136) at ../linux-user/mmap.c:894
#2  target_mmap (start=<optimized out>, len=16384, len@entry=12296, target_prot=target_prot@entry=0, flags=16418, fd=fd@entry=-1, offset=offset@entry=0) at ../linux-user/mmap.c:949
#3  0x0000555555a084a0 in load_elf_image (image_name=0x555555fdcbc0 <real_exec_path> "libafl_qemu_artifacts/android_fuzzer/harness", src=src@entry=0x555555fdca20 <bprm+1024>, info=info@entry=0x555555fdca80 <libafl_image_info>, ehdr=ehdr@entry=0x7fffffffca90, pinterp_name=pinterp_name@entry=0x7fffffffc850) at ../linux-user/elfload.c:3412
#4  0x0000555555a08e64 in load_elf_binary (bprm=bprm@entry=0x555555fdc620 <bprm>, info=info@entry=0x555555fdca80 <libafl_image_info>) at ../linux-user/elfload.c:3868
#5  0x0000555555a0b3ab in loader_exec (fdexec=fdexec@entry=3, filename=<optimized out>, argv=argv@entry=0x555555ffa9b0, envp=envp@entry=0x55555605c860, regs=regs@entry=0x7fffffffcca0, infop=infop@entry=0x555555fdca80 <libafl_image_info>, bprm=<optimized out>) at ../linux-user/linuxload.c:163
#6  0x0000555555a0c877 in qemu_user_init (argc=6, argv=0x555555ff2ca0, envp=<optimized out>) at ../linux-user/main.c:1007
#7  0x000055555562a8dd in libafl_qemu::qemu::Qemu::init (args=..., env=...) at /src/qemu/mod.rs:557
#8  libafl_qemu::modules::usermode::asan::init_qemu_with_asan (args=0x7fffffffd1e0, env=...) at /src/modules/usermode/asan.rs:719
#9  android_fuzzer::main () at src/main.rs:236
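The decimal values in the traces above can be decoded with a few lines. This is a sketch that assumes the Linux x86_64 values of the MAP_* constants from <sys/mman.h>; decode_mmap_flags is a hypothetical helper, not part of QEMU or LibAFL.

```rust
// Linux x86_64 values of the mmap flag bits (assumed host platform).
const MAP_PRIVATE: u32 = 0x02;
const MAP_FIXED: u32 = 0x10;
const MAP_ANONYMOUS: u32 = 0x20;
const MAP_NORESERVE: u32 = 0x4000;

/// Names of the known flag bits that are set in `flags`.
fn decode_mmap_flags(flags: u32) -> Vec<&'static str> {
    let table = [
        (MAP_PRIVATE, "MAP_PRIVATE"),
        (MAP_FIXED, "MAP_FIXED"),
        (MAP_ANONYMOUS, "MAP_ANONYMOUS"),
        (MAP_NORESERVE, "MAP_NORESERVE"),
    ];
    table
        .iter()
        .filter(|entry| (flags & entry.0) != 0)
        .map(|entry| entry.1)
        .collect()
}

fn main() {
    let flags: u32 = 16418; // flags=16418 in the traces
    let start: u64 = 187_649_984_475_136; // start=... in mmap_h_eq_g
    println!("flags = {:#x}: {}", flags, decode_mmap_flags(flags).join(" | "));
    println!("start = {:#x}", start);
}
```

MAP_FIXED is not among the decoded flags, so the requested address is only a hint and the kernel is free to place the mapping elsewhere.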

I do know that mmap may behave differently on different systems, though I know little about the details. However, what happened on my machine shows that the address 0xaaaaaaaab000 can be mapped successfully. In this case, qasan's unpoison algorithm does not work, because the shadow address computed after the right shift is not mapped and cannot be dereferenced.

Oct 16 '24 04:10 RongxiYe

I tried this code on two machines and got different results.

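#define _GNU_SOURCE /* for MAP_ANONYMOUS and MAP_NORESERVE */
#include <stdio.h>
#include <sys/mman.h>

int main(void){
	/* Same request as in the trace: prot = 0 (PROT_NONE), flags = 16418 (0x4022). */
	void* ptr = mmap((void*)0xaaaaaaaab000UL, 16384, PROT_NONE,
	                 MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
	if (ptr == MAP_FAILED) { perror("mmap"); return 1; }
	printf("%p\n", ptr); /* without MAP_FIXED the address is only a hint */
	return 0;
}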

On the server I previously used to run the android fuzzer, it prints 0xaaaaaaaab000. On another one, it prints 0x7f165e17b000. I think this is the key reason for this issue.

Do you think the libafl implementation should take this case into account? If not, I will close this issue.

Oct 16 '24 07:10 RongxiYe

it makes sense to me that you get the segfault at least since shadow memory is designed to work with memory in the [0x10007fff8000, 0x7fffffffffff] range (for high addresses), and you get mapped above the max address. not sure exactly why you get mapped so high in memory compared to others, your environment looks pretty standard. we can fix it by adding another memory range, but ideally we should determine why it happens imho.
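To make the range argument concrete, here is a small sketch that classifies the two host addresses from this thread. Only the high-memory range [0x10007fff8000, 0x7fffffffffff] is taken from the comment above; LOW_MEM_END is the usual x86_64 ASan value for a 0x7fff8000 shadow offset and is an assumption about what qasan uses, and the Region enum and classify function are hypothetical names.

```rust
const SHADOW_OFFSET: u64 = 0x7fff_8000;
const LOW_MEM_END: u64 = 0x7fff_7fff; // assumption: standard ASan low-memory end
const HIGH_MEM_BEG: u64 = 0x1000_7fff_8000; // from the comment above
const HIGH_MEM_END: u64 = 0x7fff_ffff_ffff; // from the comment above

#[derive(Debug, PartialEq)]
enum Region {
    LowMem,
    ShadowOrGap,
    HighMem,
    AboveHighMem,
}

/// Which part of the layout a host address falls into.
fn classify(host_addr: u64) -> Region {
    match host_addr {
        a if a <= LOW_MEM_END => Region::LowMem,
        a if a < HIGH_MEM_BEG => Region::ShadowOrGap,
        a if a <= HIGH_MEM_END => Region::HighMem,
        _ => Region::AboveHighMem,
    }
}

fn shadow_addr(host_addr: u64) -> u64 {
    (host_addr >> 3) + SHADOW_OFFSET
}

fn main() {
    // x86_64 run (works) and aarch64 run (crashes), as reported in this thread.
    for addr in [0x7fff_f5b0_02a0_u64, 0xaaaa_aaaa_f010] {
        println!("{:#x}: {:?}, shadow byte at {:#x}", addr, classify(addr), shadow_addr(addr));
    }
    // Highest shadow byte that backs the high-memory range.
    println!("shadow of HIGH_MEM_END = {:#x}", shadow_addr(HIGH_MEM_END));
}
```

Under these assumptions the crashing address lies above the high-memory range, and its shadow byte lies above the shadow of 0x7fffffffffff, which is consistent with the unmapped access seen in gdb.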

Oct 16 '24 12:10 rmalmain

Yes, I have tried to figure out why my server is able to map such a high address, but I haven't found the reason yet. I have tried 3 machines, and only this larger one exhibits this behavior. I am also asking the person who configured this server. If I get any useful information, I will share it here as soon as possible.

Oct 16 '24 14:10 RongxiYe

ok thanks. i tried to check online for this address (0xaaaaaaaaa000) but nothing interesting so far.

Oct 21 '24 12:10 rmalmain