[LibOS] Race condition triggers ASan use‑after‑poison in `execve` path (`release_clear_child_tid`)

Open forkthus opened this issue 5 months ago • 0 comments

Description of the problem

exec_same fails on the Jenkins-SGX-24.04-Sanitizers job for PR #1795 with an ASan use‑after‑poison inside release_clear_child_tid(). Reproduces on main, so this is not PR‑specific.

[2025-05-07T09:06:11.907Z] [2.257] [P1:libos] error: asan: use-after-poison (unallocated SGX memory?) while trying to store 4 bytes at 0x84c0990
[2025-05-07T09:06:11.907Z] [2.257] [P1:libos] error: asan: the bad address is 0x84c0990 (0 from beginning of access)
[2025-05-07T09:06:11.907Z] [2.257] [P1:libos] error: asan: location: release_clear_child_tid at libos_futex.c, libsysdb.so+0x49ab05 (addr = 0xeb25b05)
[2025-05-07T09:06:11.907Z] [2.257] [P1:libos] error: asan: (for a full traceback, use GDB with a breakpoint at "libos_abort")
[2025-05-07T09:06:11.907Z] [2.257] [P1:libos] error: asan:
[2025-05-07T09:06:11.907Z] [2.257] [P1:libos] error: asan: shadow bytes around the bad address:
[2025-05-07T09:06:11.907Z] [2.257] [P1:libos] error: asan:   0x180010980f0: f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7
[2025-05-07T09:06:11.907Z] [2.257] [P1:libos] error: asan:   0x18001098100: f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7
[2025-05-07T09:06:11.907Z] [2.257] [P1:libos] error: asan:   0x18001098110: f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7
[2025-05-07T09:06:11.907Z] [2.257] [P1:libos] error: asan:   0x18001098120: f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7
[2025-05-07T09:06:11.907Z] [2.257] [P1:libos] error: asan: =>0x18001098130: f7 f7[f7]f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7
...

Root cause

  1. thread_exit() enqueues a cleanup_thread task on the async‑worker queue; that task writes *clear_child_tid.
  2. In the libos_syscall_execve() path, the VMA of the thread’s TCB is freed before the async worker gets to run: https://github.com/gramineproject/gramine/blob/ff71d7afea730dffd56a97af39bb6a73ee6c7662/libos/src/sys/libos_futex.c#L985
  3. When the worker eventually stores 0 to *clear_child_tid, it writes to memory that has already been freed.

Steps to reproduce

  1. Build Gramine with SGX, ASAN, and UBSAN.
  2. Run the exec_same test with arguments [arg_#1...arg_#49].

Expected results

The async‑worker thread should zero each exiting thread’s *clear_child_tid before that thread’s VMA is freed.
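
The expected ordering can be illustrated outside Gramine with a small pthread sketch. This is not Gramine code: `struct cleanup_task`, `cleanup_worker()`, `wait_for_cleanup()`, and `run_ordered_teardown()` are hypothetical names, and a heap allocation stands in for the VMA that holds `*clear_child_tid`. It only shows one possible way to make the teardown side wait for the deferred store before releasing the memory.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a deferred cleanup task (not Gramine's structure). */
struct cleanup_task {
    int* clear_child_tid;    /* address the worker must zero */
    pthread_mutex_t lock;
    pthread_cond_t done_cv;
    bool done;               /* set once the store has happened */
};

/* Worker side: perform the store, then announce completion. */
static void* cleanup_worker(void* arg) {
    struct cleanup_task* task = arg;
    assert(task->clear_child_tid != NULL);
    *task->clear_child_tid = 0;

    pthread_mutex_lock(&task->lock);
    task->done = true;
    pthread_cond_signal(&task->done_cv);
    pthread_mutex_unlock(&task->lock);
    return NULL;
}

/* Teardown side: block until the worker has finished the store. */
static void wait_for_cleanup(struct cleanup_task* task) {
    pthread_mutex_lock(&task->lock);
    while (!task->done)
        pthread_cond_wait(&task->done_cv, &task->lock);
    pthread_mutex_unlock(&task->lock);
}

/* Returns the value seen in the TID word after cleanup, or -1 on setup failure. */
static int run_ordered_teardown(void) {
    int* tid_word = malloc(sizeof(*tid_word)); /* stands in for the TCB's VMA */
    if (!tid_word)
        return -1;
    *tid_word = 1234;

    struct cleanup_task task = { .clear_child_tid = tid_word, .done = false };
    pthread_mutex_init(&task.lock, NULL);
    pthread_cond_init(&task.done_cv, NULL);

    pthread_t worker;
    if (pthread_create(&worker, NULL, cleanup_worker, &task) != 0) {
        pthread_cond_destroy(&task.done_cv);
        pthread_mutex_destroy(&task.lock);
        free(tid_word);
        return -1;
    }

    wait_for_cleanup(&task);     /* the step that is missing in the racy ordering */
    int observed = *tid_word;
    free(tid_word);              /* safe: the worker no longer touches this memory */

    pthread_join(worker, NULL);
    pthread_cond_destroy(&task.done_cv);
    pthread_mutex_destroy(&task.lock);
    return observed;
}
```

Dropping the `wait_for_cleanup()` call reproduces the shape of the bug: `free()` could then run before the worker's store, which is the kind of access ASan reports above. Whether a blocking wait is the right fix inside Gramine's execve path is a separate design question that this sketch does not answer.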

Actual results

libos_syscall_execve() frees the thread’s VMA first, and the async worker attempts to write to *clear_child_tid afterwards, resulting in a use‑after‑poison.

Gramine commit hash

f0f71bef451fb839543f33bc388ce20ad9bb50eb / ff71d7afea730dffd56a97af39bb6a73ee6c7662

forkthus, Jul 30 '25 19:07