Segfault in tb_target_set_jmp_target_arm
In a very long emulation, I hit a segfault (NULL-pointer dereference) in tb_target_set_jmp_target in qemu/tcg/aarch64/tcg-target.inc.c.
In qemu/accel/tcg/cpu-exec.c, tb_set_jmp_target calls tb_target_set_jmp_target with tc_ptr = 0, and therefore tc_ptr + offset = 0.
tb_target_set_jmp_target then calls atomic_set((uint64_t *)jmp_addr, pair) with jmp_addr = tc_ptr + offset = 0, which segfaults.
I patched it (a dirty fix) by adding the following guard in tb_set_jmp_target: if (TCG_TARGET_HAS_direct_jump && tb->tc.ptr) { […] }
I do not know what this patch may break, but it resolves the crash. I am on commit 6ae0c97.
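The guard above can be illustrated with a minimal, self-contained sketch. The struct layout, function name, and return convention below are hypothetical stand-ins for unicorn's real types, modeling only the fields relevant to the crash:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the real TranslationBlock: only the
 * tc.ptr field involved in the crash is modeled here. */
struct tc { void *ptr; };
struct tb { struct tc tc; };

/* Sketch of the guarded patch: returns 0 if the jump slot was
 * written, -1 if the TB has no generated code (tc.ptr == NULL)
 * and the write was skipped instead of dereferencing NULL. */
static int set_jmp_target_guarded(struct tb *tb, size_t offset,
                                  uint64_t pair)
{
    if (tb == NULL || tb->tc.ptr == NULL) {
        return -1;  /* would otherwise write through a NULL pointer */
    }
    uint64_t *jmp_addr = (uint64_t *)((char *)tb->tc.ptr + offset);
    *jmp_addr = pair;  /* stands in for atomic_set() */
    return 0;
}
```

The point of the sketch is only the ordering: the tc.ptr check must happen before the address arithmetic, because tc_ptr + offset with tc_ptr == 0 already yields the bad address.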
Feel free to ask me anything.
Thanks for the report and investigation. Your version seems pretty old; could you try the current dev branch?
I have also encountered the same problem, and this issue has been very helpful to me.
Unfortunately, I cannot try a newer unicorn version, sorry. I am using unicorn as part of unicornafl++, and it uses the version I pointed out.
unicornafl++
What is that? Are you mentioning unicornafl?
Yes, the one that is part of afl++, that you may find here: https://github.com/AFLplusplus/unicornafl
Okay, unicornafl will bump to 2.1.2 once I have fixed this.
That would be great! Thanks.
Tagging https://github.com/unicorn-engine/unicorn/issues/1923 for visibility. My temporary workaround is to destroy and reconstruct the emulator once every few million executions.
I assume https://github.com/unicorn-engine/unicorn/commit/2c688bae734b1e63b2e2fd5d715c1070af722651 should solve this. Could you give it a try, @futhewo?
Hi,
I experienced the same segfault on an aarch64 target. After investigation, the null tc.ptr comes from the last_tb in tb_find.
Analysis
The TCG highwater has been lowered.
void tcg_prologue_init(TCGContext *s)
{
...
s->code_gen_highwater = (char *)s->code_gen_buffer + (total_size - TCG_HIGHWATER) - s->uc->qemu_real_host_page_size;
}
Thus tcg_gen_code, called from tb_find, fails in tb_alloc because of the lower highwater. However, last_tb is still set within tb_find.
static inline TranslationBlock *tb_find(CPUState *cpu,
TranslationBlock *last_tb,
int tb_exit, uint32_t cf_mask)
{
...
tb = tb_gen_code(cpu, pc, cs_base, flags, cf_mask);
}
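The failure mode can be sketched with a toy model of the code buffer. The struct and function names below are simplifications invented for illustration, not unicorn's real layout; they only capture the rule that allocation fails once the write pointer would cross the (lowered) highwater mark:

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of the TCG code buffer: field names mirror TCGContext,
 * but the layout is a deliberate simplification. */
struct toy_tcg_ctx {
    char *code_gen_ptr;
    char *code_gen_highwater;
};

/* Allocation fails (returns NULL) once the bump pointer would cross
 * the highwater mark; a lowered highwater makes this happen sooner,
 * which is what triggers the retry/flush path in tcg_tb_alloc. */
static void *toy_tb_alloc(struct toy_tcg_ctx *s, size_t size)
{
    if (s->code_gen_ptr + size > s->code_gen_highwater) {
        return NULL;
    }
    void *tb = s->code_gen_ptr;
    s->code_gen_ptr += size;
    return tb;
}
```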
Back from tb_gen_code in tb_find, the call to tb_add_jump/tb_set_jmp_target triggers the fault:
static inline TranslationBlock *tb_find(CPUState *cpu,
TranslationBlock *last_tb,
int tb_exit, uint32_t cf_mask)
{
...
    if (last_tb) {
        tb_add_jump(last_tb, tb_exit, tb);
        tb_set_jmp_target(tb, n, (uintptr_t)tb_next->tc.ptr);  /* <-- SEGV */
    }
}
My baseline is d568885d64c89db5b9a722f0c1bef05aa92f84ca.
I'm using Sai Cao's patch:
git show d01904d3f33bd58c54271a7e28c402dbaddda847
commit d01904d3f33bd58c54271a7e28c402dbaddda847
Author: Sai Cao <[email protected]>
Date: Thu Dec 19 11:29:20 2024 +0800
Fix bug that tcg_region_alloc tbs were not flush
this cause uc_ctl_remove_cache remove a invalid tbs.
diff --git a/qemu/tcg/tcg.c b/qemu/tcg/tcg.c
index dcacec7c..2b643b49 100644
--- a/qemu/tcg/tcg.c
+++ b/qemu/tcg/tcg.c
@@ -808,6 +808,7 @@ TranslationBlock *tcg_tb_alloc(TCGContext *s)
if (tcg_region_alloc(s)) {
return NULL;
}
+ tb_flush(s->uc->cpu);
goto retry;
}
s->code_gen_ptr = next;
The tb_flush done during the failed allocation (caused by the lower
highwater) is responsible for invalidating the last_tb in tb_find
without letting it know.
Candidate fix
We could simply check last_tb->tc.ptr before calling
tb_set_jmp_target, but it is safer to detect a tb_flush earlier and
reset last_tb, such as:
diff --git a/qemu/accel/tcg/cpu-exec.c b/qemu/accel/tcg/cpu-exec.c
index fc47eecb..bf9a8c3b 100644
--- a/qemu/accel/tcg/cpu-exec.c
+++ b/qemu/accel/tcg/cpu-exec.c
@@ -259,11 +259,20 @@ static inline TranslationBlock *tb_find(CPUState *cpu,
tb = tb_lookup__cpu_state(cpu, &pc, &cs_base, &flags, cf_mask);
if (tb == NULL||tb->cflags&CF_NOCACHE) {
mmap_lock();
+
+ unsigned tb_flush_count = uc->tcg_ctx->tb_ctx.tb_flush_count;
+
tb = tb_gen_code(cpu, pc, cs_base, flags, cf_mask);
mmap_unlock();
/* We add the TB in the virtual pc hash table for the fast lookup */
cpu->tb_jmp_cache[tb_jmp_cache_hash_func(cpu->uc, pc)] = tb;
+ if (uc->tcg_ctx->tb_ctx.tb_flush_count != tb_flush_count) {
+ uc->last_tb = last_tb = NULL;
+ }
+
if (uc->last_tb) {
UC_TB_COPY(&cur_tb, tb);
UC_TB_COPY(&prev_tb, uc->last_tb);
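The idea behind the diff above (remember the flush count before a call that may flush, and discard the cached last_tb if the count moved) can be modeled in a self-contained way. All names below are toy stand-ins invented for illustration, not unicorn's real API:

```c
#include <assert.h>
#include <stddef.h>

/* Toy context holding only the flush generation counter. */
struct toy_ctx { unsigned tb_flush_count; };

/* Stands in for tb_flush(): bumps the generation counter. */
static void maybe_flush(struct toy_ctx *ctx, int flush)
{
    if (flush) {
        ctx->tb_flush_count++;
    }
}

/* Stands in for the guarded tb_find path: snapshot the counter,
 * run a call that may flush (tb_gen_code in the real code), and
 * drop the cached last_tb if a flush happened in between. */
static void *find_with_guard(struct toy_ctx *ctx, void *last_tb, int flush)
{
    unsigned before = ctx->tb_flush_count;
    maybe_flush(ctx, flush);
    if (ctx->tb_flush_count != before) {
        last_tb = NULL;  /* stale pointer: never chain to it */
    }
    return last_tb;
}
```

The design choice here is that the guard does not inspect last_tb itself; it only compares generation counters, so it catches any flush regardless of where inside the guarded call it happened.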
Could you try commit 2128e01efc81404fecfcdfe1c7bb282ebd3e87d3? It includes a relevant fix that ensures tb_flush is done correctly.
Also, I would appreciate it if you could provide a reproduction, even a complex rather than minimal one. @samuel-beckett