unicorn Execution depends on whether UC_HOOK_MEM_READ/

Using the dev branch head, I am facing an issue where I see that an execution flow that has been recorded in the past differs from it's recorded value. This binary snipplet includes self modifying code. I can get the old behavior only if I register an empty UC_HOOK_MEM_READ or WRITE hook (parameters do not matter, they can be zero). I checked the code and it seems branches regarding the existance of these hooks have been added in https://github.com/unicorn-engine/unicorn/commit/8816883bb326fd2839cf5c9c43590f4d265afcb6 and https://github.com/unicorn-engine/unicorn/commit/6cc7e1d431078c1c4b9b6ff6157e2a91c0dd5477. And all these seem to be in connection with https://github.com/unicorn-engine/unicorn/commit/d01035767e47f69b4e545d1843dbaf08e6a74752. (I made a check: If I remove the appropriate calls to tlb_set_dirty(), it works ok without hooks.) I do not really understand the connection between tlb_set_dirty() and read/write hooks. Unfortunately I do not have an example that easily fits here.

Oct 22 '24 17:10 boborjan2

I assume this is #2031 , could you have a check?

Oct 23 '24 01:10 wtdcode

Unfortunately no. In my case, #2031 fails (differs) even with mem hooks registered. Also, it min 2 times slower. I gonna try to figure out when (i.e which commit) it went wrong. It did work in september.

Oct 24 '24 13:10 boborjan2

It did work on 379791ad566be6cec7a0ebcbfd2098267a4cc471 (end of day 4th Sept). With and without mem hooks. However if I cherry pick d01035767e47f69b4e545d1843dbaf08e6a74752 on top, I get the same behavior where execution depends on mem hooks being registered. Sadly, this very commit is the one responsible for the speed improvement as well.

Oct 24 '24 13:10 boborjan2

Can you provide a sample?

Oct 24 '24 16:10 PhilippTakacs

It is not trivial to extract a shellcode size sample. What I see with #2031 is that the phenomenon is consistent with a correct execution flow with missing write mem hook. (i.e it is different from what I saw without #2031: there I experienced a change in execution flow when I enabled an empty read or write hook. Here the execution flow is correct with or without hook registered but when it is registered, some callbacks are missing) This is my current understanding, I keep looking.

Oct 24 '24 19:10 boborjan2

Can you check what happend when you apply following patch:

diff --git a/qemu/accel/tcg/cputlb.c b/qemu/accel/tcg/cputlb.c
index 211d0ec1..3de68e67 100644
--- a/qemu/accel/tcg/cputlb.c
+++ b/qemu/accel/tcg/cputlb.c
@@ -1188,23 +1188,23 @@ tb_page_addr_t get_page_addr_code(CPUArchState *env, target_ulong addr)
 
 static bool uc_mem_hook_installed(struct uc_struct *uc, target_ulong paddr)
 {
-    if (HOOK_EXISTS_BOUNDED(uc, UC_HOOK_MEM_FETCH_UNMAPPED, paddr))
+    if (HOOK_EXISTS(uc, UC_HOOK_MEM_FETCH_UNMAPPED))
         return true;
-    if (HOOK_EXISTS_BOUNDED(uc, UC_HOOK_MEM_READ_UNMAPPED, paddr))
+    if (HOOK_EXISTS(uc, UC_HOOK_MEM_READ_UNMAPPED))
         return true;
-    if (HOOK_EXISTS_BOUNDED(uc, UC_HOOK_MEM_READ, paddr))
+    if (HOOK_EXISTS(uc, UC_HOOK_MEM_READ))
         return true;
-    if (HOOK_EXISTS_BOUNDED(uc, UC_HOOK_MEM_READ_PROT, paddr))
+    if (HOOK_EXISTS(uc, UC_HOOK_MEM_READ_PROT))
         return true;
-    if (HOOK_EXISTS_BOUNDED(uc, UC_HOOK_MEM_FETCH_PROT, paddr))
+    if (HOOK_EXISTS(uc, UC_HOOK_MEM_FETCH_PROT))
         return true;
-    if (HOOK_EXISTS_BOUNDED(uc, UC_HOOK_MEM_READ_AFTER, paddr))
+    if (HOOK_EXISTS(uc, UC_HOOK_MEM_READ_AFTER))
         return true;
-    if (HOOK_EXISTS_BOUNDED(uc, UC_HOOK_MEM_WRITE, paddr))
+    if (HOOK_EXISTS(uc, UC_HOOK_MEM_WRITE))
         return true;
-    if (HOOK_EXISTS_BOUNDED(uc, UC_HOOK_MEM_WRITE_UNMAPPED, paddr))
+    if (HOOK_EXISTS(uc, UC_HOOK_MEM_WRITE_UNMAPPED))
         return true;
-    if (HOOK_EXISTS_BOUNDED(uc, UC_HOOK_MEM_WRITE_PROT, paddr))
+    if (HOOK_EXISTS(uc, UC_HOOK_MEM_WRITE_PROT))
         return true;
     return false;
 }

Also, it min 2 times slower

There is not much I can do about this. The problem is by using tlb_set_dirty most of the unicorn code will be skipped for mem access. This allows faster execution but prevents self modifying code and memory hooks.

Oct 25 '24 08:10 PhilippTakacs

What happens to self modifying code? Does the optimization break it?

From: PhilippTakacs @.> Date: Friday, 25 October 2024 at 4:39 PM To: unicorn-engine/unicorn @.> Cc: lazymio @.>, Comment @.> Subject: Re: [unicorn-engine/unicorn] Execution depends on whether UC_HOOK_MEM_READ/_WRITE is registered (Issue #2041)

Can you check what happend when you apply following patch:

diff --git a/qemu/accel/tcg/cputlb.c b/qemu/accel/tcg/cputlb.c

index 211d0ec1..3de68e67 100644

--- a/qemu/accel/tcg/cputlb.c

+++ b/qemu/accel/tcg/cputlb.c

@@ -1188,23 +1188,23 @@ tb_page_addr_t get_page_addr_code(CPUArchState *env, target_ulong addr)

static bool uc_mem_hook_installed(struct uc_struct *uc, target_ulong paddr)

{

if (HOOK_EXISTS_BOUNDED(uc, UC_HOOK_MEM_FETCH_UNMAPPED, paddr))

if (HOOK_EXISTS(uc, UC_HOOK_MEM_FETCH_UNMAPPED))
```
return true;
```

if (HOOK_EXISTS_BOUNDED(uc, UC_HOOK_MEM_READ_UNMAPPED, paddr))

if (HOOK_EXISTS(uc, UC_HOOK_MEM_READ_UNMAPPED))
```
return true;
```

if (HOOK_EXISTS_BOUNDED(uc, UC_HOOK_MEM_READ, paddr))

if (HOOK_EXISTS(uc, UC_HOOK_MEM_READ))
```
return true;
```

if (HOOK_EXISTS_BOUNDED(uc, UC_HOOK_MEM_READ_PROT, paddr))

if (HOOK_EXISTS(uc, UC_HOOK_MEM_READ_PROT))
```
return true;
```

if (HOOK_EXISTS_BOUNDED(uc, UC_HOOK_MEM_FETCH_PROT, paddr))

if (HOOK_EXISTS(uc, UC_HOOK_MEM_FETCH_PROT))
```
return true;
```

if (HOOK_EXISTS_BOUNDED(uc, UC_HOOK_MEM_READ_AFTER, paddr))

if (HOOK_EXISTS(uc, UC_HOOK_MEM_READ_AFTER))
```
return true;
```

if (HOOK_EXISTS_BOUNDED(uc, UC_HOOK_MEM_WRITE, paddr))

if (HOOK_EXISTS(uc, UC_HOOK_MEM_WRITE))
```
return true;
```

if (HOOK_EXISTS_BOUNDED(uc, UC_HOOK_MEM_WRITE_UNMAPPED, paddr))

if (HOOK_EXISTS(uc, UC_HOOK_MEM_WRITE_UNMAPPED))
```
return true;
```

if (HOOK_EXISTS_BOUNDED(uc, UC_HOOK_MEM_WRITE_PROT, paddr))

if (HOOK_EXISTS(uc, UC_HOOK_MEM_WRITE_PROT))
```
return true;
```
return false;

}

Also, it min 2 times slower

There is not much I can do about this. The problem is by using tlb_set_dirty most of the unicorn code will be skipped for mem access. This allows faster execution but prevents self modifying code and memory hooks.

— Reply to this email directly, view it on GitHubhttps://github.com/unicorn-engine/unicorn/issues/2041#issuecomment-2437221585, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AHJULO26TSEQSI4AD2MKGODZ5H7UJAVCNFSM6AAAAABQM7NDKSVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDIMZXGIZDCNJYGU. You are receiving this because you commented.Message ID: @.***>

Oct 25 '24 08:10 wtdcode

What happens to self modifying code? Does the optimization break it?

As far as I see yes. I assume the problem is when one tb writes two times itself for the first one the tb-cache is cleared and not dirty is set, but for the second write the fast path is taken so no rebuild of the tb. I'm currently still at debugging this to understand what exactly happen and to enable the optimization whenever possible.

Oct 25 '24 08:10 PhilippTakacs

Without #2031: in d01035767e47f69b4e545d1843dbaf08e6a74752, it is indeed tlb_set_dirty() that causes the bad behavior in notdirty_write(). Previously it had been never called (cpu_physical_memory_is_clean() always evaluated to true), after the patch it is always called. After the patch, the code in i386/tcg-target.inc, in tcg_out_tlb_load(), makes the difference:

    // Unicorn: fast path if hookmem is not enable
    if (!HOOK_EXISTS(s->uc, UC_HOOK_MEM_READ) && !HOOK_EXISTS(s->uc, UC_HOOK_MEM_WRITE))
        tcg_out_opc(s, OPC_JCC_long + JCC_JNE, 0, 0, 0);
    else
        /* slow_path, so data access will go via load_helper() */
        tcg_out_opc(s, OPC_JMP_long, 0, 0, 0);

Taking the slow path always gets rid of the dependency of mem hooks being registered.

With #2031: in my case, execution does not touch to uc_mem_hook_installed() (verified via debugger), so any patch there won't help. I am not using snaphots. Debugging shows that tlbe->addr_code never gets equal to -1, so uc_mem_hook_installed() is never called. This means that tlb_set_dirty() is never called. This is similar to the behavior before d01035767e47f69b4e545d1843dbaf08e6a74752 (no dependence on mem hooks) with the difference that some mem hooks are missed. Thanks for the quick response btw.

Oct 25 '24 09:10 boborjan2

Can you install your memhook in a version before https://github.com/unicorn-engine/unicorn/commit/d01035767e47f69b4e545d1843dbaf08e6a74752 and check if all mem hooks get called?

It confuses me, because tlb_set_dirty() is not called like before but now some mem hooks are missing.

Oct 25 '24 09:10 PhilippTakacs

It seems that the mem callbacks are missing because uc->size_recur_mem is not zero in store_helper(). The stores do happen, as i said, execution flow seems ok.

Oct 25 '24 09:10 boborjan2

Can you install your memhook in a version before d010357 and check if all mem hooks get called?

It confuses me, because tlb_set_dirty() is not called like before but now some mem hooks are missing.

before d010357 , mem hooks do happen and executing does not depend on hooks being installed. I made this check under 1) above.

so

before d01035767e47f69b4e545d1843dbaf08e6a74752: hooks do happen and are ok
after d01035767e47f69b4e545d1843dbaf08e6a74752: hooks do happen but execution depends on whether I install mem hooks
after #2031 : execution does not depend on mem hooks but if i register them, some are missing because in store_helper(), uc->size_recur_mem is not zero

Oct 25 '24 09:10 boborjan2

Can you try following patch:

diff --git a/qemu/accel/tcg/translate-all.c b/qemu/accel/tcg/translate-all.c
index 217522d0..d7608b74 100644
--- a/qemu/accel/tcg/translate-all.c
+++ b/qemu/accel/tcg/translate-all.c
@@ -1843,6 +1843,7 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
         tlb_reset_dirty_by_vaddr(cpu, pc & TARGET_PAGE_MASK,
                                 (pc & ~TARGET_PAGE_MASK) + tb->size);
     }
+    uc->size_recur_mem = 0;
 
     /*
      * No explicit memory barrier is required -- tb_link_page() makes the

Oct 25 '24 10:10 PhilippTakacs

uc->size_recur_mem is a pretty old hack when prototyping Unicorn2. I failed to understand why it was there a few months ago, too.

Oct 25 '24 10:10 wtdcode

Can you try following patch:

diff --git a/qemu/accel/tcg/translate-all.c b/qemu/accel/tcg/translate-all.c
index 217522d0..d7608b74 100644
--- a/qemu/accel/tcg/translate-all.c
+++ b/qemu/accel/tcg/translate-all.c
@@ -1843,6 +1843,7 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
         tlb_reset_dirty_by_vaddr(cpu, pc & TARGET_PAGE_MASK,
                                 (pc & ~TARGET_PAGE_MASK) + tb->size);
     }
+    uc->size_recur_mem = 0;
 
     /*
      * No explicit memory barrier is required -- tb_link_page() makes the

This makes it fine. (it is "cpu->uc->size_recur_mem = 0;" in fact, I just want to make sure we see the same sourcetree. ?)

Oct 25 '24 10:10 boborjan2

Can you try following patch:

diff --git a/qemu/accel/tcg/translate-all.c b/qemu/accel/tcg/translate-all.c
index 217522d0..d7608b74 100644
--- a/qemu/accel/tcg/translate-all.c
+++ b/qemu/accel/tcg/translate-all.c
@@ -1843,6 +1843,7 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
         tlb_reset_dirty_by_vaddr(cpu, pc & TARGET_PAGE_MASK,
                                 (pc & ~TARGET_PAGE_MASK) + tb->size);
     }
+    uc->size_recur_mem = 0;
 
     /*
      * No explicit memory barrier is required -- tb_link_page() makes the

This makes it fine. (it is "cpu->uc->size_recur_mem = 0;" in fact, I just want to make sure we see the same sourcetree. ?)

Cool, this seems the root cause. I will check if this hack is really necessary.

Btw, does this change fail any unit test?

Oct 25 '24 11:10 wtdcode

Btw, does this change fail any unit test?

Not now, but I can check if I can adjust the test of #2031 for this issue.

Oct 25 '24 12:10 PhilippTakacs

it is "cpu->uc->size_recur_mem = 0;" in fact

Yes, my bad.

Oct 25 '24 12:10 PhilippTakacs

Btw, does this change fail any unit test?

Not now, but I can check if I can adjust the test of #2031 for this issue.

Thanks. Please go ahead.

Oct 25 '24 12:10 wtdcode

So now I could reproduce the missing hooks, but I have now double hooks for each write which modifies the current tb. This is the asm:

mov QWORD PTR [rip+0x29], rax
mov WORD PTR [rip+0x0], 0x548
mov eax, DWORD PTR [rax+0x12345678]
nop
nop
nop
mov QWORD PTR [rip-0x08], rax
mov WORD PTR [rip+0x0], 0x548
mov eax, DWORD PTR [rax+0x12345678]
hlt

I have debugged this a bit and found that the cpu_loop_exec_tb() is called with the old value of rip. For example after the second mov the execution is restarted with rip as 0x1007 (code starts at 0x1000). This causes the self modifying code to be executed twice. Is this expected behavior?

Oct 28 '24 12:10 PhilippTakacs

I confirm I experience two hooks per hit too. I wonder if it's connected to d01035767e47f69b4e545d1843dbaf08e6a74752 as well.

Oct 28 '24 13:10 boborjan2

No, the double execution (with smc) also happens before d010357.

Oct 29 '24 10:10 PhilippTakacs