regression: missing UC_HOOK_MEM

Open boborjan2 opened this issue 11 months ago • 20 comments

It seems that after 4f417c3f11fb8d6e604f63a58f5a060b5e78405a (with 6974b535884ab09369101dd451316479679f6012 also applied), in rare cases some UC_HOOK_MEM callbacks go missing. The accesses are neither repetitions nor unaligned. Unfortunately I do not have a simple reproduction scenario. The target is x86.

boborjan2 avatar Apr 02 '25 12:04 boborjan2

Weird, how can I help to reproduce?

wtdcode avatar Apr 02 '25 12:04 wtdcode

I just don't know yet; I am trying to narrow it down.

boborjan2 avatar Apr 02 '25 12:04 boborjan2

If possible, do as follows:

Build a debug build with -DUNICORN_LOGGING=on -DCMAKE_BUILD_TYPE=Debug, and run your programs with the UNICORN_LOG_LEVEL=7 environment variable set. Note that this prints out every translation buffer and can be extremely verbose; redirecting the output to a file is highly recommended.

Attach the translation buffers, register contexts, and the original instructions around the addresses where hooks are missing. I will try to see which instructions cause this.
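The steps above, as a shell sketch (assuming a unicorn source checkout and a hypothetical my_program binary linked against it):

```shell
# Configure and build a debug build with translation-buffer logging
cmake -B build -DUNICORN_LOGGING=on -DCMAKE_BUILD_TYPE=Debug
cmake --build build

# Level 7 is extremely verbose, so capture everything in a file
UNICORN_LOG_LEVEL=7 ./my_program > unicorn-trace.log 2>&1
```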

wtdcode avatar Apr 02 '25 12:04 wtdcode

Ok, thanks, I will make a trace. What I know so far: before the regression we get ~1700 mem-read hook callbacks; after it, only 246 (the first 246). It looks like the mem hook is completely shut off after a certain point. We no longer receive hooks for the same IPs or target addresses where we previously had hits. The execution flow, however, is the same: we reach the exit point properly, and the instruction count is exactly the same in both cases.

boborjan2 avatar Apr 03 '25 10:04 boborjan2

I just noticed a better way to debug this.

In store_helper and load_helper, if possible, could you add fprintf(stderr, "%x %x %x\n", addr, val, uc->size_recur_mem)? I think size_recur_mem is the culprit.

wtdcode avatar Apr 03 '25 10:04 wtdcode

loads.zip

I made a run both before and after the patch. Note: the after-patch run is actually the 2.1.3 release.

boborjan2 avatar Apr 03 '25 15:04 boborjan2

Here is what happens:

  1. store_helper() gets called with an unaligned address (size 4) that gets serviced by a recursive call to handle the 4 bytes separately. size_recur_mem is set to 4.
  2. the recursively called store_helper() handles the 1-byte write using notdirty_write()
  3. tb_invalidate_phys_page_fast() is called, then tb_invalidate_phys_page_range__locked()
  4. at the end of tb_invalidate_phys_page_range__locked(), since current_tb_modified is set, cpu_loop_exit_noexc() gets called, which exits via siglongjmp().

As size_recur_mem is left at 4, no UC_HOOK_MEM callbacks are delivered for the rest of the session.

  1. A possible solution would be to add size_recur_mem as a parameter to store_helper() and keep the recursion state fully on the stack.

  2. Is it OK that only a halfword (or 1 byte) is written for a 4-byte store? It seems that something is off. From the logs it is obvious that the store is never finished; execution continues at the next op.

boborjan2 avatar Apr 08 '25 09:04 boborjan2

Hi, thanks for this good explanation. Can you try the following patch:

diff --git a/qemu/accel/tcg/cpu-exec.c b/qemu/accel/tcg/cpu-exec.c
index 09148f1d..06e0e1fa 100644
--- a/qemu/accel/tcg/cpu-exec.c
+++ b/qemu/accel/tcg/cpu-exec.c
@@ -610,6 +610,7 @@ int cpu_exec(struct uc_struct *uc, CPUState *cpu)
             if (unlikely(cpu->exit_request)) {
                 continue;
             }
+            uc->size_recur_mem = 0;
             cpu_loop_exec_tb(cpu, tb, &last_tb, &tb_exit);
             /* Try to align the host and virtual clocks
                if the guest is in advance */
diff --git a/qemu/softmmu/cpus.c b/qemu/softmmu/cpus.c
index afb8bbed..22511ac7 100644
--- a/qemu/softmmu/cpus.c
+++ b/qemu/softmmu/cpus.c
@@ -93,7 +93,6 @@ static int tcg_cpu_exec(struct uc_struct *uc)
         //                  (cpu->singlestep_enabled & SSTEP_NOTIMER) == 0);
         if (cpu_can_run(cpu)) {
             uc->quit_request = false;
-            uc->size_recur_mem = 0;
             r = cpu_exec(uc, cpu);
 
             // quit current TB but continue emulating?

PhilippTakacs avatar Apr 08 '25 09:04 PhilippTakacs

Hi, thanks. I confirm it does the trick.

I gather it's an exception, so it's fine that the store is not done(?). Btw, how does this align with real-CPU behavior: does a word-sized write end up as a half-word write if there is an exception in the middle?

boborjan2 avatar Apr 08 '25 09:04 boborjan2

Another: shouldn't uc->size_recur_mem be reset in the positive branch of sigsetjmp()? IMO it should be reset whenever we exit the recursion.

boborjan2 avatar Apr 08 '25 09:04 boborjan2

Now that you know how this happens, can you write a minimal reproducer?

Another: shouldn't uc->size_recur_mem be reset in the positive branch of sigsetjmp()? IMO it should be reset whenever we exit the recursion.

I have put it just before executing a translation block; this way it is always set to 0 whenever a jump into a TB happens. IMHO this is the best way to prevent bugs like this. I need to read the code a bit more to understand the possible side effects and what might go wrong with this.

Resetting size_recur_mem only in the positive branch of sigsetjmp() might miss some cases (e.g. when break_translation_loop() is called by a read of the IP register from a mem hook).

PhilippTakacs avatar Apr 08 '25 10:04 PhilippTakacs

A PR is also highly welcome! And thanks @PhilippTakacs for the quick patch.


wtdcode avatar Apr 08 '25 10:04 wtdcode

My concern is: will execution always reach the reset of size_recur_mem in the while loop in case we exit via siglongjmp()?

I'll try to come up with a minimal test case.

boborjan2 avatar Apr 08 '25 10:04 boborjan2

My concern is: will execution always reach the reset of size_recur_mem in the while loop in case we exit via siglongjmp()?

This is exactly why I don't put it on the exit condition. I want to reset it before starting the emulation, independent of why the loop stopped or whether it stopped at all, so it is safe to assume that it is reset on emulation start. With my last fix I hadn't put this deep enough, so now I want to search for other exit conditions and check whether they might skip this reset.

PhilippTakacs avatar Apr 08 '25 10:04 PhilippTakacs

2. Is it OK that only a halfword (or 1 byte) is written for a 4-byte store? It seems that something is off. From the logs it is obvious that the store is never finished; execution continues at the next op.

I made a test on Windows using VirtualProtect to make a page read-only. An unaligned 32-bit access at the end of the r/w area (i.e. last writable page + 4094) causes an access exception, and I confirm the value is not partially written.

This means that the loop handling unaligned writes in store_helper() breaks compatibility on x86 (and perhaps on other architectures as well).

boborjan2 avatar Apr 10 '25 10:04 boborjan2

I think it is supposed to be atomic? Any context I missed here?

wtdcode avatar Apr 10 '25 10:04 wtdcode

Yes, on a real CPU the store is atomic: if it is interrupted by an exception, the store is not performed at all, even if it is unaligned and requires multiple cycles. In the current implementation, however, the for loop in store_helper breaks this by performing the word-sized store as individual byte-sized stores, letting the first few succeed.

boborjan2 avatar Apr 10 '25 10:04 boborjan2

Yes, on a real CPU the store is atomic: if it is interrupted by an exception, the store is not performed at all, even if it is unaligned and requires multiple cycles. In the current implementation, however, the for loop in store_helper breaks this by performing the word-sized store as individual byte-sized stores, letting the first few succeed.

I still find it hard to follow without full context. I would greatly appreciate a reproduction; otherwise we don't have enough confidence for a blind fix.

wtdcode avatar Apr 10 '25 12:04 wtdcode

#include <stdio.h>
#include <stdint.h>
#include <unicorn/unicorn.h>

/*
    mov eax,0xffe
    mov DWORD PTR [eax],0xdeadbeef
*/
const uint8_t unaligned_store[] = {
    0xB8, 0xFE, 0x0F, 0x00, 0x00, 0xC7, 0x00, 0xEF, 0xBE, 0xAD, 0xDE
};

static void test_i386(const uint8_t *code, unsigned code_size)
{
    uc_engine *uc;
    uc_err err;

    // Initialize emulator in X86-32bit mode
    err = uc_open(UC_ARCH_X86, UC_MODE_32, &uc);
    if (err) {
        printf("Failed on uc_open() with error returned: %u\n", err);
        return;
    }

    uc_mem_map(uc, 0x0000, 4096, UC_PROT_ALL);
    uc_mem_map(uc, 0x1000, 4096, UC_PROT_READ);

    // write machine code to be emulated to memory
    if (uc_mem_write(uc, 0, code, code_size)) {
        printf("Failed to write emulation code to memory, quit!\n");
        return;
    }

    err = uc_emu_start(uc, 0, 0 + code_size - 1, 0, 2);
    if (err) {
        printf("Failed on uc_emu_start() with error returned %u: %s\n", err,
               uc_strerror(err));
    }

    uint32_t stored_value;
    uc_mem_read(uc, 0xffe, &stored_value, 4);

    printf("\n>>> Stored value: %08x.\n", stored_value);

    uc_close(uc);
}

int main(int argc, char **argv, char **envp)
{
    test_i386(unaligned_store, sizeof(unaligned_store));

    return 0;
}

This writes 0xbeef to address 0xffe. Native Windows code does not do that. This code also leaves size_recur_mem at 4, but it cannot be printed out, as it lives in a private struct.

boborjan2 avatar Apr 10 '25 13:04 boborjan2

Please consider the patch below. To ensure the access is atomic, the recursive calls in the unaligned case are dry-run first (without performing the actual store). A flag argument is introduced that also handles size_recur_mem cleanly on the stack. A similar solution could be made for load_helper as well, if this is accepted.

diff --git a/qemu/accel/tcg/cputlb.c b/qemu/accel/tcg/cputlb.c
index 16ea80a0..9d69a327 100644
--- a/qemu/accel/tcg/cputlb.c
+++ b/qemu/accel/tcg/cputlb.c
@@ -2105,9 +2105,13 @@ store_memop(void *haddr, uint64_t val, MemOp op)
     }
 }

+enum {
+    HLP_FLG_RECURSIVE_RUN = (1u << 0),
+    HLP_FLG_DRY_RUN = (1u << 1),
+};
 static inline void
-store_helper(CPUArchState *env, target_ulong addr, uint64_t val,
-             TCGMemOpIdx oi, uintptr_t retaddr, MemOp op)
+store_helper_flg(CPUArchState *env, target_ulong addr, uint64_t val,
+             TCGMemOpIdx oi, uintptr_t retaddr, MemOp op, unsigned flags)
 {
     struct uc_struct *uc = env->uc;
     HOOK_FOREACH_VAR_DECLARE;
@@ -2147,7 +2151,7 @@ store_helper(CPUArchState *env, target_ulong addr, uint64_t val,
     paddr = entry->paddr | (addr & ~TARGET_PAGE_MASK);
     mr = uc->memory_mapping(uc, paddr);

-    if (!uc->size_recur_mem) { // disabling write callback if in recursive call
+    if (!(flags & HLP_FLG_RECURSIVE_RUN)) { // disabling write callback if in recursive call
         // Unicorn: callback on memory write
         HOOK_FOREACH(uc, hook, UC_HOOK_MEM_WRITE) {
             if (hook->to_delete)
@@ -2321,6 +2325,10 @@ store_helper(CPUArchState *env, target_ulong addr, uint64_t val,
             notdirty_write(env_cpu(env), addr, size, iotlbentry, retaddr, entry);
         }

+        if(flags & HLP_FLG_DRY_RUN) {
+            return;
+        }
+
         haddr = (void *)((uintptr_t)addr + entry->addend);

         /*
@@ -2345,7 +2353,7 @@ store_helper(CPUArchState *env, target_ulong addr, uint64_t val,
         CPUTLBEntry *entry2;
         target_ulong page2, tlb_addr2;
         size_t size2;
-        int old_size;
+        //int old_size;

     do_unaligned_access:
         /*
@@ -2388,8 +2396,8 @@ store_helper(CPUArchState *env, target_ulong addr, uint64_t val,
          * This loop must go in the forward direction to avoid issues
          * with self-modifying code in Windows 64-bit.
          */
-        old_size = uc->size_recur_mem;
-        uc->size_recur_mem = size;
+
+        /* ensure access is atomic, first dry run  */
         for (i = 0; i < size; ++i) {
             uint8_t val8;
             if (memop_big_endian(op)) {
@@ -2399,16 +2407,41 @@ store_helper(CPUArchState *env, target_ulong addr, uint64_t val,
                 /* Little-endian extract.  */
                 val8 = val >> (i * 8);
             }
-            helper_ret_stb_mmu(env, addr + i, val8, oi, retaddr);
+            store_helper_flg(env, addr + i, val8, oi, retaddr, MO_UB, HLP_FLG_RECURSIVE_RUN|HLP_FLG_DRY_RUN);
+            /* if access is blocked we either exit here or via siglongjmp */
+            if(uc->cpu->exit_request) {
+                return;
+            }
+        }
+        /* dry run is successful, do the real store */
+        for (i = 0; i < size; ++i) {
+            uint8_t val8;
+            if (memop_big_endian(op)) {
+                /* Big-endian extract.  */
+                val8 = val >> (((size - 1) * 8) - (i * 8));
+            } else {
+                /* Little-endian extract.  */
+                val8 = val >> (i * 8);
+            }
+            store_helper_flg(env, addr + i, val8, oi, retaddr, MO_UB, HLP_FLG_RECURSIVE_RUN);
         }
-        uc->size_recur_mem = old_size;
         return;
     }

+    if(flags & HLP_FLG_DRY_RUN) {
+        return;
+    }
     haddr = (void *)((uintptr_t)addr + entry->addend);
     store_memop(haddr, val, op);
 }

+static inline void
+store_helper(CPUArchState *env, target_ulong addr, uint64_t val,
+             TCGMemOpIdx oi, uintptr_t retaddr, MemOp op)
+{
+    store_helper_flg(env, addr, val, oi, retaddr, op, 0);
+}
+
 void helper_ret_stb_mmu(CPUArchState *env, target_ulong addr, uint8_t val,
                         TCGMemOpIdx oi, uintptr_t retaddr)
 {

boborjan2 avatar May 06 '25 08:05 boborjan2