retrowin32 icon indicating copy to clipboard operation
retrowin32 copied to clipboard

GDB stub and async shim rework

Open encounter opened this issue 1 year ago • 11 comments

This implements a working GDB stub for x86-emu and x86-unicorn. One larger change is the new state machine in cli/main.rs. This code is now shared between x86-emu and x86-unicorn: it allows stopping, resuming and single-stepping the machine emulation, handling breakpoints and machine errors in a unified way.

Another larger change is reworking the async shim handling. There's no longer any backend-specific code in builtin.rs. (Take a look at the simplified implementation!) x86-unicorn was updated to use the x86-emu approach for future handling (using eip=MAGIC_ADDR to poll futures instead of executing CPU), though I'd like to consider ideas to unify this code between the two as well, if possible.

Other changes:

  • x86-unicorn: Unmapped first 4k of memory to catch zero page accesses
  • Removed --trace-blocks because I didn't feel like reimplementing it (I could, though, if desired)
  • Removed x86-emu snapshot feature. I broke it while refactoring things to use Pin<>, and decided to rip it out and revisit later if it's still a desired feature. One big issue is that it's incompatible with any running futures.

TODO:

  • [x] Fix x86-64 build
  • [x] Fix web build
  • [ ] Multi-threading support
  • [x] Restore --trace-points

encounter avatar Jul 31 '24 00:07 encounter

Thinking about this change, I would like to merge it in pieces to better understand the different parts. Do you mind if i pick up parts of it (e.g. "remove snapshotting") separately?

evmar avatar Aug 02 '24 15:08 evmar

Do you mind if i pick up parts of it (e.g. "remove snapshotting") separately?

Go for it! I'll rebase accordingly once I get time to work on it again.

encounter avatar Aug 02 '24 21:08 encounter

112953833a9dd8e187d6a4690ba57b6e5be09889 removes all the snapshotting, including the signal handler and web UI for it

evmar avatar Aug 04 '24 17:08 evmar

I copied your change to add Handler::Async, and one thing I notice about it is we end up with two Boxed futures per async call. The first basically holds the decoded stack args, and then the second calls the first and updates state based on its result. The previous code had only one boxed future because it bundled those two things together. I'm not sure there's a good way to resolve this.

https://github.com/evmar/retrowin32/pull/31 -- my attempt

evmar avatar Aug 04 '24 22:08 evmar

If handle_shim_call was always async, one Boxed future could be avoided. It’s only used to conditionally return a Future. Maybe it would work to make that function return Option<impl Future> instead?

edit: I realized that wouldn’t allow us to store it in a Vec, unless we converted it to a custom Future implementation to make it a concrete type.

encounter avatar Aug 05 '24 01:08 encounter

Hm yeah, I had a similar thought. Unfortunately it's put in Vec stored per-cpu, and the cpu layer doesn't know about shims, hrmm.

evmar avatar Aug 05 '24 05:08 evmar

In 5c92c7d93f1320b08405fff7a92adbd94f440b1b I was able to simplify the async shim handling. It pushes the responsibility for storing and polling the future up into the top-level event loop (cli or web, web updates in 68221580f023031128aac76a5ef8ced784374cf9). machine.call_shim returns the BoxFuture<u32> from the shim directly, instead of wrapping it in another future for updating the registers. The event loop stores this future, polls it, and calls finish_shim_call when complete, which will then run the machine-specific register updates.

x86-emu and x86-unicorn only have to return StopReason::Blocked when their EIP=MAGIC_ADDR, which tells the outer event loop to perform future polling. This simplifies their implementations quite a bit.

encounter avatar Aug 07 '24 02:08 encounter

I haven't had a chance to look at this yet, but I wanted to note my pending builtins-as-dlls work lets us eliminate some of the post-shim code: https://github.com/evmar/retrowin32/pull/30/commits/60b256e4e50404f8e7be9f5fcd09420c437cdef2

basically the new flow is that the original binary does some call [SomeFn] which shows up at

SomeFn:
   syscall  ; causes Rust impl to be invoked
   ret 12   ; the number here is stack_consumed

The only thing the shim implementation code needs to manage is taking the return value of the Rust fn and putting it into eax.

I haven't quite figured it out yet but I am pretty sure that async fns will benefit similarly.

Now that I've typed this out, maybe the eax handling means this doesn't win anything, hrm.

evmar avatar Aug 07 '24 06:08 evmar

New idea: make the futures Vec a Vec of Future<Option<u32>>, where a present value means "put this in eax when done". Coupled with my other change that removes the other code that needs to run after an async block I think that is enough?

evmar avatar Aug 07 '24 17:08 evmar

Sounds good to me. I think that's similar to the solution I cooked up in the above commit:

struct AsyncShimCall {
  shim: &'static win32::shims::Shim,
  future: BoxFuture<u32>,
}
let mut shim_calls = Vec::<AsyncShimCall>::new();

// ...

match stop_reason {
  win32::StopReason::Blocked => {
    // Poll the last future.
    let shim_call = shim_calls.last_mut().unwrap();
    match shim_call.future.as_mut().poll(&mut ctx) {
      Poll::Ready(ret) => {
        target.machine.finish_shim_call(shim_call.shim, ret);

Except now finish_shim_call is only responsible for setting eax, instead of doing the eip and stack manipulation as well.

encounter avatar Aug 07 '24 17:08 encounter

I am so so sorry this has taken me so long! I have merged pieces of it and I am working on the main debugger part, but it also managed to collide horribly with this dll change I've also been working on for a long time so it's been kind a three way train wreck between your change, my change, and also my personal life stuff.

evmar avatar Sep 12 '24 17:09 evmar