REPENTOGON icon indicating copy to clipboard operation
REPENTOGON copied to clipboard

[ZHL] Detours breaks functions that contain jumps to their beginning

Open Sylmir opened this issue 2 years ago • 13 comments

If a to-be detoured function contains jumps back to instructions whose bytes are contained within the first five bytes of the function, Detour does not fix the jumps within the rest of the function, which leads to invalid instruction exceptions.

Sylmir avatar Sep 21 '23 18:09 Sylmir

I fixed this a long time ago in our variant in FTL Hyperspace, it's roughly around here https://github.com/FTL-Hyperspace/FTL-Hyperspace/blob/d8012662b982725cbfc5d913636c1657a57eaced/detours.h#L582

NOTE: Our code is not licensed the same as mologie detours for these additions, the additions fall under Hyperspace's license... I'm still talking with the team but technically I think I made all the additions to the detours.h file (maybe, will need to double check our git blame)... I'd be fine under MIT/GPLv2/GPLv3 although would always love credit somewhere even if not required by those other licenses.

Basically when we run into relative jumps we move them & recompute their offsets. There's a couple cases we can't handle (didn't bother to because never ran into them in FTL) (LOOP/JCXZ/JECXZ/JRCXZ and all 16 bit relative jumps) but all your normal relative Jumps that are 8-bit jumps should work.

It also breaks reverting (because we didn't need reverting in FTL), if that matters for Repentogon more could be done to fix reverting, but I didn't bother to store the original bytes somewhere else for reverting.

Nasa62 avatar Dec 03 '24 21:12 Nasa62

Thanks, that will prove invaluable !

I git blamed hyperspace's detours.h and saw this commit https://github.com/FTL-Hyperspace/FTL-Hyperspace/commit/d941776fb06fffc2ac4f3437d2a0c5bb2110e71d where you added support for this scenario, it only touches detours.h, which seems reasonable.

I see, in the commit message, that you ran into an issue I foresaw before, and the reason I did not update detours to simply recompute the jump destination : jumps that loop back to that destination. Now, walking the function until we reach ret or int3, or simply having a signature for the "end-of-function" address, and checking all jumps in there is not too difficult, but there is yet another issue : jump tables. Basically, what if we reach something like jmp [eax + offset] ? In this context we cannot relocate the jump statically, we need to wait until we know whether the jump table lands in the overriden destination or not.

One solution I see to this problem would be a PLT / GOT-like mechanism: rewrite the call to the jump table to land into a routine that will analyze the content of the jump table and determine if a relocation is needed or not. This is not perfect, because the jump table may be reused in other functions (compilers are crazy) so we cannot outright change the content of the table itself, we would need to relocate on the fly. I feel this is the kind of scenario where we would need to place many annotations on the function declaration in the ZHL file in order to generate code that can resolve the issue (are there jump tables in the function, where are they, are these tables used elsewhere, etc.)

About crediting : considering you showed us the file, I think it is only natural for us to credit the FTL Hyperspace team for giving us the inspiration on how to tackle this issue :)

Sylmir avatar Dec 03 '24 22:12 Sylmir

Have you an actual sample of code that reaches a jump table? Because I deal with relative based jumps in 64-bit code just fine, but not jumping into the PLT/GOT... as long as we don't get allocated more than 4GB away we're good :face_exhaling:

I considered using intra-function gap space as a place to shove this, but realistically it was never needed.

Jumps looping back we never actually ran into in practice, but yeah you'd need to know the end of function reliably to parse the whole function and we found it wasn't necessary.

In fact, very often it seems ZHL's walk-to-RET is actually bad for performance and pointless anyways in our use cases. (we never use the one signature match function declaration shortcut that it supports because it doesn't play nicely with anything except MSVC 32-bit)

Nasa62 avatar Dec 03 '24 23:12 Nasa62

EDIT: Jump tables shouldn't jump into mid-function should they? We're always inserting a jump away (detour) at the start of a function, so something reaching us via PLT/GOT should end up at the start of the function address anyways.

Nasa62 avatar Dec 03 '24 23:12 Nasa62

Hi, I think you've accidentally stumbled onto the wrong version of ZHL (we're an offshoot based on an older version, designed with hooking The Binding of Isaac: Repentance in mind).

namishere avatar Dec 03 '24 23:12 namishere

@namishere I'm a dev of FTL Hyperspace which also uses ZHL and I have done separate tweaks to it. I'm discussing changes I've made to ours that may benefit development of your version of it too.

Nasa62 avatar Dec 03 '24 23:12 Nasa62

Alright, I've notified our ZHL mastermind @Sylmir; he'll be best suited to answer and take feedback. Thank you!

namishere avatar Dec 03 '24 23:12 namishere

Alright, I've notified our ZHL mastermind @Sylmir; he'll be best suited to answer and take feedback. Thank you!

Yeah he's already higher up in this comment chain

Nasa62 avatar Dec 03 '24 23:12 Nasa62

One thing I can immediately address; we are forced to build around MSVC x32 to match what the game is compiled with

namishere avatar Dec 03 '24 23:12 namishere

Yes I know, and ZHL is originally designed for that. Our use case of it doesn't use MSVC so we had extended it to other uses.

Nasa62 avatar Dec 04 '24 00:12 Nasa62

In order of your messages:

  • When I mentionned the GOT / PLT, I didn't mean using the GOT / PLT to resolve offsets. I meant using a similar mechanism. Recall that in PLT/GOT based executables, entries in the PLT are first initialized to jump into the dynamic linker, in order to resolve the runtime address of the symbol they refer to. Once this symbol is resolved, the entry in the PLT is updated to directly jump into the symbol. My suggestion here was to use a similar mechanism to resolve runtime computed relative jumps, have them first jump into a resolution routine on our end (by patching the original jump), and this resolution routine would either update the modified jump (if it can be safely done), or perform an on-the-fly dynamic relocation if the offset may be shared elsewhere (for instance if a jump table is used in multiple places -> updating an offset in it may break an unrelated function).
  • Regarding ZHL's walk-to-ret: I never realized ZHL did that, and I don't really see the purpose. It speeds up scanning when the order of functions cannot change in the executable, making the whole thing very fragile when the executable is updated (as has been the case with BoI Repentance two weeks ago). The way it is performed is also very fragile for two reasons:
    1. There are other ways to exit a function, for instance the leave instruction that basically does the usual pop ebp; ret; in a single instruction rather than two.
    2. Some functions do not have a ret statement, for instance functions that shared a block of code at their end. In this case, the compiler can be free to compile the shared block of code once and add a tail absolute jump to this shared block at the end of both functions. In fact, this is what the Nicalis (the Isaac devs) compiler (MSVC) did on version 1.7.9b, causing Ghidra to lose its marbles while decompiling some functions. The walk-to-ret would actually cause ZHL to miss some functions in that scenario, because it may walk over other functions when updating its pointer to the last address.
  • Regarding jump tables:

An example of a jump table : image

When I started working on Repentogon I was nearing the end of my PhD on compilation and parallel programming, so I had the opportunity to talk at length with compilation experts about what compilers can do when it comes to generating ASM (and now two of my colleagues are contributors to LLVM and another is a contributor to gcc). While backwards jumps are rare, they are not impossible, and as long as the function is properly aligned (possibly by using nops), it is not forbidden to generate a function whose prolog starts outside a clearly delimited pair of int3 sections. It is not forbidden either to interleave functions and have an absolute jump between functions (a thing we also saw in Isaac at least once, and it ALSO caused Ghidra to lose its marbles again). Also some functions may not have prologs at all, so a jump to their beginning is not unheard of either.

This is of course very theoretical, but we've had so many surprises with ZHL sometimes doing unexpected stuff I've become very cautious when it comes to changing anything and making sure all corner cases are properly handled (at the very least ZHL should warn about encountering a difficult scenario)

Sylmir avatar Dec 04 '24 08:12 Sylmir

tl;dr after rewriting the other parts over and over

Okay, right now you completely break functions with jumps in the beginning https://github.com/TeamREPENTOGON/REPENTOGON/issues/121 this can be solved with what I did in Hyperspace but maybe extended to handle rel16 jumps (realistically a rel32 jump in the first 5 bytes is already 5 bytes)

  • this wouldn't break any existing hooks because it doesn't change how the jumping works in those cases

Likewise this current issue #122 where jumps come back to the first 5 bytes of the function, as long as it's not the 4 bytes after the first it's already fine

  • We need to ignore very very very rare edge cases like the absolute indirect JMP to the 4 bytes after the first. (but the point is, these would currently cause a crash anyways regardless of everything else below or of #121), we could try to log these but since they can legally be anywhere in the file it would basically mean implementing what Ghidra does and like you mentioned even Ghidra cries at these crazy indirect ones.
    • Sure issac has absolute indirect jumps like you show in that picture, but do any of them hit the tiny 4 byte window? It's unlikely (and you can't hook them today anyways), it's likely more relative jumps hit that and we can deal with relative jumps.
  • A second pass scanner could be run only in development mode and be aware of all already found signatures
  • We then can look at the entire binary in memory searching for any rel8/rel16/rel32 jumps (and their jcc relative counterparts) and if they target the 4 bytes after the first byte of any hooked signature, throw an error for development.
  • If we actually find any use cases of relative jumps targeting those 4 bytes after the first, then let's talk further about tweaking ZHL to have an end-of-function signature for those specific functions that trigger the second-pass scanner error.
PLT/GOT

Yeah I don't think we need that approach at all to making a lookup/jump table just to rewrite the middle of the function, sure we can, but now we're getting really complex like the microsoft detours library (which if I recall from my research did this... or at least one of the detours libraries out there does attempt to rewrite middle of functions in order to completely move the entire original function to a different memory space... but I don't think we really need that for ZHL's needs) Although on that note, given that Issac is only windows and only MSVC, that detours package could actually probably be used instead of MologieDetours. (I need to look it up again, might have been a different detour library but I know I saw one at one point that did move whole functions in memory)

~~PLT/GOT I don't think is an issue, unless they actually can jump mid-function? But I don't think they can. Sure local jumps can, but PLT/GOT? Those should be to the start of functions (even though yeah legally maybe there's no strict rule that says a compiler can't do that, does that make sense for an external library PLT/GOT reference?)~~

one thing we could do at least maybe to help catch it in development is throw CC (INT3) at the end of the detour (extending it to 6 bytes) just to try to catch if someone ends up pointing memory into the detour? Could go 7 bytes and thus call our own interrupt to have a crash handler during non-debugging also.

Original brainstorming

Definitely off topic

  • Regarding ZHL's walk-to-ret: I never realized ZHL did that, and I don't really see the purpose. It speeds up scanning when the order of functions cannot change in the executable, making the whole thing very fragile when the executable is updated (as has been the case with BoI Repentance two weeks ago). The way it is performed is also very fragile for two reasons:

Yeah, it's supposed to be so you can like only have one signature and match several functions in a row; yeah it's extremely fragile and it also is only really practical for x86 functions (and likely with low optimization) when I looked at that in 64-bit well there's no consistency due to the AMD64 red zone and you frequently not needing to push to preserve the stack space...

But on that note, it can be helpful for more optimization to have functions in relative order using . signature prefix and also to use ! to just stop the RET search anyways if it's irrelevant... we didn't turn it off globally but, it could be if you don't need it for the shortcut signature. (But that's a separate topic, and you get way more signature scan speed boost out of . and ordering the functions anyways by not constantly resetting to the top of the binary)

slightly more on topic

  • Regarding jump tables:

An example of a jump table : image

Okay then we're jumping near to some switch's case if that value was 0x3bd sure, probably the last/highest value of an enum or something and that's a default error case or something.

Then we're doing pointer math (ah yes this makes sense for the switch) the case we'll jump to is at 0x5aabc0 + EAX which is something below 0x3bd, so we're jumping between 0x5aabc0 and 0x5aaf7d... of course that's flipping obnoxious because automating that check is a nightmare to process the CMP + MOVZX math together... and determining if it jumps up is also a nightmare because we can't just ensure that register is not negative...

I think the way to solve it 100% is to then patch that again/rewrite it to include a runtime check on the register safe range but that's far more complicated... I don't think we need to 100% get this to prevent it but really we're just getting completely impractical now for something super rare that we could just handle manually in those few rare cases that it manages to target the 4 critical bytes of the jump.

  1. There are other ways to exit a function, for instance the leave instruction that basically does the usual pop ebp; ret; in a single instruction rather than two.
  1. Some functions do not have a ret statement, for instance functions that shared a block of code at their end. In this case, the compiler can be free to compile the shared block of code once and add a tail absolute jump to this shared block at the end of both functions. In fact, this is what the Nicalis (the Isaac devs) compiler (MSVC) did on version 1.7.9b, causing Ghidra to lose its marbles while decompiling some functions. The walk-to-ret would actually cause ZHL to miss some functions in that scenario, because it may walk over other functions when updating its pointer to the last address.

Okay so for both of these, rather than trying to detect the function end what if we just build a second-pass dev scanner that detects the issue not the proper function end, see below.

Also slightly tangent, I think it might be good to eventually flip the ZHL RET scanning logic, either removing it if there's no instances of the shortcut double function matching in a row thing, or flipping it so only those (which should be easy to detect with a grep/awk of the ZHL signature files) have ! or some other character to explicitly turn it on for that one match or something. I still think removing it entirely is better, it's just an annoying complication (ours doesn't remove it but now that i've had this conversation I'm considering just ripping it out in ours anyways because I can't justify why we in Hyperspace would want it anymore)

Second pass scanning

To avoid major ZHL changes we could just check these in a second pass.

EA & FF jumps are basically impossible for us to follow through every edge case without basically building Ghidra, and then some... if we just drop the backwards jumps edge case and the jumping from another function to a shared middle of a different function then we can do it in the second pass scanner with some sanity. EDIT: I think Microsoft's approach might be to jump to a nearby address (gap between functions) and add a patch that does a relative jump from there with the same register or something... ugh I guess that's possible. As long as you have a gap within jmp short distance (rel8) then you could replace any 4-5+ byte indirect jump with a trampoline that pushes eip (or whatever register is used for the indirect jump) and at a minimum a short jump to a gap within rel8 distance that you can then shove a full 5-byte jmp into.

We do the following logic as a second-pass dev-build-only scanner after ZHL's scanner:

  1. We have all the signature addresses from ZHL
  2. Scan entire binary for jumps:
    • EB/E9 jumps (rel8/rel16/rel32) can be computed to see if they're an issue
      • jumpTarget = jumpPtr (current pointer) + jump offset
      • if(jumpTarget > signature_start && jumpTarget <= signature_start + 5) (we don't care if it's pointed directly at start because that'll still hit our jmp)
      • If true, throw error into log to warn developer of a visible hook issue.
    • EA & FF jumps are annoyingly tricky and need manual review to be 100% certain but assuming forward jumps
      • If we make an assumption that they're not insane backwards jumps beyond the scope of the function, we can read the address portion of the indirect jump, and check "is that address nearby a hook'ed signature address" and maybe log it.

We might maybe also include a 3rd pass (or optimize it into the second pass or something, just trying to keep simple at first) to scan for any full rel32 JMPs that point to those signature locations, so basically we've got one pass that looks at the whole binary, and one that pays attention to function sections.

When I started working on Repentogon I was nearing the end of my PhD on compilation and parallel programming, so I had the opportunity to talk at length with compilation experts about what compilers can do when it comes to generating ASM (and now two of my colleagues are contributors to LLVM and another is a contributor to gcc).

Fun, we found a bug in LLVM's generation due to our tweaks to ZHL :) (still hasn't been fixed)

Nasa62 avatar Dec 05 '24 00:12 Nasa62

Okay thinking about the FF/EA absolute indirect jumps.

They're either FF/4 (5 byte) or FF/5 (6 byte) or they're the EA jumps that can be 5 byte in 32-bit mode or 4 byte in 32-bit mode with 16 bit address + the override modified e.g. 66 ea beef (I think?) or ea deadbeef.

So these annoying jumps can be between 4 and 6 bytes in length.

This means we always have enough for a push of a register and at least a jmp/rel8 (3 bytes total) We don't always have enough room for a push + jmp/rel32 or jmp/rel16, 16 is 4 bytes + push so 5 bytes, rel32 is 5 bytes + 1 for push so 6 bytes.

So if it's the worst case scenario "EA cd JMP ptr16:16 Jump far, absolute, address given in operand." then we need to find a gap of at least 5 empty bytes (and there is a pattern for detecting this for what MSVC generates I think in the microsoft detours library here) that is within 8 bits of address space of the detour where we can shove a 5 byte jump.

If it's the 5 byte scenario we can shove a 16 bit jump in with the register push so we can jump within 64k address space around to find a gap, much easier.

Finally the best case 6 byte ones we just straight up rewrite to a full 32 bit jump plus the push and go somewhere in allocated memory not nearby to finish off the trampoline/rewrite...

Okay yeah... I see this as possible... and error detection for if we can't find a gap to rewrite is possible...

Now knowing if we really need to rewrite it is another story (that goes down the static analysis/ghidra route of this is wayyyy too complicated), but if we know the start & end of a function we could just blindly rewrite all the absolute indirect jumps within that function. And if we're okay saying that the next signature should be considered the end of the function, we could just dynamically rewrite a lot of stuff (and this is where I think your example of the GOT/PLT is what you were saying that it's similar)...

Although do we really actually need to bother with that? Again I think it's such a minor edge case, but yeah I see a path forward (although partially re-inventing the wheel here if Microsoft Detours does that instead already, not sure if it handles every case but looking around I see something in it for FF jumps)

Nasa62 avatar Dec 05 '24 08:12 Nasa62