Patcherex2
Proposal: Copy and Micropatch based target
My coworker, Phil Zucker, came up with a clever new method of creating micropatches using an ordinary C compiler. The strategy is called "copy and micropatch" and operates similarly to copy-and-patch JITs. It relies on abusing the calling convention to force values into specific registers.
The easiest way to illustrate the concept is with an example (taken from Phil's blog):
#include <stdint.h>
uint64_t CALLBACK(uint64_t rdi, uint64_t rsi, uint64_t rdx, uint64_t rcx, uint64_t r8, uint64_t r9);
uint64_t PATCHCODE(uint64_t rdi, uint64_t rsi, uint64_t rdx, uint64_t rcx, uint64_t r8, uint64_t r9){
    // Some random patch code here
    if(rcx >= r8){
        rdi = rsi * rdx;
    }
    // End patchcode
    return CALLBACK(rdi, rsi, rdx, rcx, r8, r9);
}
The calling convention for this snippet ensures that PATCHCODE receives certain registers as inputs, and the call to CALLBACK at the end ensures that the variables are placed into the correct registers once the function terminates.
The code is passed through an ordinary C compiler, and the body of PATCHCODE is extracted and inserted somewhere where there is space. This process requires tail-call optimization turned on, which turns the call to CALLBACK into a jump. Through the use of a linker script we could set the CALLBACK symbol to be placed at the detour return point.
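To make the linker-script step concrete, here is a minimal sketch that emits a GNU ld script fragment pinning the CALLBACK symbol to a fixed address. The helper name `callback_linker_script`, the example addresses, and the single `.text` output section are assumptions for illustration; this is not the script Patcherex2 actually generates.

```python
def callback_linker_script(callback_addr: int, code_addr: int) -> str:
    """Return a GNU ld script fragment (hypothetical helper).

    callback_addr: address the tail-call jump should land on
                   (the detour return point)
    code_addr:     address where the compiled patch body is placed
    """
    return "\n".join([
        # Symbol assignment: every reference to CALLBACK resolves here.
        f"CALLBACK = {callback_addr:#x};",
        "SECTIONS",
        "{",
        # Move the location counter to the free space chosen for the patch.
        f"    . = {code_addr:#x};",
        "    .text : { *(.text*) }",
        "}",
    ])


# Example addresses are made up.
script = callback_linker_script(0x401136, 0x402000)
print(script)
```

Because CALLBACK is only declared in the C source, the linker resolves the tail-call jump against this assignment, so the compiled body ends with a jump straight to the chosen address.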
With the __attribute__((preserve_none)) attribute built into the latest version of Clang, we can get control over many registers (at least on x64). Note that preserve_none is brand new; I don't think it has landed in any release version of Clang yet. As an alternative to preserve_none, we could add shims to push/pop registers to ensure the data gets to the right place.
For more info, see Phil's blog here: https://www.philipzucker.com/permutation_compile/
I'm willing to put the time into developing this target for integration into patcherex2. Is there anything that we need to know before forking and getting started? Using the version of Clang with support for preserve_none would be highly desirable.
See also:
- preserve_none for ARM64: https://github.com/llvm/llvm-project/issues/87423
- https://discourse.llvm.org/t/rfc-exposing-ghccc-calling-convention-as-preserve-none-to-clang/74233
- https://github.com/llvm/llvm-project/pull/76868
@calebh Thanks for bringing this idea up! This does indeed look like a very clever way to do instruction-level patching using C code.
I agree that using preserve_none is way better / cleaner than adding code to push/pop registers. My main concern is that clang-19 hasn't been officially released yet and preserve_none is currently only supported for x64. So I think it would be great to add clang-19 as another compiler component and keep the current clang compiler component unchanged.
I think a good way to implement this is to add an optional argument language to InsertInstructionPatch, as it will still ultimately behave like an instruction-level patch. Here's a rough idea of what the usage might look like:
p = Patcherex("some_binary", target_opts={"compiler": "clang19"}) # clang 19 component to be implemented
c_code = """
if(rcx >= r8){
rdi = rsi * rdx;
}
"""
p.patches.append(InsertInstructionPatch(0xdeadbeef, c_code, language="C"))
Let me know if you have a better idea on how to integrate it into patcherex2 :)
An initial implementation for x64 optionally using preserve_none is now working in our fork. See the example here: https://github.com/draperlaboratory/Patcherex2/blob/main/examples/insert_instruction_patch_c/patch.py
Here is the general strategy that I have implemented:
For the most part the logic is the same as an assembly InsertInstructionPatch, except that we compile C instead of assembly. The inserted code consists of the compiled C code concatenated with the moved instructions. This is followed by a jump back to just after the insertion point. The CALLBACK function called by the C code simply jumps the program 1 instruction ahead to the moved instructions (which means it is essentially a nop). The location of the extern CALLBACK itself is defined using a symbol passed to the linker script.
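The layout described here can be sketched as simple address arithmetic. This is an illustration only: `TrampolineLayout`, `plan_layout`, and the example numbers are hypothetical, and the sketch assumes the moved instructions are exactly the bytes overwritten by the detour at the insertion point.

```python
from dataclasses import dataclass


@dataclass
class TrampolineLayout:
    patch_start: int     # where the compiled C body is placed
    callback_addr: int   # CALLBACK symbol: first moved instruction
    jump_back_from: int  # where the final jump-back instruction sits
    jump_back_to: int    # resume point in the original code


def plan_layout(free_addr: int, compiled_c_len: int,
                insert_addr: int, moved_len: int) -> TrampolineLayout:
    """Hypothetical helper: compute where each piece of the patch lands."""
    # The moved instructions directly follow the compiled C body, so the
    # tail-call jump to CALLBACK lands on the very next instruction.
    callback_addr = free_addr + compiled_c_len
    # The jump back is emitted right after the moved instructions.
    jump_back_from = callback_addr + moved_len
    # Execution resumes just past the bytes that were moved out.
    jump_back_to = insert_addr + moved_len
    return TrampolineLayout(free_addr, callback_addr,
                            jump_back_from, jump_back_to)


layout = plan_layout(free_addr=0x402000, compiled_c_len=0x18,
                     insert_addr=0x114D, moved_len=5)
print(layout)
```

The value of `callback_addr` is what would be handed to the linker script as the CALLBACK symbol, which is why the call ends up behaving like a nop.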
There are changes in a few different places:
- In archinfo, the Amd64Info class now has calling convention and subregister information.
- In the InsertInstructionPatch class, the apply method has been split into _apply_asm and _apply_c. The _apply_c function builds the C code required to compile the micropatch, then passes the code string to p.utils.insert_trampoline_code. insert_trampoline_code has been modified to additionally accept a C string as the instrs argument.
The user can also use subregisters by passing them as appropriate to c_in_regs and c_out_regs. For example, the following is okay:
from patcherex2 import *
p = Patcherex("add", target_opts={"compiler": "clang19"})
c_str = """
edi += edi;
edi += 5;
"""
p.patches.append(InsertInstructionPatch(0x114d, c_str, language="C", c_in_regs=["edi"], c_out_regs=["edi"]))
p.apply_patches()
p.binfmt_tool.save_binary()
However, you cannot use both rdi and edi at the same time.
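A minimal sketch of how such a conflict could be detected, assuming a table that maps each full register to the names aliasing it. The table contents and the helper names `parent_register` and `check_no_aliasing` are hypothetical and do not reflect the actual Amd64Info data.

```python
# Hypothetical subregister table: full register -> names that alias it.
SUBREGISTERS = {
    "rdi": ["rdi", "edi", "di", "dil"],
    "rsi": ["rsi", "esi", "si", "sil"],
    "rax": ["rax", "eax", "ax", "al"],
}


def parent_register(name: str) -> str:
    """Return the full-width register that `name` is part of."""
    for parent, aliases in SUBREGISTERS.items():
        if name in aliases:
            return parent
    raise ValueError(f"unknown register: {name}")


def check_no_aliasing(regs: list[str]) -> None:
    """Raise if two different names refer to the same physical register."""
    seen: dict[str, str] = {}
    for reg in regs:
        parent = parent_register(reg)
        if parent in seen and seen[parent] != reg:
            raise ValueError(
                f"{seen[parent]} and {reg} both alias {parent}")
        seen[parent] = reg


# Repeating the same name (e.g. edi as both input and output) is fine.
check_no_aliasing(["edi", "edi", "rsi"])
```

Calling `check_no_aliasing(["rdi", "edi"])` raises a ValueError, since both names map to the same C parameter slot.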
What remains to be done:
- Add support for floating point registers. On x64 these are registers xmm0 to xmm7.
- Add support for more architectures by specifying their calling conventions. In particular, aarch64 will need to be updated when preserve_none lands in clang19. Note that copy and micropatch works fine without preserve_none; you just get a lot fewer registers under your control.
Primary files changed:
- https://github.com/draperlaboratory/Patcherex2/blob/main/src/patcherex2/patches/instruction_patches.py#L177
- https://github.com/draperlaboratory/Patcherex2/blob/main/src/patcherex2/components/utils/utils.py#L16
- https://github.com/draperlaboratory/Patcherex2/blob/main/src/patcherex2/components/archinfo/amd64.py#L48
Here is the current generated C for the example program (the user never sees this):
#include <stdint.h>
extern void __attribute__((preserve_none)) _CALLBACK(uint64_t rdi, uint64_t rsi, uint64_t rdx, uint64_t rcx, uint64_t r8, uint64_t r9, uint64_t r11, uint64_t r12, uint64_t r13, uint64_t r14, uint64_t r15, uint64_t rax);
#define return return _CALLBACK(rdi, _dummy, _dummy, _dummy, _dummy, _dummy, _dummy, _dummy, _dummy, _dummy, _dummy, _dummy)
void __attribute__((preserve_none)) _MICROPATCH(uint64_t rdi, uint64_t rsi, uint64_t rdx, uint64_t rcx, uint64_t r8, uint64_t r9, uint64_t r11, uint64_t r12, uint64_t r13, uint64_t r14, uint64_t r15, uint64_t rax) {
    uint64_t _dummy;
    rdi += rdi;
    rdi += 5;
    return;
}
#undef return
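For readers curious how a wrapper of this shape could be produced, here is a rough sketch of a generator. The function `build_micropatch_c` and its parameters are hypothetical and are not the fork's _apply_c implementation. The register order is copied from the generated C in this comment, and the sketch handles only full 64-bit registers, not subregisters.

```python
# Register order taken from the generated C shown above.
CC_REGS = ["rdi", "rsi", "rdx", "rcx", "r8", "r9",
           "r11", "r12", "r13", "r14", "r15", "rax"]


def build_micropatch_c(body: str, out_regs: list[str],
                       attr: str = "__attribute__((preserve_none))") -> str:
    """Wrap user C statements in a _MICROPATCH function (hypothetical)."""
    params = ", ".join(f"uint64_t {r}" for r in CC_REGS)
    # Registers the user does not list as outputs are passed as an
    # uninitialized dummy, so the compiler need not preserve them.
    args = ", ".join(r if r in out_regs else "_dummy" for r in CC_REGS)
    return "\n".join([
        "#include <stdint.h>",
        f"extern void {attr} _CALLBACK({params});",
        # Every `return` in the user code becomes a tail call to _CALLBACK.
        f"#define return return _CALLBACK({args})",
        f"void {attr} _MICROPATCH({params}) {{",
        "uint64_t _dummy;",
        body.strip(),
        "return;",
        "}",
        "#undef return",
    ])


generated = build_micropatch_c("rdi += rdi;\nrdi += 5;", out_regs=["rdi"])
print(generated)
```

Redefining `return` as a macro means an early `return;` in the user's snippet also exits through _CALLBACK rather than an ordinary function return.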
@calebh Thanks for the great work on this so far! The overall code looks pretty good to me. A few thoughts:
- It would be great if the generated C code can be shown in the log (maybe at the DEBUG level) to make it easier to debug and understand what's being generated.
- I'd personally prefer the get_cc and get_subregisters functions to be in the archinfo component instead of targets, as they provide architecture specific information rather than target specific information.
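As a sketch of the first suggestion, the generated C could be emitted through Python's standard logging module at DEBUG level. The logger name and the helper `log_generated_c` are assumptions for illustration; how Patcherex2 actually configures its loggers is not shown here.

```python
import logging

# Hypothetical logger name, used only for this sketch.
logger = logging.getLogger("patcherex2.example")


def log_generated_c(c_source: str) -> None:
    """Emit the generated C at DEBUG level so it is hidden by default."""
    # Lazy %-formatting: the message is only built if DEBUG is enabled.
    logger.debug("Generated micropatch C:\n%s", c_source)


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG)
    log_generated_c("rdi += rdi;\nrdi += 5;")
```

With the default WARNING level nothing is printed, so the user still never sees the generated code unless they opt in.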
Let me know if there's anything you need from me to help wrap this up. Once you feel it's ready, please go ahead and open a PR against the main branch. I'll do a thorough code review and testing pass, and then we can get it merged.
Thanks again for driving this forward!
The fork is currently in good shape, nearly ready to merge back into the main repo. Most of the remaining tasks revolve around different architectures. What's left to do:
- Change the preserve_none register list for x64 once this LLVM pull request lands: https://github.com/llvm/llvm-project/pull/88333
- Add support for Aarch64 (ARM64) once preserve_none for that platform lands. I have not seen any pull requests for this architecture, so the timeline for adding this seems unclear.
- Test other architectures. The method is still somewhat useful for architectures where preserve_none is not supported. However, in general you will have fewer registers under your control.
I do not currently have access to any systems that are not x64. Do you have any system for testing non-x64 architectures?
@calebh Thank you for your efforts to make this happen! I don't currently have access to any non-x64 systems; for now, all the non-x64 archs are being tested with QEMU. For example, see tests/test_aarch64.py#L298-L303. .github/actions/install-patcherex2/action.yml#L14-L20 lists the dependencies required for QEMU tests.
Implemented in #31