pyvex icon indicating copy to clipboard operation
pyvex copied to clipboard

Cannot partially lift basic blocks containing invalid instructions

Open rickyz opened this issue 1 year ago • 7 comments
trafficstars

Description

VEX panics on certain invalid instructions, which results in lift() outputting nothing for the entire basic block (not even IR for instructions prior to the invalid one). Ideally, it would return a partial lifted block instead. This breaks angr's handling of self-modifying code containing these instructions (which can be worked around by single-stepping, but that comes with a performance penalty).

Steps to reproduce the bug

Example:

import pyvex, archinfo

# mov word ptr [rip], 0x9090
>>> inst0 = b'\x66\xc7\x05\x00\x00\x00\x00\x90\x90'

# invalid instruction referencing a nonexistent segment register
>>> inst1 = b'\x8c\xf0'

# Lifting the first instruction works
>>> pyvex.lift(inst0, 0, archinfo.ArchAMD64()).pp()
IRSB {
   t0:Ity_I64 t1:Ity_I64

   00 | ------ IMark(0x0, 9, 0) ------
   01 | STle(0x0000000000000009) = 0x9090
   NEXT: PUT(rip) = 0x0000000000000009; Ijk_Boring
}

# Appending the invalid instruction results in no output
>>> pyvex.lift(inst0 + inst1, 0, archinfo.ArchAMD64()).pp()
IRSB {


   NEXT: PUT(rip) = 0x0000000000000000; Ijk_NoDecode
}

# The failure is due to the panic at https://github.com/angr/vex/blob/c0bace1c7e86a12e4d845139fc246334e7bc402f/priv/guest_x86_toIR.c#L519
>>> print(pyvex.lifting.libvex.LibVEXLifter.get_vex_log())
vex: the `impossible' happened:
   segmentGuestRegOffset(amd64)

Environment

No response

Additional context

While VEX's disassembler could be fixed to handle this specific failure gracefully, I worry (though I have not verified) that there may be enough other reachable panics strewn around that it would take a fair amount of effort to fix up all of the error handling. Perhaps an easier, if somewhat hacky solution would be to have bb_to_IR update a lifted_upto variable in VexTranslateResult after every successfully lifted instruction so that pyvex can retry lifting with max_bytes to obtain a partial lift.

rickyz avatar Oct 06 '24 20:10 rickyz

It seems to me that we can gracefully handle cases where vpanic is called in VEX. We get to register a callback function that vpanic will invoke, and the current callback function we register in pyvex will result in vex_lift returning NULL instead of preserving already lifted instructions. I think we can fix this problem by returning _lift_r instead of returning NULL (plus whatever necessary fixups required, see here).

A stopgap solution that you may use for now is limiting the max number of instructions that you lift each time - try limiting it to 1 to start with.

ltfish avatar Oct 06 '24 23:10 ltfish

I'm transferring this issue to angr/pyvex since we should be able to fix it without touching VEX itself.

ltfish avatar Oct 06 '24 23:10 ltfish

Thanks for the super fast response! fwiw, I worry that vpanicing from an arbitrary location could leave the results in a not-fully-valid state that wouldn't be safe to return - it looks like the current implementation of LibVEX_Lift() and bb_to_IR() do things like IR optimization/cleanup and populating certain out params only after successfully completing the disassembly loop. That would be skipped if we longjmp out early.

Since the IRSB is a return value of LibVEX_Lift() rather than an out param, it's also tricky to make it return partial results on a vpanic.

rickyz avatar Oct 07 '24 01:10 rickyz

fwiw, I worry that vpanicing from an arbitrary location could leave the results in a not-fully-valid state that wouldn't be safe to return

It's entirely possible! That's what I referred to as "plus whatever necessary fixups required" in my original reply. Upon a closer look, I think a better solution is doing an automatic re-lifting in PyVEX with an instruction limit when the first lifting fails after decoding N instructions. We can get N from _lift_r. This way we do not need to intensively patch VEX code. Do you have other suggestions?

ltfish avatar Oct 07 '24 05:10 ltfish

Yeah, that's what I was thinking as well. It looks like VexTranslateResult::n_guest_instrs can be used, but bb_to_IR() would need to be changed to increment it after every instruction at https://github.com/angr/vex/blob/c0bace1c7e86a12e4d845139fc246334e7bc402f/priv/guest_generic_bb_to_IR.c#L456 instead of incrementing a variable that is only written back to VexTranslateResult at the end of the function.

fwiw, here is the quick hack I did in angr to work around this - the first part of the patch wouldn't be necessary with the fix that we discussed, but the second part fixes a related SMC bug where instructions that could not be disassembled/lifted would not be correctly checked against the dirty address set:

diff --git a/angr/engines/vex/heavy/heavy.py b/angr/engines/vex/heavy/heavy.py
index a2e0fff2d..79a7b63b6 100644
--- a/angr/engines/vex/heavy/heavy.py
+++ b/angr/engines/vex/heavy/heavy.py
@@ -152,6 +152,10 @@ class HeavyVEXMixin(SuccessorsMixin, ClaripyDataMixin, SimStateStorageMixin, VEX
                 and irsb.next.con.value == irsb.addr
                 and not self.state.project.is_hooked(irsb.addr)
             ):
+                if num_inst != 1:
+                    num_inst = 1
+                    irsb = None
+                    continue
                 raise errors.SimIRSBNoDecodeError(
                     f"IR decoding error at 0x{addr:02x}. You can hook this "
                     "instruction with a python replacement using project.hook"
@@ -251,6 +255,11 @@ class HeavyVEXMixin(SuccessorsMixin, ClaripyDataMixin, SimStateStorageMixin, VEX

         # Raise an exception if we're suddenly in self-modifying code
         if (self.project is None or self.project.selfmodifying_code) and self.state.scratch.dirty_addrs:
+            if stmt.len == 0:
+                # We don't know how long this instruction is.
+                # Conservatively relift in case it overlaps a dirty
+                # address.
+                raise errors.SimReliftException(self.state)
             for subaddr in range(stmt.len):
                 if subaddr + stmt.addr in self.state.scratch.dirty_addrs:
                     raise errors.SimReliftException(self.state)

rickyz avatar Oct 07 '24 05:10 rickyz

Thanks for the patch. Do you want to submit a fix (and ideally, a test case as well) as a PR to angr?

I'll look into patching pyvex when my time allows.

ltfish avatar Oct 07 '24 05:10 ltfish

Sure thing, I sent a PR for the dirty address check at https://github.com/angr/angr/pull/4935 - thanks!

rickyz avatar Oct 07 '24 07:10 rickyz