miasm
miasm copied to clipboard
Stateful SMT Memory Model
Hi!
This PR introduces a stateful SMT memory model.
What doe this mean?
Similar to SSA, we can make memory reads/writes stateful and use it for SMT-based reasoning.
For this, we have a memory variable M
as well as two operations:
-
mem_read(memory_state, address, size_to_read)
-
mem_write(memory_state, address, value, size_to_write)
Common memory access patterns can be rewritten as follows:
# ebx = @32[eax]
ebx = mem_read(M, eax, 32)
# @32[eax] = ebx
M = mem_write(M, eax, ebx, 32)
Afterwards, SSA can be applied:
# ebx = @32[eax]
ebx.0 = mem_read(M, eax, 32)
# @32[eax] = ebx
M.0 = mem_write(M, eax, ebx.0, 32)
Afterwards, we add every IR instruction to the SMT solver instead of only jump conditions. This allows the SMT solver to analyze and reason about the whole program space (often in a much faster manner, see above).
Why is this useful?
The current symbolic execution/z3 memory model does not contain any form of state and is prone to aliasing. Basically, this PR enables us to constraint memory values for any arbitrary state. This can be useful in different contexts. I will discuss a few of them:
Memory Aliasing
Given the following code and the initial symbolic map: {rbx: rbx, rax: rax}
mov [rax], 0x33
mov [rbx], 0x44
mov rcx, [rax]
Then, rcx
will always contain 0x33
, since the SE and the SMT solver are not aware about potential aliasing issues. However, in the stateful memory model, the SMT solver is able to find a solution for both cases and decides own its own what is more likely the case.
Symbolic Execution
Large pieces of code can slow down symbolic execution enormously. Take for instance the following example. For this, symbolic execution takes ~ 90 seconds and the SMT solving itself ~ 1 second. Instead, in the SMT-based memory model, the whole process takes nearly ~ 1 second. What is happening? We avoid symbolic execution. Instead, the SMT solver takes as input the whole basic block slice (instead of only the jump condition) and decides on its own which parts are required to find a solution and which parts can be omitted. As a result, a satisfiable model is equivalent to a whole slice to all register and memory states that are responsible for its decision.
from z3 import Solver, sat, BitVec
from miasm.analysis.machine import Machine
from miasm.analysis.ssa import SSABlock
from miasm.expression.expression import *
from miasm.expression.simplifications import expr_simp_high_to_explicit
from miasm.expression.simplifications_stateful_memory import rewrite_memory
from miasm.ir.translators.z3_ir import TranslatorZ3
# init
code = "554889e5897dec8975e88955e48b45ec8945fc8b45ec01c08945fc8b45e82b45fc8945ec8b55e88b45e401d08945fc8b45fc3145e48b45e82b45ec8945fc8b55fc8b45e801d08945ec8b45fc0faf45e48945e8c745fc000000008b45fc3145e88b45e83345ec8945fc8b45e82b45ec8945ec8b45e40faf45e88945e4c745e4000000008b55fc8b45e801d08945e48b45e40faf45ec8945e88b45e82945fc8b45ec0faf45fc8945e48b45e42b45fc8945e88b45ec3345e48945e88b45e801c08945e88b45ec01c08945fc8b45fc0145e88b45e42b45ec8945ec8b45fc2b45ec8945ec8b45e40faf45ec8945e48b45ec3145fc8b45e82b45fc8945ec8b45fc0faf45ec8945e88b45ec3345e48945fc8b45e42b45e88945ec8b55e48b45fc01d08945e88b45e43345e88945fc8b45ec3145e88b45fc2b45e48945e48b45fc0faf45e48945fc8b45ec3145e48b45e40faf45e48945e88b45ec2945fc8b45e83345ec8945e48b45e80faf45fc8945e88b45fc3345e48945e88b45e83345fc8945ecc745fc000000008b45e43145fcc745e4000000008b45fc0faf45fc8945ec8b45e43145fcc745ec000000008b45e40faf45fc8945e48b45fc3345e48945ecc745ec000000008b55ec8b45e801d03d390500007507b801000000eb05b8000000005dc3".decode(
"hex")
start_addr = 0x0
# init binary container
architecture = "x86_64"
m = Machine(architecture)
mdis = m.dis_engine(code)
ira = m.ira(mdis.loc_db)
asm_cfg = mdis.dis_multiblock(start_addr)
# IRA cfg
ira_cfg = ira.new_ircfg_from_asmcfg(asm_cfg)
# make flags explicit
ira_cfg.simplify(expr_simp_high_to_explicit)
# rewrite memory expressions
ira_cfg.simplify(rewrite_memory, rewrite_mem=True)
# transform block into SSA
start_loc_key = mdis.loc_db.get_offset_location(start_addr)
ssa = SSABlock(ira_cfg)
ssa.transform(start_loc_key)
ira_block = ira_cfg.get_block(start_addr)
# z3 solver
solver = Solver()
translator = TranslatorZ3(loc_db=mdis.loc_db, stateful_mem=True, mem_size=ira.IRDst.size)
# translate all IR expressions
for assignblock in ira_block:
for expr in assignblock:
expr_z3 = translator.from_expr(assignblock.dst2ExprAssign(expr))
solver.add(expr_z3)
# constraint inputs
irdst = ExprId("IRDst", 64)
rdi = BitVec("RDI", 64)
rsi = BitVec("RSI", 64)
rdx = BitVec("RDX", 64)
solver.add(translator.from_expr(irdst) == 0x1d3)
solver.add(rdi == 0)
# check sat/get model
if solver.check() == sat:
print "sat"
model = solver.model()
print "RDI: {}".format(model[rdi])
print "RSI: {}".format(model[rsi])
print "RDX: {}".format(model[rdx])
Other
Other use cases can be exploit development (given a path to a exploitable vulnerability, can we craft an input such that our final memory contains the following shell code <...>), checks for integer overflows, ...
Open Problems
As stated in #1004, a questions remains how the memory rewrite pass can be implemented in a clean manner. The current solutions patches the simplify function in AssignBlock
and IRBlock
and shouldn't stay as it is.
Sorry for the delay @mrphrazer Once again, you give us really interesting ideas here. The time we are taking here is not because we are not interested, it's because we need time to understand every parts and ideas embedded in your PR :) Thank you for the contribution :smile:
As @serpilliere remarks, we didn't give you feedback for now. Here is my current remarks / questions:
- Regarding the integration, I'm not sure adding it to the IR core is the cleanest / easiest path. I would rather implement it as a transformation of an IRCFG to a new one where memory accesses have been rewritten. Thus, one can use the common transformation / simplification and then enjoy this transformation.
- Following the aforementioned remark, for now (and at least for an easier integration and test), I would rather see this feature as a separated "module". I mean it seems to me we can implemented it in a separate analysis pass, and with separate object in Z3 translator, etc. It would allow us to test it, compare it with the current memory model. And IMHO, even if this memory model seems to be suited for certain use case, it might not be for other reversing purposes. So keeping it separate give the user the final choice.
- I'm not sure to understand how
Phi(M.0, M.1)
will be handled. I do agree it will be very interesting among a given path, but I still have issue regarding how it would handle a graph. Do you have any idea? Or maybe I'm out of context of this memory model uses?
Code remarks: thanks for commenting it out, it is so much easier to review :)
You might be interested in the Core Theory from the Binary Analysis Platform.