miasm
miasm copied to clipboard
Required steps to integrate maybe-alias analysis into the symbolic execution engine
Heya guys!
I was discussing together with @fvrmatteo the following case: By default, Miasm's memory model assumes that symbolic memory addresses do not aliase. As a result, the symbolic execution engine never takes the jump since @64[RAX]
can never be 0x2
.
MOV QWORD PTR [RAX], 0x1
MOV QWORD PTR [RBX], 0x2
MOV RCX, QWORD PTR [RAX]
CMP RCX, 0x2
JZ loc_20
To handle cases like that, I wrote a maybe-alias analysis which tells me possible code locations that might aliase. Then, I can handle it manually by mapping those to the same values, like
sb = SymbolicExecutionEngine(lifter)
sb.symbols[ExprMem(ExprId("RAX", 64), 64)] = ExprInt("alias", 64)
sb.symbols[ExprMem(ExprId("RBX", 64), 64)] = ExprInt("alias", 64)
In other words, I can handle both cases individually: By default, they will never aliase. If I map them to the same value, we handle the aliasing case.
However, I would like to build sth such that the symbolic execution stops at a conditional branch at the end of the basic block. It might look similar to the following:
IRDst = @64[maybe_aliase(RAX, RBX)] == 0x2 ? ... : ...
I'm aware that Miasm doesn't support this right now. However, I am currently trying to figure out would be required/the best way to implement it on my own. I think I would have to introduce a maybe_aliase
operator, perform some analysis beforehand and propagate it somehow during symbolic execution. Would you guys agree or do you even think there might be an easier way to realize that?
Best,
mrphrazer
Hi @mrphrazer & @fvrmatteo ! Glad to hear from you guys!
Ok, let me rephrase, to be sure I understand correctly what you proposed, and maybe add some precision.
- Let's add a custom operator, say
MayAlias
. This operator will take n arguments, which are expressions of base address of may alisaing memory. - in this operator, let say that the order is a bit important. To be precise, we will distinguish the first argument from others: this is a detail which will be useful in the following. The first argument will give a hint on the current manipulated memory, the others are a bunch of aliases.
- So if memory based on
RAX
aliases with memory based onRBX
andRCX
let say that we will initialize the engine a bit like you proposed:
sb = SymbolicExecutionEngine(lifter)
sb.symbols[ExprId("RAX", 64)] = ExprOp("may_alias", ExprId("RAX", 64), ExprId("RBX", 64), ExprId("RCX", 64))
sb.symbols[ExprId("RBX", 64)] = ExprOp("may_alias", ExprId("RBX", 64), ExprId("RAX", 64), ExprId("RCX", 64))
sb.symbols[ExprId("RCX", 64)] = ExprOp("may_alias", ExprId("RCX", 64), ExprId("RAX", 64), ExprId("RBX", 64))
Note the arguments order in the initialization.
- let's take an example simplier than yours first
MOV QWORD PTR [RAX], 0x1
MOV RCX, QWORD PTR [RAX]
CMP RCX, 0x2
JZ loc_20
During the evaluation of the first line MOV QWORD PTR [RAX], 0x1
, we will first evaluate RAX
to evaluate the memory pointer. The actual engine will return ExprOp("MayAlias", RAX, RBX, ECX)
(say MayAlias(RAX, RBX, RCX)
to simplify). So it will evaluate the final expression:
ExprMem(ExprOp(RAX, RBX, RCX), 64) = 1
If we use the current memory assignment hook (eval_assignblk) in the Symbolic engine to customize the "assignment", we could do the nexts steps:
- extract the first operand (
RAX
) and the remainings (RBX
/RBX
). This is why the arguments order may be important here. - first, the operator
ExprOp
inform us that the "original" memory is pointed onRAX
, so we can do a "real" assignment like:
ExprMem(RAX, 64) = 1
This will in return link ExprMem(RAX, 64) to 1 in the "classic" symbolic engine.
- Next, we can add effects on the others aliasing memories, @rbx @and rcx. So we may also:
- ethier invalidate every memories based on rbx and rcx in the current state
- or maybe modify them to someting like:
ExprOp("maybe_erased", 1, Oldvalue(XXX))
Next, we will evaluate the mov. So the code will evaluate @64[RAX]
. In the case, we will first evalueate RAX which will give us MayAlias(RAX, RBX, RCX)
. So ExprMem(MayAlias(RAX, RBX, RCX))
. In this case, we may use the eval_exprmem
hook in the symbolic engine to simplify to ExprMem(RAX) and then to use the classical enginel to recover the 1 previously set. So what is interesting here is that even if we support aliasing, we may resolve 'simple' cases.
So rcx is 1, so the CMP is not taken.
Ok, now let's apply this to your example:
MOV QWORD PTR [RAX], 0x1
MOV QWORD PTR [RBX], 0x2
MOV RCX, QWORD PTR [RAX]
CMP RCX, 0x2
JZ loc_20
The first mov
will have the same side effects as before:
-
ExprMem(RAX) = 1
in the symbolic engine - erase of all
RBX
/RCX
based memory (or modified)
The next line, MOV QWORD PTR [RBX], 0x2
will do the same behavior:
-
ExprMem(RBX) = 2
in the symbolic engine - erase (or modification) of all
RAX
/RCX
based memory (so the@RAX
is modified)
now, when evaluating the line MOV RCX, QWORD PTR [RAX]
we will have:
- evaluation of the
@64[RAX]
- so first evaluation of
RAX
which will giveMayAlias(RAX, RBX, RCX)
- then evaluation of
ExprMem(MayAlias(RAX, RBX, RCX))
which will give using the previous custom rule,ExprMem(RAX)
which will result in erased value, or modified valueModified(1, 2)
, depending on what we decided previously - the
CMP
will use this as operator, and then this will be feed in theJZ
.
Are those steps seems legit to you guys? Did I understand correctly your solution, or have I a drift from what you proposed? Are you ok with this, or have you remarks or counter example which blows up this?
It seems that this "poc" might be done using the current engine, correct?