miasm
miasm copied to clipboard
Early stop of IR CFG's Symbolic Execution (stops at SHR instruction)
ASM Block
IR CFG
END of se.run_at with step=True
Hi @psyirius ! The explanation is there: https://github.com/cea-sec/miasm/blob/master/doc/ir/lift.ipynb The SHR instruction will generate multiple IR block as:
- if the shifter is not null, it will set the flags
- if the shifter is null, the instruction will not set flags.
So the generate IR reflect this.
Why did we do that?
If you read this doc, it's a bit like thecmovz EAX, EBX
:
Imagine the compilator doeesn't use this optimisation, it will generate a kind of:
if zf then
EAX = EBX
In miasm, the cmovz will generate the very same IR graph as the IF/then. So the positive point is in IR world, using cmovz or if/then has the same difficulty to analyze But you could make the same remark: "why does symb exec stops on cmovz?" This is why :)
Got any quick workaround for the sym exec to reach where RCX is assigned a const value?
Btw se.run_at is not fully executed even after supplying lbl_stop!
Hi If, during the symbolic execution RCX is evaluated to a constant, Miasm will resolve flags and pursue the execution. If you want to do the execution with a symbolic value un RCX, that's getting complicated:
- For example, you can arbitrary split your execution, doing one considereing RCX null, and another considering RCX not null.
- Or maybe you are not interested in the case RCX is null so you can modify the IR code generated by miasm (for shr, shl, rol, ror, ...) to only keep the case where RCX is not null (in this case, you may have invalid flags if one day RCX may be null).
I didn't get your point with the lbl_stop stuff. Can you detail this?
Okay, I get it now!
It stops because of the non-deterministic branching caused by SHR which got unresolved dependencies.
I was wondering why there is a break in control-flow when its not in ASM.
Thank you for the answer.
I wonder if there is any easier workaround to merge that extra generated branch into the states and pursue execution to reach the RCX assignment!
Hi @psyirius ! The explanation is there: https://github.com/cea-sec/miasm/blob/master/doc/ir/lift.ipynb The SHR instruction will generate multiple IR block as:
- if the shifter is not null, it will set the flags
- if the shifter is null, the instruction will not set flags.
So the generate IR reflect this. Why did we do that? If you read this doc, it's a bit like the
cmovz EAX, EBX
: Imagine the compilator doeesn't use this optimisation, it will generate a kind of:
I know this decision was made somewhere early in miasm's development, but I was still not able to find any motivation for it. When doing symbolic execution these IF statements are creating a major annoyance and you have to write your own IR pass that rewrites it into ternary statements.
Obviously for things like rep movsb
there is no way around it, but I feel that Triton's model of using ternary expressions whenever possible is more suitable in practice...
Hi @mrexodia !
How do you model the fact that the flag is modified or not, depending on a condition, for example for the shr
instruction?
I do not think it is important information whether the flag was modified or not. The important information is usually the expression extracted from the flag at a later point in execution (usually a branch).
Similarly with division, of course there is an edge case where you divide by zero but it is unlikely to happen in practice and we usually just assume it is non-zero (in fact I add that as a additional constraint).
Of course I understand the correctness argument to some extent, but this part of miasm is hindering experimentation on “simple” obfuscation schemes for beginners in my experience…
I agree with you on the division part: I think it doesn't add much to add a code that handles the division error in the IR. My argument is that it's a runtime error, and, like for example memory lookup & memory writes, it can generate runtime exception. Adding code at the IR level to check memory accesses and dealing with error is not really interesting here (and we don't do that in the IR).
At an extend point, if one simplification decides that the memory lookup or write can be simplified (ie removed), it's here that we may remove a code that can potentially fail. So maybe the simplification engine has to take the decision. And the problem may be the same for example by wanting to swap 2 memory lookup for memory barriers.
But for the "precision vs simple analyze", I find your point a bit more arbitrary:
The use of rep movsb
or cmovz
or shr
(which may be replaced by a lea
for example), etc totally depends on the compiler and its options, not on the "simplicity of the code". I mean, as you said, a simple hand written code will loop and not use rep movb
.
Masking the difficulty of such patterns, which transform a 'data dependency' into an 'execution flow problem' will still be there in those other cases even if we patch the shr
. And if you want to analyze those codes, you will have to deal with those patterns.
Some times ago, on an old miasm version, a user reported a problem in the simplification engine. In his code snip from its backtrace, I saw that he did the following thing: He re implemented the 'problematic' instruction miasm IR to match his wish. And his which was something like (not real code here):
def shr(ir, instr, dst, src):
return ExprAssign(dst, dst >> src)
Can't we do the below to replace the extra generated IR block?
# something like
# sym = affect(sym) if cond else sym
ExprCond(
cond=cond_used_in_the_branching,
src1=expr_that_is_in_the_generated_block,
src2=org_symbol,
)
Hi @psyirius Currently, the assignment is not an expression, but more likely a statement. If we have:
shr eax, cl
and don't want to have an extra block, to generate the of
for example, we can do this:
of = ExprCond(cond, new_value, of)
which means "ok, if the condition it true, assign new_value to of
else, assign it's original value (which has for result no modification)
This is what we did in the very very first version of miasm for the cmovz for instance.
But it has it's limits. For example, in ARM architecture, each instruction can be executed on a custom condition. Let's take str:
str r0, [r1]
which will store r0 at memory pointed at address r1.
If we have a code C like this:
if (cond) {
tab[x] = a;
tab[y] = b;
} else {
tab[x] = c;
tab[y] = d;
}
you may result with this code:
STREQ R4, [R0]
STREQ R5, [R1]
STRNE R6, [R2]
STRNE R7, [R3]
Question: What kind of IR will you generate? If you generate using the previous trick, your code will be:
ExprMem(R0) = ExprCond(cond, R4, ExprMem(R0))
which will generate dummy read/write on memory.
Using the current IR, miasm generates:
Adding a custom simplification rule (to factor same condition code) you ends with;
For shift/rotates, both approaches have their advantages and pitfalls. Maybe we could add an optional argument on the lifter to choose between ternary expression and ITE blocks for the flags?
Hi @W0ni Yes, that's a good solution! And the location of the option (in the lifter) seems perfect to me!