Few questions regarding the DSEengine
Hello Miasm !
First of all, thanks for this wonderful project, I use it almost every day and I wanted to say to you that the more I understand/discover it, the more I love it.
I've got a couple of questions regarding the DSEengine. I wrote this small shellcode:
nop
mov eax, 0x12345678
add eax, 0xc
xor eax, 0x89abcdef
mov dword ptr [esp - 4], eax
nop
And I used it with the DSEengine in this small example of code:
from miasm.core.interval import interval
from miasm.core.bin_stream import bin_stream_str
from miasm.analysis.dse import DSEEngine
from miasm.analysis.machine import Machine
from miasm.jitter.csts import PAGE_READ, PAGE_WRITE, PAGE_EXEC
from miasm.expression.simplifications import expr_simp
from miasm.expression.expression import ExprId, ExprInt, ExprMem, ExprAff, ExprAssign
def stop(jitter):
return False
bs = bytes.fromhex("90 B8 78 56 34 12 83 C0 0C 35 EF CD AB 89 89 44 24 FC 90".replace(" ",""))
machine = Machine("x86_32")
jitter = Machine("x86_32").jitter()
jitter.vm.add_memory_page(0, PAGE_READ | PAGE_WRITE | PAGE_EXEC, bs)
jitter.add_breakpoint(0x12, stop)
jitter.set_trace_log()
jitter.init_stack()
dse = DSEEngine(machine)
dse.attach(jitter)
dse.update_state_from_concrete()
it = interval([(jitter.cpu.ESP - 0x10, jitter.cpu.ESP)])
dse.symbolize_memory(it)
SYMB_EAX = ExprId("SYMB_EAX", 32)
state = {
ExprId("EAX", 32): SYMB_EAX,
}
dse.update_state(state)
jitter.run(0)
eax = dse.eval_expr(ExprId("EAX", 32))
stack = dse.eval_expr(ExprMem(ExprId("ESP", 32) - ExprInt(4, 32), 32))
print('EAX = {}, stack = {}'.format(eax, stack))
The output of this example is:
EAX = 0x9B9F9B6B, stack = {MEM_0x123fffc 0 8, MEM_0x123fffd 8 16, MEM_0x123fffe 16 24, MEM_0x123ffff 24 32}
I'm note sure why the value returned by dse.eval_expr(ExprId("EAX", 32)) is the concrete one, as I updated the state with a symbolic value for EAX ? Moreover, If I change the code, by updating the state inside a breakpoint, like this:
from miasm.core.interval import interval
from miasm.core.bin_stream import bin_stream_str
from miasm.analysis.dse import DSEEngine
from miasm.analysis.machine import Machine
from miasm.jitter.csts import PAGE_READ, PAGE_WRITE, PAGE_EXEC
from miasm.expression.simplifications import expr_simp
from miasm.expression.expression import ExprId, ExprInt, ExprMem, ExprAff, ExprAssign
def start(jitter):
global dse
SYMB_EAX = ExprId("SYMB_EAX", 32)
state = {
ExprId("EAX", 32): SYMB_EAX,
}
dse.update_state(state)
return True
def stop(jitter):
return False
bs = bytes.fromhex("90 B8 78 56 34 12 83 C0 0C 35 EF CD AB 89 89 44 24 FC 90".replace(" ",""))
machine = Machine("x86_32")
jitter = Machine("x86_32").jitter()
jitter.vm.add_memory_page(0, PAGE_READ | PAGE_WRITE | PAGE_EXEC, bs)
jitter.add_breakpoint(0x12, stop)
jitter.add_breakpoint(0x1, start)
jitter.set_trace_log()
jitter.init_stack()
dse = DSEEngine(machine)
dse.attach(jitter)
dse.update_state_from_concrete()
it = interval([(jitter.cpu.ESP - 0x10, jitter.cpu.ESP)])
dse.symbolize_memory(it)
jitter.run(0)
eax = dse.eval_expr(ExprId("EAX", 32))
stack = dse.eval_expr(ExprMem(ExprId("ESP", 32) - ExprInt(4, 32), 32))
print('EAX = {}, stack = {}'.format(eax, stack))
Then I've got the right output for EAX:
EAX = (SYMB_EAX + 0xC) ^ 0x89ABCDEF, stack = {MEM_0x123fffc 0 8, MEM_0x123fffd 8 16, MEM_0x123fffe 16 24, MEM_0x123ffff 24 32}
So my first question is: is it possible to update the state with symbolic values outside of a breakpoint ?
Moreover, I don't understand why the DWORD written on the stack doesn't appears in the variable stack, as I symbolized EAX and call dse.symbolize_memory(). In all my tests, in case of a write in a symbolized memory, neither a concrete value or a symbolic one appear when I call eval_expr(), it's always MEM_XXXXX. Am I doing something wrong?
Thanks !
Hi @stan1slas! Thank you for the support!
For the first case (I mean running from start to end), it seems legit to me that EAX is concrete: Even if you set a symbol on it, the second instruction of your shellcode reset it to 0x12345678, and then does only concrete operations on it.
So it seems legit the returned value is concrete.
Are you ok with this first point?
Well you're right it's legit because of the mov eax, 0x12345678. However in my second example it's basically the same, no? The symbol SYMB_EAX is set on the first nop. I don't understand the differences between the two examples.
I did a quick test by modifying the shellcode to this one:
nop
mov eax, 0x12345678
add eax, 0xc
xor eax, 0x89abcdef
mov dword ptr [esp - 4], ecx
nop
I symbolized ECX like I did in my first example (outside a breakpoint), and I've got the following output:
EAX = 0x9B9F9B6B, ECX = SYMB_ECX, stack = {MEM_0x123fffc 0 8, MEM_0x123fffd 8 16, MEM_0x123fffe 16 24, MEM_0x123ffff 24 32}, So I'm ok with the first point, thanks a lot !
Any idea about the second point ? I read the blog post "Playing with dynamic symbolic execution" and it's exactly what you did with the shellcode:
- symbolize memory with
dse.symbolize_memory(interval([(addr_sc, addr_sc + len(data) - 1)]))) - Then eval an address which gives
(MEM_0x400053^(MEM_0x400052*0x10)).
So I guess I'm missing something important that makes me fail !
I agree with you on the second point. I did the test with the code:
MOV EAX, 1
MOV EAX, 2
MOV EAX, 3
with a breakpoint after the first line or after the second one. In each case, it seems the dse has run one instruction more than the Jitter. So I agree with you on this point: your second code show exactly thos problem
@commial , do you have an idea on that?
Just a quick look, but, if I run your second code, I indeed obtain:
EAX = (SYMB_EAX + 0xC) ^ 0x89ABCDEF, stack = {MEM_0x123fffc 0 8, MEM_0x123fffd 8 16, MEM_0x123fffe 16 24, MEM_0x123ffff 24 32}
But if we dump the state of the symbolic emulator:
>>> dse.symb.dump()
...
@32[0x123FFFC] = (SYMB_EAX + 0xC) ^ 0x89ABCDEF
If we try explicitely:
>>> dse.eval_expr(ExprId("ESP", 32) - ExprInt(4, 32))
ExprInt(0x123FFFC, 32)
>>> dse.eval_expr(ExprMem(ExprId("ESP", 32) - ExprInt(4, 32), 32))
ExprCompose(ExprId('MEM_0x123fffc', 8), ExprId('MEM_0x123fffd', 8), ExprId('MEM_0x123fffe', 8), ExprId('MEM_0x123ffff', 8))
>>> dse.eval_expr(ExprMem(ExprInt(0x123FFFC, 32), 32))
ExprCompose(ExprId('MEM_0x123fffc', 8), ExprId('MEM_0x123fffd', 8), ExprId('MEM_0x123fffe', 8), ExprId('MEM_0x123ffff', 8))
>>> dse.symb.eval_expr(ExprMem(ExprInt(0x123FFFC, 32), 32))
ExprCompose(ExprId('MEM_0x123fffc', 8), ExprId('MEM_0x123fffd', 8), ExprId('MEM_0x123fffe', 8), ExprId('MEM_0x123ffff', 8))
And it's due to the use of dse.symbolize_memory(it).
It's likely not clear enough, but the help of symbolize_memory is "symbolize (using .memory_to_expr) memory areas (ie, reading from an address in one of these areas yield a symbol)".
In other words, reading from a location in this area will always return a given symbol (MEM_XXX), no matter what is actually in this location.
Removing these lines:
it = interval([(jitter.cpu.ESP - 0x10, jitter.cpu.ESP)])
dse.symbolize_memory(it)
We obtain:
EAX = (SYMB_EAX + 0xC) ^ 0x89ABCDEF, stack = (SYMB_EAX + 0xC) ^ 0x89ABCDEF
Which seems to be the expected result :)
What about this one @commial :
from miasm.core.interval import interval
from miasm.core.bin_stream import bin_stream_str
from miasm.analysis.dse import DSEEngine
from miasm.analysis.machine import Machine
from miasm.jitter.csts import PAGE_READ, PAGE_WRITE, PAGE_EXEC
from miasm.expression.simplifications import expr_simp
from miasm.expression.expression import ExprId, ExprInt, ExprMem, ExprAff, ExprAssign
def start(jitter):
global dse
print("Breakpoint at: 0x%x" % jitter.pc)
SYMB_EAX = ExprId("SYMB_EAX", 32)
state = {
ExprId("EAX", 32): SYMB_EAX,
}
#dse.update_state_from_concrete()
print("Concrete value: 0x%x" % jitter.cpu.EAX)
print("Symbolic value:", dse.eval_expr(ExprId("EAX", 32)))
dse.update_state(state)
return False
def stop(jitter):
return False
"""
nop
mov eax, 0x12345678
add eax, 0xc
xor eax, 0x89abcdef
mov dword ptr [esp - 4], eax
nop
"""
bs = bytes.fromhex("90 B8 78 56 34 12 83 C0 0C 35 EF CD AB 89 89 44 24 FC 90".replace(" ",""))
#bs = bytes.fromhex("B8 11 11 11 11 B8 22 22 22 22 B8 33 33 33 33 90".replace(" ",""))
open("out.bin", "wb").write(bs)
machine = Machine("x86_32")
jitter = Machine("x86_32").jitter()
jitter.vm.add_memory_page(0, PAGE_READ | PAGE_WRITE | PAGE_EXEC, bs)
jitter.add_breakpoint(0x12, stop)
jitter.add_breakpoint(0x6, start)
jitter.set_trace_log()
jitter.init_stack()
dse = DSEEngine(machine)
dse.attach(jitter)
dse.update_state_from_concrete()
it = interval([(jitter.cpu.ESP - 0x10, jitter.cpu.ESP)])
dse.symbolize_memory(it)
jitter.run(0)
eax = dse.eval_expr(ExprId("EAX", 32))
stack = dse.eval_expr(ExprMem(ExprId("ESP", 32) - ExprInt(4, 32), 32))
print('EAX = {}, stack = {}'.format(eax, stack))
Result:
00000000 NOP
EAX 00000000 EBX 00000000 ECX 00000000 EDX 00000000 ESI 00000000 EDI 00000000 ESP 01240000 EBP 00000000 EIP 00000001 zf 0 nf 0 of 0 cf 0
00000001 MOV EAX, 0x12345678
EAX 12345678 EBX 00000000 ECX 00000000 EDX 00000000 ESI 00000000 EDI 00000000 ESP 01240000 EBP 00000000 EIP 00000006 zf 0 nf 0 of 0 cf 0
Breakpoint at: 0x6
Concrete value: 0x12345678
Symbolic value: 0x12345684
At this point (in the breakpoint), shouln't the concrete and symbolic be the same?