amoco icon indicating copy to clipboard operation
amoco copied to clipboard

No examples

Open zachriggle opened this issue 10 years ago • 5 comments

This seems like a really great project with a lot of potential uses. For one, I'd like to use it in my pwndbg debugger scripts for GDB in order to simulate execution of a single basic block (to determine the expected path of a conditional branch).

Unfortunately, there do not appear to be any examples which I can base this work off of. Since I'm running in a debugger, I can provide complete state information, and even memory access.

How would I go about simulating a short sequence of instructions, given a known starting register context?

zachriggle avatar May 18 '15 21:05 zachriggle

Hi Zach,

Indeed, I need to add examples from the tests/ directory but its currently a bunch of ugly files there and I'd like to rewrite some examples before merging them into release. One other problem that I have is that my samples are often real binaries/malware which can't be released, so I need to build small toy examples first. Not a big deal but just another task. I agree it would be more helpful than a tl;dr readme...

To answer your question, it depends on the input you provide. If amoco has access to the binary file, and you provide the location of the short sequence of instructions as a Python int/long, you would do the following:

first import necessary stuff:

import amoco

Now, assuming you are analyzing 'example.exe' binary,

p = amoco.system.loader.load_program('example.exe') z = amoco.lsweep(p)

which loads the binary in amoco's memory model, and creates a linear sweep instance that allows to easily disassemble basic blocks at given address. Now when you need to simulate a sequence of instructions at some address (a Python int/long value), either do:

b = z.getblock(address)

to retreive the basic block, and if you want to analyze only the first 3 instructions do:

b = amoco.code.block(b.instr[0:3])

Now b.map holds a mapper object that captures symbolically the "execution" of this sequence of instructions. You can "print b.map" to see the effect on registers and memory.

If you want to only provide the sequence as a "shellcode" Python string, you need to load the amoco.arch module yourself first.

Assuming its x86 for example you would do:

import amoco.arch.x86.cpu_x86 as cpu shellcode = "\x31\xed\x5e\x89\xe1" loc = cpu.cst(0x8048380,32) instr = [ ] while len(shellcode)>0: i = cpu.disassemble(shellcode,address=loc) l = i.length instr.append(i) shellcode = shellcode[l:] loc += l b = amoco.code.block(instr)

Now that we have the block object to be studied lets proceed with "simulation":

Let's assume that we have:

print b

--- block 0x8048380 ---

0x8048380 '31ed' xor ebp, ebp 0x8048382 '5e' pop esi 0x8048383 '89e1' mov ecx, esp print b.map eflags <- { | [0:1]->0x0 | [1:2]->eflags[1:2] | [2:3]->0x0 | ... | } ebp <- { | [0:32]->0x0 | } esi <- { | [0:32]->M32(esp) | } esp <- { | [0:32]->(esp+0x4) | } eip <- { | [0:32]->(eip+0x5) | } ecx <- { | [0:32]->(esp+0x4) | }

Since you are in a debugger you have indeed access to "concrete" values so you would like to replace some symbolic values of registers or even memory locations by their concrete values. To do this you just need to provide a state in the form of a mapper that you can define from the real gdb state:

m = amoco.cas.mapper.mapper() m[p.cpu.eip] = b.address m[p.cpu.esp] = p.cpu.cst(0xff885b38,32) m[p.cpu.mem(p.cpu.esp,32)] = p.cpu.cst(0x1234,32)

for example here I assume that the state before executing/simulating b still has symbolic ebx/ecx/eflags/... but I define eip value to be that obviously of address (note that b.address is of type amoco.cas.expressions.cst), I define also esp value and I put 0x1234 32 bits constant at memory location pointed by esp. The rest of the stack is left symbolic.

Now if I do:

s = b.map<<m print s(p.cpu.ecx) 0xff885b3c print s(p.cpu.esi) 0x1234 print s(p.cpu.eip) 0x8048385 print s(p.cpu.ebp) 0x0 print s(p.cpu.eax) eax

Note that all "values" are of type amoco.cas.expressions.cst so if you want them back to Python int/long just do for example

s(p.cpu.ecx).value 4287126332

If you are now fetching some new instructions based on s(p.cpu.eip) as a new block, you can immediately simulate its execution:

bnext = p.getblock(s(p.cpu.eip).value) snext = s>>bnext.map

Thats for a start :) don't hesitate if you have more questions !

Thanks, Axel

On Mon, May 18, 2015 at 11:44 PM, Zach Riggle [email protected] wrote:

This seems like a really great project with a lot of potential uses. For one, I'd like to use it in my pwndbg https://github.com/zachriggle/pwndbg debugger scripts for GDB in order to simulate execution of a single basic block (to determine the expected path of a conditional branch).

Unfortunately, there do not appear to be any examples which I can base this work off of. Since I'm running in a debugger, I can provide complete state information, and even memory access.

How would I go about simulating a short sequence of instructions, given a known starting register context?

— Reply to this email directly or view it on GitHub https://github.com/bdcht/amoco/issues/10.

bdcht avatar May 19 '15 12:05 bdcht

Is there a way to provide a hook for memory access? I won't know ahead of time all of the addresses which Amoco will need to look up.

zachriggle avatar May 19 '15 18:05 zachriggle

If I understand well, you'd like to provide an interface to a concrete state to a mapper and get the resulting mapper where all (or some) symbolic values have been replaced by their concrete value (cst objects) in the provided state interface. Good idea...

To implement this, I would rely on the eval method already found in all cas.expressions classes and would just implement a new "concrete mapper" as env argument which rather than evaluating the expression in the usual amoco.cas.mapper.mapper class would evaluate the expression out of the gdb state. Basically, this new class 'cmapper' would ensure that for whatever amoco expression e, it is able to return an amoco.expressions.cst object representing the concretisation of e (out of whatever concrete state interface it implements), and then you would only need to do: b.map.eval(cmapper(params)) with possibly params indicating that some symbolic expressions need to stay in a symbolic form.

I will implement the skeleton of cmapper based on this idea so that you'd only need to implement the extraction of concrete values out of the gdb/whatever state.

stay tuned

bdcht avatar May 22 '15 12:05 bdcht

Yeah, basically this allows use as an emulator instead of a strictly symbolic solver.

zachriggle avatar May 22 '15 19:05 zachriggle

I've added a 'csi' attribute directly in the mapper class which (if set) acts as a hook function to implement a concrete state interface. I'll still need to document this with examples but here is the idea :

you define your own function that takes a amoco expression as input and returns a amoco cst expression when input is a reg expression or a Python str if the input is a mem expression. Now you create an empty mapper m as above and just set its csi attribute to point to your function.

bdcht avatar May 24 '15 08:05 bdcht