amoco
Some errors in cfg recovery
Hello,
First, amoco seems really cool. Thanks!
I have an issue, and a question
Issue:
cfg recovery seems to be a bit broken currently. I have a simple 'puts("hello world")' ELF which I am using for testing. The lsweep method recovers most of the basic blocks, as expected; however, the other methods don't seem to get past the first basic block:
>>> p = amoco.system.loader.load_program('hi32')
>>> z = amoco.lforward(p)
>>> G=z.getcfg()
>>> print G.C
[<grandalf.graphs.graph_core object at 0x7fa70251bf10>, <grandalf.graphs.graph_core object at 0x7fa7024660d0>]
>>> print G.C[0].sV
0.| <node [0x8048380] at 0x7fa70251bb90>
>>> print G.C[1].sV
0.| <node [#PLT@__libc_start_main] at 0x7fa7079073d0>
1.| <node [@__libc_start_main] at 0x7fa7024def50>
Furthermore, some methods error:
>>> z = amoco.fbackward(p)
>>> z.getcfg()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python2.7/dist-packages/amoco/main.py", line 267, in getcfg
for x in self.itercfg(loc): pass
File "/usr/local/lib/python2.7/dist-packages/amoco/main.py", line 289, in itercfg
if self.check_ext_target(t):
File "/usr/local/lib/python2.7/dist-packages/amoco/main.py", line 261, in check_ext_target
self.update_spool(e.v[1],t.parent)
File "/usr/local/lib/python2.7/dist-packages/amoco/main.py", line 206, in update_spool
T = self.get_targets(vtx,parent)
File "/usr/local/lib/python2.7/dist-packages/amoco/main.py", line 373, in get_targets
func.map[pc] = mpc
File "/usr/local/lib/python2.7/dist-packages/amoco/code.py", line 33, in map
self.helper(self._map)
File "/usr/local/lib/python2.7/dist-packages/amoco/code.py", line 42, in helper
if self._helper: self._helper(self,m)
AttributeError: _helper
Interestingly, running it again does not raise an error, but it still doesn't recover the cfg:
>>> z.getcfg()
<amoco.cfg.graph object at 0x7fa702511a90>
(I've also noticed running other cfg recoveries twice gives different results the second time)
I think these errors stem from certain function calls not getting resolved properly, but I'm not sure:
>>> print n.data
# --- block 0x8048380 ---
0x8048380 '31ed' xor ebp, ebp
0x8048382 '5e' pop esi
0x8048383 '89e1' mov ecx, esp
0x8048385 '83e4f0' and esp, 0xfffffff0
0x8048388 '50' push eax
0x8048389 '54' push esp
0x804838a '52' push edx
0x804838b '6820850408' push #__libc_csu_fini
0x8048390 '68b0840408' push #__libc_csu_init
0x8048395 '51' push ecx
0x8048396 '56' push esi
0x8048397 '687b840408' push #main
0x804839c 'e8afffffff' call *0x8048350
>>> i = n.data.instr[-1]
>>> print i.misc['to']
0x8048350
>>> p.mmap.read(0x8048350, 4)
['\xff%(\x97']
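For readers following along: the four bytes read above, `ff 25 28 97`, are the start of an x86 `jmp dword ptr [disp32]` instruction, which is the usual shape of a PLT stub that jumps through a GOT slot. A minimal sketch of decoding such a stub in plain Python 3 is shown below. The helper name `decode_plt_jmp` is hypothetical, and the last two bytes of the sample (`04 08`) are an assumption, since the read above only returned four bytes.

```python
import struct
from typing import Optional


def decode_plt_jmp(stub: bytes) -> Optional[int]:
    """Return the address of the memory slot used by `jmp dword ptr [disp32]`.

    Opcode 0xFF with ModRM byte 0x25 encodes an indirect jump through an
    absolute 32-bit address in 32-bit mode. Returns None for anything else.
    """
    if len(stub) >= 6 and stub[0] == 0xFF and stub[1] == 0x25:
        (slot,) = struct.unpack_from("<I", stub, 2)  # little-endian disp32
        return slot
    return None


# First four bytes are from the session above; the trailing b"\x04\x08"
# is assumed for illustration.
stub = b"\xff\x25\x28\x97\x04\x08"
slot = decode_plt_jmp(stub)
print(hex(slot))
```

The jump target is whatever value is stored in that slot at run time, which is why looking only at the calling block and its parent is not enough to resolve it.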
I'm happy to provide the binaries and/or any other info that would be helpful. I spent some time looking around trying to solve it, but am still wrapping my head around how everything is set up.
Question: I'm interested in tagging the memory sections with their flags (i.e. read, write, execute). I originally hacked it onto 'mo', before I noticed that MemoryMap has an unused 'perm' attribute. Is this the correct spot to store that info?
Alternatively, the info is stored in p.bin.Phdr. Would it be cleaner to just query that?
Thanks!
Hi, well, cfg recovery is hard. Amoco actually provides 5 classes to help with this recovery: some are very simple (lsweep, fforward, lforward), some try to be more clever (lbackward) by relying on semantics obtained through symbolic execution. In most cases you should be using either the lsweep class or the lbackward class. In your example you are using lforward, which only evaluates the PC obtained from a symbolic execution started at the previous block. Think of it like this: you are on a leaf of the cfg and want to extend the cfg by finding the targeted block address, but you will only be looking at what happens in the parent block and the leaf itself. Whenever there is a PLT jump, that is definitely not going to be sufficient. The fbackward and lbackward classes don't stop at the parent; they basically go back in the cfg until sufficient information about the PC is (symbolically) gathered.
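To make the "very simple" end of that spectrum concrete, here is a toy linear sweep in plain Python 3. It is only a sketch of the general technique, not amoco's lsweep implementation: the instructions are assumed to be already decoded into hypothetical `(address, size, mnemonic)` tuples, and the set of block-ending mnemonics is an assumption chosen for illustration.

```python
from typing import List, Tuple

# (address, size_in_bytes, mnemonic) for an already-decoded instruction.
Insn = Tuple[int, int, str]

# Mnemonics that close a basic block in this toy model (an assumption).
BLOCK_ENDERS = {"jmp", "jcc", "call", "ret"}


def linear_sweep(insns: List[Insn]) -> List[List[Insn]]:
    """Cut a sequential instruction stream into basic blocks.

    No targets are resolved: the sweep just walks the stream in address
    order and starts a new block after every control-flow instruction.
    """
    blocks: List[List[Insn]] = []
    current: List[Insn] = []
    for insn in insns:
        current.append(insn)
        if insn[2] in BLOCK_ENDERS:
            blocks.append(current)
            current = []
    if current:  # trailing instructions without a terminator
        blocks.append(current)
    return blocks


# Made-up instruction stream at made-up addresses.
sample = [
    (0x1000, 2, "xor"),
    (0x1002, 1, "push"),
    (0x1003, 5, "call"),
    (0x1008, 1, "hlt"),
]
blocks = linear_sweep(sample)
print([hex(b[0][0]) for b in blocks])
```

A sweep like this finds blocks without ever needing to know where a jump goes, which is why it is unaffected by unresolved PLT targets but also yields no edges by itself.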
Thanks for the _helper bug, it is supposed to be fixed now by commit 599ff0df. The inherited _helper attribute from block was indeed missing in func init.
I will look further into why several calls to getcfg() give different results, and include this in the test cases and examples to be released soon (2.4.2 awaiting).
Regarding your question on memory rwx flags: yes, MemoryMap.perm was planned as a way to map a range of addresses to rwx permission info that could be used in the MemoryMap.read, MemoryMap.write, and CoreExec.read_instruction methods. I have had no need for this feature yet, so it's still not implemented. Feel free to play!
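As a starting point for such an experiment, the sketch below shows one way an address-range-to-permission lookup could work in plain Python 3. It is a standalone illustration and not amoco's MemoryMap API: the class name `PermMap`, its methods, and the two sample segments are hypothetical. The flag bit values are the standard ELF `p_flags` bits, so the tuples could in principle be filled from program headers.

```python
import bisect
from typing import List, Tuple

# Standard ELF program header p_flags bits.
PF_X, PF_W, PF_R = 0x1, 0x2, 0x4


class PermMap:
    """Hypothetical lookup table from address ranges to rwx flags.

    Assumes the segments do not overlap.
    """

    def __init__(self, segments: List[Tuple[int, int, int]]):
        # Each segment is (start_address, size, flags).
        self._segs = sorted(segments)
        self._starts = [s[0] for s in self._segs]

    def flags_at(self, addr: int) -> int:
        """Return the flags of the segment containing addr, or 0 if unmapped."""
        i = bisect.bisect_right(self._starts, addr) - 1
        if i >= 0:
            start, size, flags = self._segs[i]
            if addr < start + size:
                return flags
        return 0

    def describe(self, addr: int) -> str:
        """Render the flags at addr in the familiar 'rwx' notation."""
        f = self.flags_at(addr)
        return "".join(
            c if f & bit else "-" for c, bit in (("r", PF_R), ("w", PF_W), ("x", PF_X))
        )


# Made-up layout: one code segment and one data segment.
perms = PermMap([
    (0x08048000, 0x1000, PF_R | PF_X),
    (0x08049000, 0x1000, PF_R | PF_W),
])
print(perms.describe(0x08048380), perms.describe(0x08049728))
```

A read, write, or instruction-fetch routine could then consult `flags_at` before touching memory and refuse the access when the corresponding bit is missing.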
Awesome, thanks. CFG recovery is definitely hard, but for most of my purposes lsweep is great. The reason I ask about PLT entries is I'm interested in pulling xrefs out with amoco. My current plan was to augment the elf loader so that the various call locations would be able to figure out what they were calling by passing the info up from the imports to the PLT. Is there a better way to do this?
And thanks! I'll definitely play around with adding perms to MemoryMap.
I've noticed another issue with the x64 semantics, and was wondering about the best way to solve it. The top 32 bits aren't being cleared when 32-bit registers are written. For example:
>>> print b
# --- block 0x400460 ---
0x400460 '31ed' xor ebp, ebp
...
>>> print b.map
rbp <- { | [0:32]->0x0 | [32:64]->rbp[32:64] | }
...
This should be 'rbp <- { | [0:64]->0x0 | }'. I figured out how to modify the semantics of xor to make this correct, but was wondering if there is a better way to do this. It applies to any write to the 32-bit general purpose registers.
Sorry about all the questions. Thanks again for the cool software!
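The rule in question can be written down as a small reference model in plain Python 3. This is only a sketch of the architectural behavior on x86-64, not amoco code: the function name `write_subreg` is hypothetical, and the 8-bit case covers only the low-byte registers (al, bl, ...), not the high-byte ones (ah, bh, ...).

```python
MASK64 = (1 << 64) - 1


def write_subreg(full: int, value: int, width: int) -> int:
    """Return the new 64-bit register value after writing a sub-register.

    On x86-64, writing a 32-bit register zero-extends into the full 64-bit
    register, while 8-bit and 16-bit writes leave the upper bits untouched.
    """
    if width not in (8, 16, 32, 64):
        raise ValueError("unsupported operand width: %r" % width)
    mask = (1 << width) - 1
    value &= mask
    if width >= 32:
        return value  # upper bits are cleared (or fully overwritten)
    return (full & ~mask & MASK64) | value


# 'xor ebp, ebp' with garbage in the upper half of rbp.
rbp = write_subreg(0xDEADBEEFCAFEBABE, 0x0, 32)
print(hex(rbp))
```

A check like this is handy as an oracle when comparing against the register maps computed for a block.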
About x64 semantics: damn, you're right... WTF! This is even true for a 'mov eax, ebx'. I'm fixing this nonsense!
About PLT refs: what is already known about PLT jmps thanks to the 'seqhelper' methods (i.e. system/linux_x86.py) is the associated external symbol from the ELF structure (see check_sym and its usage). If you want to see the referenced symbol at the call instruction, you could add the address of the PLT jmp instruction to the Elf object's functions dict: p.bin.functions[i.address] = ["mysymbol"]. Then the call will be printed as call #mysymbol rather than, say, call *0x0804xxxx. But this is just a pretty-printing trick; not sure it's what you want...
The x64 semantics should be better now thanks to commit 6b3757bc, let me know if it's ok.
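If the goal is a table of cross-references rather than nicer printing, one option is to collect the call sites outside of the disassembler. The sketch below does this in plain Python 3 and does not use amoco at all: `build_xrefs` is a hypothetical helper, and the inputs are assumed to have been gathered beforehand (call site and target pairs from the recovered blocks, and a PLT address to symbol mapping from the ELF imports). The sample values are taken from the entry block shown earlier in this thread.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def build_xrefs(
    calls: Iterable[Tuple[int, int]],
    plt_symbols: Dict[int, str],
) -> Dict[str, List[int]]:
    """Map each imported symbol to the addresses of the calls that reach it.

    calls:       (call_site_address, call_target_address) pairs.
    plt_symbols: PLT stub address -> imported symbol name.
    Calls whose target is not a known PLT stub are ignored.
    """
    xrefs: Dict[str, List[int]] = defaultdict(list)
    for site, target in calls:
        name = plt_symbols.get(target)
        if name is not None:
            xrefs[name].append(site)
    return dict(xrefs)


# The call at 0x804839c targets the PLT stub at 0x8048350.
plt_symbols = {0x8048350: "__libc_start_main"}
calls = [(0x804839C, 0x8048350)]
xrefs = build_xrefs(calls, plt_symbols)
print({name: [hex(a) for a in sites] for name, sites in xrefs.items()})
```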
Awesome, thanks for the PLT info. Will look into that too.
x64 semantics stuff looks good in my small test cases. Will let you know if I find any other issues.
Thanks for the fast reply, much appreciated!