amoco Some errors in cfg recovery

Hello,

First, amoco seems really cool. Thanks!

I have an issue and a question.

Issue: CFG recovery seems to be a bit broken currently. I have a simple 'puts("hello world")' ELF which I am using for testing. The lsweep method recovers most of the basic blocks, as expected; however, the other methods don't seem to get past the first basic block:

>>> p = amoco.system.loader.load_program('hi32')
>>> z = amoco.lforward(p)
>>> G=z.getcfg()
>>> print G.C
[<grandalf.graphs.graph_core object at 0x7fa70251bf10>, <grandalf.graphs.graph_core object at 0x7fa7024660d0>]
>>> print G.C[0].sV
0.| <node [0x8048380] at 0x7fa70251bb90>
>>> print G.C[1].sV
0.| <node [#PLT@__libc_start_main] at 0x7fa7079073d0>
1.| <node [@__libc_start_main] at 0x7fa7024def50>

Furthermore, some methods raise an error:

>>> z = amoco.fbackward(p)
>>> z.getcfg()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/amoco/main.py", line 267, in getcfg
    for x in self.itercfg(loc): pass
  File "/usr/local/lib/python2.7/dist-packages/amoco/main.py", line 289, in itercfg
    if self.check_ext_target(t):
  File "/usr/local/lib/python2.7/dist-packages/amoco/main.py", line 261, in check_ext_target
    self.update_spool(e.v[1],t.parent)
  File "/usr/local/lib/python2.7/dist-packages/amoco/main.py", line 206, in update_spool
    T = self.get_targets(vtx,parent)
  File "/usr/local/lib/python2.7/dist-packages/amoco/main.py", line 373, in get_targets
    func.map[pc] = mpc
  File "/usr/local/lib/python2.7/dist-packages/amoco/code.py", line 33, in map
    self.helper(self._map)
  File "/usr/local/lib/python2.7/dist-packages/amoco/code.py", line 42, in helper
    if self._helper: self._helper(self,m)
AttributeError: _helper

Interestingly, running it again does not raise an error, but it still doesn't recover the CFG:

>>> z.getcfg()
<amoco.cfg.graph object at 0x7fa702511a90>

(I've also noticed that running other CFG recoveries twice gives different results the second time.)

I think these errors stem from certain function calls not getting resolved properly, but I'm not sure:

>>> print n.data
# --- block 0x8048380 ---
0x8048380  '31ed'           xor         ebp, ebp
0x8048382  '5e'             pop         esi
0x8048383  '89e1'           mov         ecx, esp
0x8048385  '83e4f0'         and         esp, 0xfffffff0
0x8048388  '50'             push        eax
0x8048389  '54'             push        esp
0x804838a  '52'             push        edx
0x804838b  '6820850408'     push        #__libc_csu_fini
0x8048390  '68b0840408'     push        #__libc_csu_init
0x8048395  '51'             push        ecx
0x8048396  '56'             push        esi
0x8048397  '687b840408'     push        #main
0x804839c  'e8afffffff'     call        *0x8048350
>>> i = n.data.instr[-1]
>>> print i.misc['to']
0x8048350
>>> p.mmap.read(0x8048350, 4)
['\xff%(\x97']
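For what it's worth, the first two bytes read above (ff 25) match the 32-bit x86 encoding of an indirect jmp through an absolute memory operand, which is the usual shape of a PLT stub. A minimal sketch of decoding such a stub, assuming the 32-bit encoding (in 64-bit mode the same opcode is RIP-relative instead). The helper name plt_got_slot and the six sample bytes are hypothetical and not taken from the binary above:

```python
import struct

def plt_got_slot(stub):
    """Return the GOT slot address read by a 32-bit 'jmp dword ptr [abs32]'.

    The encoding is ff 25 followed by a little-endian 32-bit address.
    Returns None if the bytes do not start with that pattern.
    """
    if len(stub) >= 6 and stub[:2] == b"\xff\x25":
        return struct.unpack("<I", stub[2:6])[0]
    return None

# Hypothetical stub bytes, for illustration only.
sample = b"\xff\x25\x0c\xa0\x04\x08"
slot = plt_got_slot(sample)
print(hex(slot))
```

The call at 0x804839c would then be a call into the PLT, and the real target only becomes known once the GOT slot is resolved.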

I'm happy to provide the binaries and/or any other info that would be helpful. I spent some time looking around trying to solve it, but I am still wrapping my head around how everything is set up.

Question: I'm interested in tagging the memory sections with their flags (i.e. read, write, execute). I originally hacked it onto 'mo', before I noticed that MemoryMap has an unused 'perm' attribute. Is this the correct spot to store that info?

Alternatively, the info is stored in p.bin.Phdr. Would it be cleaner to just query that?

Thanks!

Jul 19 '15 06:07 yrp604

Hi, well, CFG recovery is hard. Amoco actually provides 5 classes to help with this recovery: some are very simple (lsweep, fforward, lforward), while others try to be more clever (lbackward) by relying on semantics obtained through symbolic execution. In most cases you should be using either the lsweep class or the lbackward class. In your example you are using lforward, which only evaluates the PC obtained from a symbolic execution started at the previous block. Think of it like this: you are on a leaf of the CFG and want to extend the CFG by finding the targeted block address, but you only look at what happens in the parent block and the leaf itself. Whenever there is a PLT jump, this is definitely not going to be sufficient. The fbackward and lbackward classes don't stop at the parent; they basically go back in the CFG until sufficient information about the PC is (symbolically) gathered.
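The difference between looking only at the parent and walking further back can be illustrated with a toy model. This is only a sketch under strong assumptions and not amoco's implementation: each block is reduced to a hypothetical dict mapping a register to either a constant or another register name, and resolve_pc is an invented helper.

```python
def resolve_pc(chain, max_back=None):
    """Try to resolve the next PC to a constant.

    chain: list of block maps, leaf first, then parent, grandparent, ...
    max_back: how many ancestors may be consulted (None means no limit).
    Returns the constant target, or None if the PC stays symbolic.
    """
    expr = "pc"
    for depth, block in enumerate(chain):
        if max_back is not None and depth > max_back:
            break
        # Substitute the current expression through this block's map;
        # a register the block does not assign is left unchanged.
        expr = block.get(expr, expr)
        if isinstance(expr, int):
            return expr
    return None

leaf = {"pc": "eax"}               # jmp eax
parent = {"eax": "ebx"}            # mov eax, ebx
grandparent = {"ebx": 0x8048400}   # mov ebx, 0x8048400
chain = [leaf, parent, grandparent]

print(resolve_pc(chain, max_back=1))  # parent only: stays unresolved
print(resolve_pc(chain))              # walks back until a constant is found
```

With max_back=1 the target stays symbolic because the constant is only assigned in the grandparent, which is roughly the situation described for lforward.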

Thanks for reporting the _helper bug; it is supposed to be fixed now by commit 599ff0df. The _helper attribute inherited from block was indeed missing in func init.

I will look further into the cases where several calls to getcfg() give different results, and include this in the test cases and examples to be released soon (2.4.2 awaiting).

Regarding your question on memory rwx flags: yes, MemoryMap.perm was planned as a way to map a range of addresses to rwx permission info that could be used in the MemoryMap.read, MemoryMap.write, and CoreExec.read_instruction methods. I have had no need for this feature yet, so it's still not implemented. Feel free to play!
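As a starting point, here is a minimal standalone sketch of a range-to-permission map. It does not use amoco's MemoryMap at all: the PermMap class, its methods, and the sample segments are hypothetical. The only facts assumed are the ELF program header flag bits (PF_X = 1, PF_W = 2, PF_R = 4).

```python
# ELF program header p_flags bits.
PF_X, PF_W, PF_R = 0x1, 0x2, 0x4

def flags_to_str(p_flags):
    """Render ELF p_flags as an 'rwx'-style string."""
    return "".join([
        "r" if p_flags & PF_R else "-",
        "w" if p_flags & PF_W else "-",
        "x" if p_flags & PF_X else "-",
    ])

class PermMap(object):
    """Hypothetical map from address ranges to permission strings."""

    def __init__(self):
        self._ranges = []  # list of (start, end_exclusive, p_flags)

    def add(self, start, size, p_flags):
        self._ranges.append((start, start + size, p_flags))

    def lookup(self, addr):
        """Return the permission string for addr, or None if unmapped."""
        for lo, hi, p_flags in self._ranges:
            if lo <= addr < hi:
                return flags_to_str(p_flags)
        return None

# Hypothetical segments, as they might be read from program headers.
perms = PermMap()
perms.add(0x8048000, 0x1000, PF_R | PF_X)
perms.add(0x8049000, 0x1000, PF_R | PF_W)
print(perms.lookup(0x8048380))
```

A linear scan is enough for the handful of segments in a typical ELF; read, write, and instruction-fetch paths could each check the relevant letter before proceeding.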

Jul 20 '15 12:07 bdcht

Awesome, thanks. CFG recovery is definitely hard, but for most of my purposes lsweep is great. The reason I ask about PLT entries is I'm interested in pulling xrefs out with amoco. My current plan was to augment the elf loader so that the various call locations would be able to figure out what they were calling by passing the info up from the imports to the PLT. Is there a better way to do this?

And thanks! I'll definitely play around with adding perms to MemoryMap.

I've noticed another issue with x64 semantics, and was wondering about the best way to solve it. The top 32 bits aren't being cleared when 32-bit registers are written. For example:

>>> print b
# --- block 0x400460 ---
0x400460   '31ed'               xor         ebp, ebp
...
>>> print b.map
rbp <- { | [0:32]->0x0 | [32:64]->rbp[32:64] | }
...

This should be 'rbp <- { | [0:64]->0x0 | }'. I figured out how to modify the semantics of xor to make this correct, but was wondering if there is a better way to do this. It applies to any write to the 32-bit general purpose registers.
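To spell out the expected behaviour: on x86-64, a write to a 32-bit register zero-extends into the full 64-bit register, while 16-bit and low 8-bit writes leave the upper bits untouched. A small sketch of that rule in plain Python integers; write_reg is a hypothetical helper, unrelated to amoco's expression classes, and the high-byte registers (ah, bh, ...) are not modelled:

```python
MASK64 = (1 << 64) - 1

def write_reg(old64, value, width):
    """Return the 64-bit register value after writing the low `width` bits."""
    if width == 64:
        return value & MASK64
    if width == 32:
        # 32-bit writes clear bits 32..63.
        return value & 0xFFFFFFFF
    if width in (8, 16):
        # Narrower writes preserve all upper bits.
        mask = (1 << width) - 1
        return (old64 & ~mask & MASK64) | (value & mask)
    raise ValueError("unsupported width: %r" % width)

rbp = 0xDEADBEEFCAFEBABE
print(hex(write_reg(rbp, 0, 32)))  # xor ebp, ebp
print(hex(write_reg(rbp, 0, 16)))  # xor bp, bp
```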

Sorry about all the questions. Thanks again for the cool software!

Jul 22 '15 04:07 yrp604

About x64 semantics: damn, you're right... WTF! This is even true for a 'mov eax, ebx'. I'm fixing this nonsense!

About PLT refs: what is already known about PLT jmps, thanks to the 'seqhelper' methods (e.g. in system/linux_x86.py), is the associated external symbol from the ELF structure (see check_sym and its usage). If you want to see the referenced symbol at the call instruction, you could add the address of the PLT jmp instruction to the Elf object's functions dict: p.bin.functions[i.address] = ["mysymbol"]. Then the call will be printed as call #mysymbol rather than, say, call *0x0804xxxx. But this is just a pretty-printing trick; not sure it's what you want...
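The effect of that trick can be mimicked with a plain dict. This is only an illustration of the lookup, assuming the functions dict maps an address to a list of names as in the assignment above; format_call is a hypothetical helper and not amoco's printer:

```python
def format_call(target, functions):
    """Print a call symbolically if the target address has a known name."""
    names = functions.get(target)
    if names:
        return "call #%s" % names[0]
    return "call *0x%x" % target

functions = {}
print(format_call(0x8048350, functions))   # no symbol registered yet

functions[0x8048350] = ["mysymbol"]        # address of the PLT jmp
print(format_call(0x8048350, functions))
```

Collecting xrefs would need the reverse direction as well, i.e. recording for each symbol the addresses of the call instructions that reach it.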

Jul 22 '15 09:07 bdcht

The x64 semantics should be better now thanks to commit 6b3757bc; let me know if it's OK.

Jul 22 '15 16:07 bdcht

Awesome, thanks for the PLT info. Will look into that too.

x64 semantics stuff looks good in my small test cases. Will let you know if I find any other issues.

Thanks for the fast reply, much appreciated!

Jul 23 '15 04:07 yrp604
