Retrieve RVA from imports
In 0.9.8 (in contrast to 0.9.6), rabin2 -ri doesn't export the RVA of the import inside the PLT, because those entries are considered symbols. But this method creates some confusion:
- symbols are prefixed with 'imp', which can be problematic if a symbol already has that name: one will overwrite the other.
- imports are indexed by ordinals, so there should be a way to retrieve them using the API
- maybe we should prefix those symbols with 'plt.' instead of 'sym.imp.'?
- the naming collision still exists and, imho, they should be flagged in a different flagspace.
- 'sym.imp.' to 'plt.' is a good change (for ELF), as long as you don't make PLT a special case
- there isn't a real strong link between the import and the PLT wrapper for it. I've removed ordinals because they weren't normalized, and they would've been harder to generalize
- there is, however, a direct link from a reloc to the import it references (via a RBinImport pointer), and automatic analysis should look at relocs for imports
<eddyb> because imports are just names. most other formats allow you to place a reloc for an import in a lot of places, not just one
<eddyb> r_bin relocs pointing at imports are the only way to represent the PE import table
<eddyb> ELF and Mach-O use relocs pointing to imports for this, you can't represent that as imports with addresses (as you can with PE)
<eddyb> the common system is a very good solution, and I'm glad I managed to implement it
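To make the reloc-to-import link described above concrete, here is a minimal sketch in Python. It is not the RBin API: `Import`, `Reloc` and `import_slots` are hypothetical names invented for illustration, and the only thing taken from the discussion is the shape of the model (imports are just names, relocs carry an address and may point at an import).

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class Import:
    # Hypothetical stand-in for an import record: just a name, no address.
    name: str


@dataclass(frozen=True)
class Reloc:
    # Hypothetical stand-in for a reloc: where it applies and, optionally,
    # which import it references (None means a plain fixup).
    vaddr: int
    target: Optional[Import] = None


def import_slots(relocs: Iterable[Reloc]) -> Dict[str, List[int]]:
    """Group reloc addresses by the import they reference."""
    slots: Dict[str, List[int]] = {}
    for reloc in relocs:
        if reloc.target is not None:
            slots.setdefault(reloc.target.name, []).append(reloc.vaddr)
    return slots


printf = Import("printf")
relocs = [
    Reloc(0x601018, printf),
    Reloc(0x601020, Import("puts")),
    Reloc(0x601040, printf),   # the same import referenced from a second place
    Reloc(0x601048),           # plain fixup, no import attached
]
slots = import_slots(relocs)
```

Here one import can own several addresses, which is why a single "address of the import" field does not generalize across formats.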
Another conceptual problem here is that r2 handles 'symbols' as exports and imports as imports, so symbols should not contain imports, because the PLT is not exported. We need to either change some concepts, or keep it like this and fix and document how this works to avoid confusion. Historically, radare and r2 used imports to give the PLT address, and symbols to list the exports using the real address of the symbol.
And there is another issue with import handling: some compilers (like mingw) add trampoline functions, which are basically a jmp [iat_entry]. These are not counted as imports as they should be (or renamed to reflect the import).
Maybe we can use an IDA-like system: flag each IAT entry with something like __imp_function and, in the analysis module, if we encounter a jmp [__imp_function], create a function with the name "function".
But I think we have to distinguish between imports, exports and symbols, and go back to the (improved?) old system.
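A rough sketch of that trampoline detection, in Python rather than in the analysis module itself. It assumes 32-bit x86, where `jmp dword ptr [addr]` is encoded as `FF 25` followed by a little-endian 32-bit absolute address. The function name `name_trampolines` and the `iat` mapping are hypothetical. A plain byte scan like this can produce false positives, so real analysis would only test addresses that are known instruction starts.

```python
import struct
from typing import Dict


def name_trampolines(code: bytes, base: int, iat: Dict[int, str]) -> Dict[int, str]:
    """Map trampoline addresses to import names.

    code: raw bytes of a code section (32-bit x86 assumed)
    base: virtual address of code[0]
    iat:  IAT slot address -> import name (the '__imp_function' flags)
    """
    names: Dict[int, str] = {}
    # Each candidate needs 6 bytes: FF 25 + 4-byte absolute slot address.
    for off in range(len(code) - 5):
        if code[off] == 0xFF and code[off + 1] == 0x25:
            (slot,) = struct.unpack_from("<I", code, off + 2)
            if slot in iat:
                # jmp [__imp_function] -> name the wrapper "function"
                names[base + off] = iat[slot]
    return names


# nop; jmp [0x402000]; ret
code = b"\x90" + b"\xff\x25" + struct.pack("<I", 0x402000) + b"\xc3"
found = name_trampolines(code, 0x401000, {0x402000: "printf"})
```

A jump through an address that is not an IAT slot is ignored, which is the "check if address belongs to the import table" step.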
Symbols can be used for all symbols added by gcc, which contain function names, etc.
Another option to avoid symbol overlapping would be to use two levels for naming like 'sym.imp.' 'sym.plt.' 'sym.exp.' for imports, plt, exports, ...
It could be a nice idea, and the "imports" callback in bin_r plugins could generate proper radare output (i.e. 'f sym.imp.import' for example), and maybe we could create an "exports" callback too.
Then, the problem of handling trampoline functions generated by mingw should be considered too (maybe in the analysis part; I don't see "clean" ways to handle that, since nothing references those trampoline functions in PE files).
:+1: for the 'sym.imp.', 'sym.plt.', 'sym.exp.' idea. Also, possibly (?) it would allow using some 'sym.' extensions for virtual functions from vtables.
Or, another idea: only consider GOT entries (or Import Table entries for PE bins) as "imports", treating them as "special" relocs and not just fixups to apply. Then add PLT/wrapper function information if it is available (in ELF binaries, PLT entries are stored in the symbol table), or discover wrapper functions during analysis of the binary, which is unfortunately the only way to handle those wrappers in win32 binaries.
So, I thought about something like this:
- GOT offsets could be stored in flagspace "fs imports"
- wrapper functions and other symbols are renamed according to the symbol table (and we could even skip prefixes like "sym." or "sym.imp", because imports/exports will have their own reliable flagspace)
- other wrapper functions are discovered and renamed by the analysis module, which detects trampoline function patterns and checks whether the address belongs to the GOT/Import table (this detection may be implemented via a callback in the RBin plugin).
And exports could be put in "fs exports" and contain Export Address Table elements for PE binaries and symbol table except PLT elements for ELF binaries. And, to avoid function name collisions, a solution could be adding a suffix to the function which increments each time the same function is added (for example, something like printf_0).
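The incrementing suffix could look roughly like this. It is only a sketch: `unique_flag_name` is a hypothetical helper, and it assumes the first occurrence keeps the plain name while later duplicates get `_0`, `_1`, and so on (the comment above does not say whether the first one is suffixed too).

```python
from typing import Set


def unique_flag_name(name: str, used: Set[str]) -> str:
    """Return a name not yet in `used`, and record it there."""
    if name not in used:
        used.add(name)
        return name
    n = 0
    # Skip suffixes that are already taken, including the case where the
    # binary really contains a symbol literally named e.g. "printf_0".
    while f"{name}_{n}" in used:
        n += 1
    candidate = f"{name}_{n}"
    used.add(candidate)
    return candidate


used: Set[str] = set()
flags = [unique_flag_name(s, used) for s in ["printf", "printf", "printf", "puts"]]
```

The resulting names depend on the order in which symbols are added, so they are only stable if that order is deterministic.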
Also, as discussed recently on IRC, another option would be to use the following prefix names:
- sym: local symbols
- exp: exported symbols
- imp: imported symbols (generated by relocs and currently under sym.imp namespace)
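As an illustration of the three prefixes listed above, the following sketch maps a symbol kind to a flag name. The `Kind` enum and `flag_name` function are hypothetical; only the `sym`, `exp` and `imp` prefixes come from the proposal.

```python
from enum import Enum


class Kind(Enum):
    # Values are the proposed flag prefixes.
    LOCAL = "sym"
    EXPORT = "exp"
    IMPORT = "imp"


def flag_name(kind: Kind, name: str) -> str:
    """Build a flag name such as 'imp.printf' from a kind and a symbol name."""
    return f"{kind.value}.{name}"


names = [
    flag_name(Kind.LOCAL, "main"),
    flag_name(Kind.EXPORT, "main"),
    flag_name(Kind.IMPORT, "printf"),
]
```

With separate prefixes, a local `main` and an exported `main` no longer overwrite each other, although two symbols of the same kind and name still collide.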
Symbol name collision is not unique to the sym.imp vs sym. issue. A binary can be mangled to contain many symbols with the same name, or names that, once filtered, produce the same flag name in r2, and that may produce inconsistent results in disassembly.
We should use Sdb inside RBin to store all the symbols there. Looking forward to 0.9.8 for this. I'm moving the milestone because we must focus on the sdbization.
Reviving this discussion
@radare what about this one? Any updates on this?
No updates. The rbin refactoring hasn't happened yet.
Closing, nobody will give any relevant feedback and we are in good shape now.