
fileformat/elf: Prevent creation of duplicate imports while parsing

Open · PeterMatula opened this issue 5 months ago · 0 comments

  • Problematic file 6dae046fe2268eba225be5916013b7799ba042c7ca87cd49cda77ae59a34d29f.
  • objdump and readelf say there are no symbols, but we found ~50k of them.
  • The first ~30k look OK, so it's not all garbage.
  • In the next ~20k, many (though not all) symbols are identical: empty name, zero section index, zero address, etc.
  • However, we have relocations for these symbols anyway, and a lot of them (~170k).
  • So we end up with ~20k identical symbols, each with ~170k relocations -> 3.4B Imports.
  • The problem is that, from the program's point of view, nothing is obviously wrong: the addresses are valid in the binary, the relocations point into valid sections/segments, etc.
  • We could add a heuristic to fix this concrete situation, e.g. cap the multiplication result, or trim/skip empty-named symbols, but it would be ugly and who knows what it would break. However, I realized this can be solved more elegantly.
  • The same huge set of Imports was being created for every such symbol, since the values were identical. We did not catch this and happily created them all, because Imports are not indexed in any way; the table is just a vector of unique pointers.
  • I added a cache that keeps track of the created imports and skips recreating them. This still creates a bunch of empty/garbage symbols and Imports, but it does not break anything, and the huge set of entries is created only once.
  • To optimize it further, the cache works only with string references, which is, however, a bit error-prone. Please check the logic, as my C++ is kinda rusty; maybe there is a less error-prone solution (a minimal sketch of the idea follows this list). However, I don't think the underlying code (the vector of unique pointers in the ImportTable) will change in the future, so it should not break that easily.
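
A minimal sketch of the idea, not the actual retdec code: `Import`, `ImportTable`, and the `(name, address)` key below are simplified stand-ins made up for illustration. The dedup cache is an unordered set whose keys are string views into the stored Imports, so no key strings are copied. This is safe only because the Imports are heap-allocated behind `unique_ptr` and never removed, so the referenced string outlives the cache entry:

```cpp
#include <cstdint>
#include <memory>
#include <string>
#include <string_view>
#include <unordered_set>
#include <vector>

// Simplified stand-in for the real fileformat::Import.
struct Import
{
	std::string name;
	std::uint64_t address = 0;
};

class ImportTable
{
public:
	// Returns the stored import, or nullptr if an identical one already exists.
	const Import* addImport(std::string name, std::uint64_t address)
	{
		// Lookup key views the caller's string -- no copy is made.
		if (seen.count(Key{name, address}))
		{
			return nullptr; // duplicate -- skip recreating it
		}

		imports.push_back(std::make_unique<Import>(Import{std::move(name), address}));
		const Import* stored = imports.back().get();
		// The cached key views the heap-allocated Import's name, which stays
		// valid for the table's lifetime (Imports are never removed).
		seen.insert(Key{stored->name, stored->address});
		return stored;
	}

private:
	struct Key
	{
		std::string_view name;
		std::uint64_t address;
		bool operator==(const Key& o) const
		{
			return address == o.address && name == o.name;
		}
	};
	struct KeyHash
	{
		std::size_t operator()(const Key& k) const
		{
			return std::hash<std::string_view>{}(k.name)
					^ (std::hash<std::uint64_t>{}(k.address) << 1);
		}
	};

	std::vector<std::unique_ptr<Import>> imports; // storage unchanged
	std::unordered_set<Key, KeyHash> seen;        // dedup cache
};
```

If the lifetime coupling between the cache keys and the stored Imports feels too fragile, owning `std::string` keys would cost one copy per unique import and remove the hazard entirely.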

PeterMatula · Sep 23 '24 20:09