abi-aa

Consider an ABI extension to define metadata for binary analysis

Open smithp35 opened this issue 1 year ago • 10 comments

With increasing adoption of tools like BOLT and use of binary analysis by the Linux kernel, there may be demand for additional metadata to aid control flow discovery.

Examples include:

  • Identifying static linker generated stubs/veneers/thunks.
  • Identifying function pointer destinations.

If metadata is to be added to toolchains such as LLVM and GCC, it would be useful to document it in the ABI to help with interoperability of tools.

This issue is a placeholder for further discussion.

smithp35 avatar Nov 13 '24 15:11 smithp35

Other useful pieces of info for binary analysis reconstruction of control flow graphs include:

  • a list of possible targets of indirect jumps, for example as generated by lowering of switch statements to jump tables.
  • an indication of whether a called function is considered "no-return", and hence the binary analysis tool should assume control flow cannot continue past a specific function call.
    • https://github.com/llvm/llvm-project/issues/115154 indicates this would help binary analysis tools to reconstruct CFGs better.
    • https://nebelwelt.net/files/17CC.pdf at the end of section "4.2 Function boundaries recovery" also clearly documents the need to identify noreturn functions well during reconstruction of a CFG in binary analysis tools.
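
To illustrate why the "no-return" hint matters, here is a minimal sketch of block splitting in a toy analysis. It is not how BOLT or any real tool works; the instruction tuples, the `split_at_terminators` helper, and the `noreturn_callees` set are all hypothetical names made up for this example. The set stands in for whatever metadata an ABI extension might provide.

```python
def split_at_terminators(insns, noreturn_callees):
    """Group instruction addresses into straight-line blocks.

    insns: iterable of (address, kind, callee) tuples, where kind is
    "other", "call" or "ret" (a toy encoding, assumed for this sketch).
    noreturn_callees: names of functions known not to return.
    """
    blocks, current = [], []
    for addr, kind, callee in insns:
        current.append(addr)
        # A return, or a call to a no-return function, ends control flow:
        # nothing falls through to the next address.
        is_terminator = kind == "ret" or (
            kind == "call" and callee in noreturn_callees
        )
        if is_terminator:
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks


# Hypothetical instruction stream: the call at 0x4 targets abort().
INSNS = [
    (0x0, "other", None),
    (0x4, "call", "abort"),
    (0x8, "other", None),
    (0xC, "ret", None),
]

with_hint = split_at_terminators(INSNS, {"abort"})
without_hint = split_at_terminators(INSNS, set())
```

Without the hint, the toy analysis assumes the code at 0x8 is reached by falling through the call. With the hint, it knows 0x8 starts something else, which could be another block, another function, or data.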

kbeyls avatar Nov 14 '24 10:11 kbeyls

  • https://github.com/llvm/llvm-project/issues/100096 it's tricky in general to determine if a GOT entry points to data or code, and there exists at least one case where they can alias through pointer-to-end-of-array. This can probably be solved well enough heuristically for the specific case seen there (glibc bfd linked static binary crash on startup); but I wonder if the problem could exist more generally.

peterwaller-arm avatar Nov 14 '24 11:11 peterwaller-arm

  • More architectures are adding jump table information into ELF (https://github.com/llvm/llvm-project/pull/112606). It would be great to have a standard ELF extension that covers them all. From the BOLT perspective, we need to know jump table boundaries, instructions involved in forming the jump table address, and the indirect jump location(s). Note that it's possible for jump tables to overlap. Additionally, we can expect the jump table address to be stored and loaded from the stack.
  • Indirect/computed goto extension for C/C++ also uses indirect jumps and the mechanism deviates from a typical switch/jump table implementation.
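
As a rough sketch of the information listed above, a per-table record might carry the table boundaries, the instructions that form the table address, and the indirect jump location(s). The `JumpTableInfo` type, its field names, and the `overlapping_pairs` helper are hypothetical and do not correspond to any existing section format. The overlap check shows why a consumer cannot assume tables are disjoint.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JumpTableInfo:
    """Hypothetical per-table record; not an existing ELF format."""
    table_start: int      # address of the first table entry
    entry_size: int       # size of one entry in bytes
    entry_count: int      # number of entries
    address_insns: tuple  # addresses of instructions forming the table address
    jump_insns: tuple     # addresses of the indirect jump(s) using the table

    @property
    def table_end(self):
        # One past the last byte of the table.
        return self.table_start + self.entry_size * self.entry_count


def overlapping_pairs(tables):
    """Return (a, b) pairs of tables whose address ranges overlap."""
    ordered = sorted(tables, key=lambda t: t.table_start)
    pairs = []
    for i, a in enumerate(ordered):
        for b in ordered[i + 1:]:
            if b.table_start >= a.table_end:
                break  # later tables start even further away from a
            pairs.append((a, b))
    return pairs


# Made-up addresses for illustration only.
t1 = JumpTableInfo(0x1000, 4, 4, (0x400, 0x404), (0x40C,))
t2 = JumpTableInfo(0x1008, 4, 4, (0x500, 0x504), (0x50C,))
t3 = JumpTableInfo(0x2000, 4, 2, (0x600, 0x604), (0x60C,))
overlaps = overlapping_pairs([t3, t1, t2])
```

Here `t1` and `t2` share the bytes from 0x1008 to 0x1010, so they are reported as an overlapping pair, while `t3` stands alone.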

maksfb avatar Nov 14 '24 20:11 maksfb

One extension that could potentially be used today to identify basic blocks and functions on the LLVM side is the Basic Block Address Map:

https://llvm.org/docs/Extensions.html#sht-llvm-bb-addr-map-section-basic-block-address-map

The basic block address map has been used in tooling similar to BOLT (at the compiler/linker level rather than the binary level) to correctly map sampled profile information to functions and basic blocks. The format of this map and the data it adds to the binary should not have a significant impact on size or performance, and it could also be used as a hint. I have been thinking about making BOLT use it when the '.llvm_bb_addr_map' section is present in the binary.

The disadvantage is that BBAddrMap is currently available only with LLVM/Clang.

Regarding jump tables, maybe consider something like: https://llvm.org/docs/Extensions.html#sht-llvm-jt-sizes-section-jump-table-addresses-and-sizes
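
For a sense of how little a consumer needs to read such a section, here is a minimal sketch of a parser. It assumes the section body is a flat sequence of (jump table address, number of entries) pairs, which is my reading of the linked LLVM documentation. The field width and byte order are assumptions exposed as parameters, and `parse_jt_sizes` is a hypothetical helper, not an LLVM API. Extracting the raw section bytes from the ELF file is left out.

```python
import struct


def parse_jt_sizes(data, pointer_size=8, little_endian=True):
    """Decode raw section bytes into (table_address, entry_count) pairs.

    Assumes both fields are pointer-sized unsigned integers; check the
    target and the producing toolchain before relying on this.
    """
    byte_order = "<" if little_endian else ">"
    fields = {4: "II", 8: "QQ"}[pointer_size]
    record = struct.Struct(byte_order + fields)
    if len(data) % record.size:
        raise ValueError("section size is not a whole number of records")
    return [
        record.unpack_from(data, offset)
        for offset in range(0, len(data), record.size)
    ]


# Synthetic section contents: two tables with 4 and 2 entries.
sample = struct.pack("<QQQQ", 0x1000, 4, 0x2000, 2)
tables = parse_jt_sizes(sample)
```

Note that this gives only addresses and sizes. It does not say which instructions form the table address or where the indirect jump is, which is the additional information asked for earlier in this thread.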

alekuz01 avatar Nov 18 '24 11:11 alekuz01

A slightly different topic, but it also seems related: an ABI-like agreement on what BOLT binary rewriting and stripping tools expect of each other. Issues reported on that:

  • https://github.com/llvm/llvm-project/issues/56738
  • https://github.com/llvm/llvm-project/issues/89336
  • https://github.com/llvm/llvm-project/issues/85796

There is also an RFC effort to address some of the issues where new sections at the end, with old sections left in place, confuse stripping tools: https://discourse.llvm.org/t/bolt-rfc-a-new-mode-to-rewrite-entire-binary/68674

ilinpv avatar Nov 18 '24 21:11 ilinpv

Thank you for all the comments and suggestions. It looks like there is sufficient interest to go forward with this. Most likely in the form of an ABI extension that can be worked on incrementally with an implementation. We'll have more to say next year.

smithp35 avatar Dec 11 '24 11:12 smithp35

Another reference we can take ideas from: https://github.com/google/android-riscv64/issues/68

appujee avatar Dec 11 '24 17:12 appujee

Unfortunately I only just now discovered this discussion. Anyway:

I started prototyping extended metadata for aarch64 jump tables: https://github.com/MatzeB/llvm-project/tree/staging-matthiasb-jump_table_info and BOLT folks will likely work on a prototype parser for this section so we can get rid of the -fno-jump-tables flag that we currently use in our builds to support BOLT on aarch64.

I also posted on LLVM discourse wondering if folks are willing to accept enhancements/alternative formats to the existing llvm_jump_table_sizes section: https://discourse.llvm.org/t/extending-llvm-jump-table-sizes/85131

MatzeB avatar Mar 12 '25 00:03 MatzeB

Thanks for letting us know. We're still at an early stage for looking at this. I'll make sure the lead engineers @ilinpv and @peterwaller-arm are aware.

smithp35 avatar Mar 12 '25 09:03 smithp35

A specification of veneers to support binary analysis tools is available for review: https://github.com/ARM-software/abi-aa/pull/333. Your feedback is much appreciated.

ilinpv avatar Jun 11 '25 17:06 ilinpv