Better support for translating pointer encoding to an address (with optional metadata)

Open fabianfreyer opened this issue 3 years ago • 3 comments

What is the feature you'd like to have? Pointers are not necessarily integers, and they can be bigger than the addresses they carry. I'd like Binja to support such pointer types. In particular, I'd like the Architecture class to be extended to support translating a given pointer encoding to an address and optional metadata, and Binary Ninja to use this when computing the address of a pointer for references, value set analysis, and other analyses, while optionally displaying the metadata (e.g. in a mouseover popup, or by making it available to a DataRenderer). Also, please add an inverse translation function that takes an address and metadata and constructs a pointer encoding.

This would help introduce a foundation on which to build lifting for instructions that manipulate pointers and their metadata.
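
To make the request concrete, here is a minimal sketch of what such a translation pair could look like. Everything in it (`DecodedPointer`, `PointerCodec`, `TopByteTagCodec`, `decode`, `encode`) is a hypothetical name chosen for illustration and not part of the Binary Ninja API. The example codec also simplifies things by treating the low 56 bits as the address, without sign extension.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class DecodedPointer:
    address: int   # the address that analyses should use
    metadata: int  # everything else (tag, PAC, ...), kept opaque here


class PointerCodec(Protocol):
    """Hypothetical interface; NOT an existing Binary Ninja class."""

    def decode(self, encoding: int) -> DecodedPointer: ...

    def encode(self, address: int, metadata: int = 0) -> int: ...


class TopByteTagCodec:
    """Example codec: 64-bit pointer with metadata in bits 63..56 (TBI-style).

    Assumption: the low 56 bits are used as the address as-is.
    """

    ADDRESS_MASK = (1 << 56) - 1

    def decode(self, encoding: int) -> DecodedPointer:
        address = encoding & self.ADDRESS_MASK
        metadata = (encoding >> 56) & 0xFF
        return DecodedPointer(address, metadata)

    def encode(self, address: int, metadata: int = 0) -> int:
        return ((metadata & 0xFF) << 56) | (address & self.ADDRESS_MASK)


codec = TopByteTagCodec()
raw = codec.encode(0x7FFF_1234_5678, metadata=0xAB)
decoded = codec.decode(raw)
print(hex(raw), hex(decoded.address), hex(decoded.metadata))
```

The point of the round trip is that analyses would only ever see `decoded.address`, while the metadata stays available for display or for lifting instructions that manipulate it.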

Is your feature request related to a problem? My feature request is related to the problem of reverse engineering binaries with tagged pointers, PAC, and other "weird" pointers.

Are any alternative solutions acceptable? One alternative is rewriting the binary to strip pointer metadata. However, this is not an acceptable solution: it discards data, and pointers that contain metadata and don't fit into the current model of "pointers are integers containing only an address" are not used for analysis.

Additional Information: Pointers are not necessarily integers that contain an address, although this is usually the case for classic architectures. However, this has changed with the introduction of new CPU features. For example:

  • ARMv8 defines Top-Byte-Ignore (TBI). This allows storing arbitrary metadata in bits 63..56 of a pointer.
  • AMD64 added the Upper Address Ignore (UAI) feature. Bits 63..57 of a pointer contain arbitrary metadata.
  • ARMv8.3 introduced Pointer Authentication Codes (PAC). These store a pointer authentication code and an optional tag in the top bits of the pointer. The sign bit of the pointer is stored in bit 55. Bits 63..56 and 54..va_size (where va_size is implementation-defined) store the metadata.
  • ARMv8.5 introduced the Memory Tagging Extension (MTE). This builds on TBI and stores a tag in the lower nibble of the top byte.
  • In CHERI, pointers (aka capabilities) are 128-bit structures (129 bits, to be precise, although the extra bit is not stored in memory). The address of the pointer is stored in bits 63..0 and the metadata is stored in bits 127..64.
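
The bit layouts listed above can be sketched as plain integer arithmetic. This is only an illustration under stated assumptions: all function names are hypothetical, `va_size` defaults to 48 as an example value, the PAC helper treats both bits 63..56 and 54..va_size as metadata (as described in the list), and the CHERI helper follows the simplified "address in bits 63..0" layout from the list rather than any specific capability format.

```python
MASK64 = (1 << 64) - 1


def tbi_split(ptr: int) -> tuple[int, int]:
    """TBI: bits 63..56 are metadata; the address gets copies of bit 55 there."""
    tag = (ptr >> 56) & 0xFF
    low = ptr & ((1 << 56) - 1)
    address = low | (0xFF << 56) if ptr & (1 << 55) else low
    return address, tag


def uai_split(ptr: int) -> tuple[int, int]:
    """UAI: bits 63..57 are metadata (simplified: low 57 bits as the address)."""
    return ptr & ((1 << 57) - 1), (ptr >> 57) & 0x7F


def mte_tag(ptr: int) -> int:
    """MTE: the tag is the lower nibble of the top byte, i.e. bits 59..56."""
    return (ptr >> 56) & 0xF


def pac_strip(ptr: int, va_size: int = 48) -> int:
    """Overwrite bits 63..va_size with copies of the sign bit (bit 55)."""
    low = ptr & ((1 << va_size) - 1)
    if ptr & (1 << 55):
        return low | (MASK64 & ~((1 << va_size) - 1))
    return low


def pac_metadata(ptr: int, va_size: int = 48) -> tuple[int, int]:
    """Return (bits 63..56, bits 54..va_size) of a signed pointer."""
    top = (ptr >> 56) & 0xFF
    mid = (ptr >> va_size) & ((1 << (55 - va_size)) - 1)
    return top, mid


def cheri_split(cap: int) -> tuple[int, int]:
    """Simplified CHERI view: address in bits 63..0, metadata in bits 127..64."""
    return cap & MASK64, (cap >> 64) & MASK64


signed = 0xC35A_0001_0000_4000  # example value: tag 0xC3, PAC bits 0x5A
print(hex(pac_strip(signed)), [hex(m) for m in pac_metadata(signed)])
```

Note that none of these helpers verify anything: stripping a PAC is not the same as authenticating it, which requires the key and is outside the scope of a pure encoding-to-address translation.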

I'm not sure how this interacts with segmented architectures, but probably applies there too somehow.

Useful Resources

  • @gankra has a really nice blog post, which, albeit Rust-specific, gives a pretty good introduction to why pointers aren't just addresses
  • @saaramar has very good slide sets on CHERI and MTE (the talks are good too)
  • https://www.amd.com/system/files/TechDocs/24593.pdf Section 5.10

Apr 01 '22 19:04 fabianfreyer

You are absolutely right, and this is something we know we need to do. I don't think we can fully support things like segmentation (see: #936) without it, either. Unfortunately, the assumption that pointer size == address size is baked into everything right now, so going back and abstracting it out is going to be a HUGE undertaking.

So, thanks for the (very well written/sourced) issue, and we'll definitely keep it open and tracked. But, we definitely don't have an ETA on this and we're unlikely to get to this for some time. Sorry. 😞

Apr 02 '22 20:04 fuzyll

In the meantime, to support macOS on arm64e, we would be happy with just stripping off the PAC/tag bits (this is already done for analysis pointers, but not for pointers the user defines manually). Perhaps now that we have pointer rendering settings, this is the right time to add an option to show either the "raw pointer" or the "logical pointer" (stripped of excess bits).

Aug 06 '24 20:08 rickmark

In my mind, this can be modeled as two problems: pointers are not necessarily the same size as address_size, and pointers are not necessarily encoded the same way as an address.

As of 5.1.7515-dev and the closing of #2774, the "size" half of this should be done. What remains is the "encoding" half.

Updated the name of the issue to better reflect the work that remains on this.

Jun 16 '25 20:06 fuzyll