binaryninja-api icon indicating copy to clipboard operation
binaryninja-api copied to clipboard

Add a trait/derive macro for transforming Rust type definitions into Binja `Type` objects

Open mkrasnitski opened this issue 10 months ago • 3 comments

Supersedes #4545.

This time around I was able to see this work through to a good enough state to not publish as a draft. Obviously, I would very much like feedback on this, as introducing a proc-macro into the codebase adds a clear extra maintenance burden that normal Rust code doesn't come with.

Overview

This PR adds the following trait:

pub trait AbstractType {
    const SIZE: usize = std::mem::size_of::<Self>();
    const ALIGN: usize = std::mem::align_of::<Self>();
    fn resolve_type() -> Ref<Type>;
}

Implementing this trait allows creating a Type object corresponding to the layout and representation of whatever type implements it. For example, take the following type:

#[derive(AbstractType)]
#[repr(C)]
struct S {
    first: u64,
    second: u8,
    third: i32
}

Calling S::resolve_type() produces the following type (shown after defining it in a view):

image

Derive macro

The accompanying derive macro is very convenient for automatically generating implementations of the AbstractType trait, as long as the deriving type meets the following requirements:

  • For structs/unions, the type must be #[repr(C)], and must have named fields.
  • For enums, the type must not be #[repr(C)] but instead #[repr(<int>)] (e.g. u8, u64). Also, variants must not contain any data, and must have an explicit discriminant. Note that usize and u128 (and their signed counterparts) are not supported as enum reprs.

Alignment

The derive macro supports #[repr(packed)] and #[repr(align)] on types and will propagate that information into the generated Binja Type (i.e. marking a struct #[repr(packed)] will result in a __packed type in Binja).

Named types

Named types are supported by decorating a field with a #[named] attribute. As an example:

#[derive(AbstractType)]
#[repr(u64)]
enum E {
    One = 1,
    Two = 2,
}

#[derive(AbstractType)]
#[repr(C)]
struct S {
    first: u64,
    #[named]
    second: E,
}

The result of S::resolve_type() is the following type:

image

Note that defining S in a view does not require that E already be defined. As long as E is eventually defined, then clicking on the type of second will redirect to the correct type.

Pointers

Any pointer fields (of type *const T or *mut T) must be decorated with #[width = "<type>"], where the specified type implements AbstractType. Since pointers can change width depending on architecture, users must specify the width ahead of time. (Note: In theory the width should be restricted to only primitive integer types, but I didn't feel like hardcoding that decision at the moment). As an example, if we modify S from above to look like:

#[derive(AbstractType)]
#[repr(C)]
struct S {
    first: u64,
    #[named]
    #[width = "u64"]
    second: *const E,
}

Then we will get the following:

image

Miscellaneous notes

  • I'm not sure if there should be a #[name = "..."] field attribute that allows specifying a name for the type that's different than the name of the Rust type.
  • Since it's likely that pointers will always be of a single size for a given binary view, I wonder if instead of decorating each pointer field with #[width = "..."], it makes more sense to decorate the struct itself with #[pointer_width = "..."] or something similar.
  • Should I add tests to verify that the generated Binja types are what we expect them to be? It's possible that I haven't squashed every possible edge-case bug.

mkrasnitski avatar Apr 10 '24 04:04 mkrasnitski

I'm not going to be comfortable merging this until we have tests to ensure it doesn't go stale...

For sure, I'll look into adding some tests. I probably should have included them from the start since I was using a standalone Rust plugin to test how the generated types would look in the UI. I see that Type implements PartialEq, would this work for my purposes, so that I could test using assert_eq? I don't know how type equality is done in Binja, and what exactly is compared.

Please add the #[name = "..."] thing.

Yeah, this is a good idea in principle. I'm thinking there should be a single attribute here instead of two (named and name), so that the mental model becomes: if a string literal value isn't provided, then the name of the Rust type is used as the default. Bikeshedding on an attribute name is definitely welcome here, as well as for the #[pointer_width] attribute (that's not a very good name imo).

I'm not sure I understand how #[repr(align)] works. Could you please provide an example?

Decorating a type declaration with #[repr(align(N))] changes that type's minimum alignment to be at least N. For example, say a struct's normal alignment is 4 bytes. Then, decorating the struct definition with #[repr(align(8))] will change it to be 8-byte aligned. However, decorating it with #[repr(align(2))] will keep the alignment at 4. I also uses this internally in my macro for doing type-mocking of pointer types:

// Assuming we know T
const ALIGN: usize = std::mem::align_of::<T>();
const SIZE: usize = std::mem::size_of::<T>();

#[repr(C)]
#[repr(align(ALIGN))] // invalid syntax - we use `elain` to accomplish this
struct Mock {
    t: [u8; SIZE],
}

We've basically created a type with arbitrary size and alignment, since we can change the values of the SIZE and ALIGN constants. And, since align_of<[u8; N]> == align_of<u8> == 1, we know that align_of::<Mock> == ALIGN since ALIGN is always at least 1.

Is there a way to omit #[named]? Is it possible to fall back on to being named if the type is otherwise unrecognized?

There might be some cases where users might want to keep the type definition inline rather than reference a named type. I think for that reason, always falling back to the Rust name as a default might be too opinionated.

mkrasnitski avatar May 12 '24 16:05 mkrasnitski

Thanks for the additional feedback. I already mentioned this in Slack, but don't worry about assert_eqing types against each other; so long as the procedural macro is able to generate valid API code that gets checked by cargo when we build, I'll be happy with it.

Bikeshedding on an attribute name is definitely welcome here, as well as for the #[pointer_width] attribute (that's not a very good name imo).

Personally I'm fine with name, named, and pointer_width. If you wanna merge name and named, I'm more partial to named since it has to do with named_types.

There might be some cases where users might want to keep the type definition inline rather than reference a named type. I think for that reason, always falling back to the Rust name as a default might be too opinionated.

What about the opposite then? Add a #[inline]/#[inline_type]

ElykDeer avatar May 15 '24 19:05 ElykDeer

What about the opposite then? Add a #[inline]/#[inline_type]

IMO this feels slightly awkward, since then the default behavior with no attributes would be to create a named type using the Rust type's name, and then providing #[name = "..."] would add a custom name whereas #[inline_type] would remove the name. So basically we have two switches controlling a single property.

Personally I'm fine with name, named, and pointer_width. If you wanna merge name and named, I'm more partial to named since it has to do with named_types.

For some reason, I think if we wrap all three in a #[binja(...)] attribute it might actually look fine. So, it would be #[binja(named)], #[binja(name = "...")], and #[binja(pointer_width = "u64")]. Then, it's kind of like those attributes belong to the binja namespace, whereas when they nakedly decorate a field they feel a bit out of place. Is binja alright for this or would binaryninja be preferred? I don't know how hard y'all stick to not using the Binja nickname in user-facing code/docs.

Btw, should the value of pointer_width be changed to an int literal instead of a string literal? That way the "width" can be interpreted more literally, and it seems that Binja supports pointers of arbitrary width.

mkrasnitski avatar May 15 '24 21:05 mkrasnitski

Alright, I think all the behavioral changes are taken care of. To briefly demonstrate: a type that derives AbstractType might now look like this:

#[derive(AbstractType)]
#[repr(u64)]
enum E {
    One = 1,
    Two = 2,
}

#[derive(AbstractType)]
#[repr(C)]
#[binja(pointer_width = 8)]
struct S {
    first: u64,
    #[binja(named)]
    second: *const E,
}

What remains to be done is adding documentation and tests. Ideally I would like #5435 to be resolved and then rebase on top of it, however that might take some time so in the meantime I'll go with the original suggestion of adding to examples/ since those aren't run in CI.

mkrasnitski avatar May 24 '24 07:05 mkrasnitski

@KyleMiles This PR should be ready for re-review now. In total, the new changes since you last reviewed are the commit range 98fde41 through 9a23a8e (comparison).

In summary:

  • Added support for 128-bit reprs on enums
  • Switched field attributes to use the #[binja(...)] format instead of raw paths
  • Added documentation/comments to macro implementation
  • Added a dev guide for using the macro (rendered)
  • Added tests in the form of an abstract-type example

The only thing missing at this point is support for fields of type void *, however I'm not sure how to support it given that raw void is not a valid type for a field. Additionally, the documentation for std::ffi::c_void explains that void corresponds to () but void * corresponds to*{const,mut} c_void rather than *{const,mut} ().

mkrasnitski avatar Jun 11 '24 00:06 mkrasnitski

I have discussed this with @mkrasnitski and the plan forward for this is as a separate crate. We need to actually version the binaryninja crate with an attached ABI version (i.e. 0.1.0-69) before this can be split off.

We should aim to have a myriad of helper crates instead of maintaining large effort abstractions/helpers in this repository. This will allow greater flexibility and a clear line between where the API ends and the helpers begin, something that is not clear within our python bindings.

We are going to keep this PR open until the above is at-least partially completed.

emesare avatar Jul 29 '24 11:07 emesare