binaryninja-api
binaryninja-api copied to clipboard
Add a trait/derive macro for transforming Rust type definitions into Binja `Type` objects
Supersedes #4545.
This time around I was able to see this work through to a good enough state to not publish as a draft. Obviously, I would very much like feedback on this, as introducing a proc-macro into the codebase adds a clear extra maintenance burden that normal Rust code doesn't come with.
Overview
This PR adds the following trait:
pub trait AbstractType {
const SIZE: usize = std::mem::size_of::<Self>();
const ALIGN: usize = std::mem::align_of::<Self>();
fn resolve_type() -> Ref<Type>;
}
Implementing this trait allows creating a Type
object corresponding to the layout and representation of whatever type implements it. For example, take the following type:
#[derive(AbstractType)]
#[repr(C)]
struct S {
first: u64,
second: u8,
third: i32
}
Calling S::resolve_type()
produces the following type (shown after defining it in a view):
Derive macro
The accompanying derive macro is very convenient for automatically generating implementations of the AbstractType
trait, as long as the deriving type meets the following requirements:
- For structs/unions, the type must be
#[repr(C)]
, and must have named fields. - For enums, the type must not be
#[repr(C)]
but instead#[repr(<int>)]
(e.g. u8, u64). Also, variants must not contain any data, and must have an explicit discriminant. Note thatusize
andu128
(and their signed counterparts) are not supported as enum reprs.
Alignment
The derive macro supports #[repr(packed)]
and #[repr(align)]
on types and will propagate that information into the generated Binja Type
(i.e. marking a struct #[repr(packed)]
will result in a __packed
type in Binja).
Named types
Named types are supported by decorating a field with a #[named]
attribute. As an example:
#[derive(AbstractType)]
#[repr(u64)]
enum E {
One = 1,
Two = 2,
}
#[derive(AbstractType)]
#[repr(C)]
struct S {
first: u64,
#[named]
second: E,
}
The result of S::resolve_type()
is the following type:
Note that defining S
in a view does not require that E
already be defined. As long as E
is eventually defined, then clicking on the type of second
will redirect to the correct type.
Pointers
Any pointer fields (of type *const T
or *mut T
) must be decorated with #[width = "<type>"]
, where the specified type implements AbstractType
. Since pointers can change width depending on architecture, users must specify the width ahead of time. (Note: In theory the width should be restricted to only primitive integer types, but I didn't feel like hardcoding that decision at the moment). As an example, if we modify S
from above to look like:
#[derive(AbstractType)]
#[repr(C)]
struct S {
first: u64,
#[named]
#[width = "u64"]
second: *const E,
}
Then we will get the following:
Miscellaneous notes
- I'm not sure if there should be a
#[name = "..."]
field attribute that allows specifying a name for the type that's different than the name of the Rust type. - Since it's likely that pointers will always be of a single size for a given binary view, I wonder if instead of decorating each pointer field with
#[width = "..."]
, it makes more sense to decorate the struct itself with#[pointer_width = "..."]
or something similar. - Should I add tests to verify that the generated Binja types are what we expect them to be? It's possible that I haven't squashed every possible edge-case bug.
I'm not going to be comfortable merging this until we have tests to ensure it doesn't go stale...
For sure, I'll look into adding some tests. I probably should have included them from the start since I was using a standalone Rust plugin to test how the generated types would look in the UI. I see that Type
implements PartialEq
, would this work for my purposes, so that I could test using assert_eq
? I don't know how type equality is done in Binja, and what exactly is compared.
Please add the
#[name = "..."]
thing.
Yeah, this is a good idea in principle. I'm thinking there should be a single attribute here instead of two (named
and name
), so that the mental model becomes: if a string literal value isn't provided, then the name of the Rust type is used as the default. Bikeshedding on an attribute name is definitely welcome here, as well as for the #[pointer_width]
attribute (that's not a very good name imo).
I'm not sure I understand how
#[repr(align)]
works. Could you please provide an example?
Decorating a type declaration with #[repr(align(N))]
changes that type's minimum alignment to be at least N. For example, say a struct's normal alignment is 4 bytes. Then, decorating the struct definition with #[repr(align(8))]
will change it to be 8-byte aligned. However, decorating it with #[repr(align(2))]
will keep the alignment at 4. I also uses this internally in my macro for doing type-mocking of pointer types:
// Assuming we know T
const ALIGN: usize = std::mem::align_of::<T>();
const SIZE: usize = std::mem::size_of::<T>();
#[repr(C)]
#[repr(align(ALIGN))] // invalid syntax - we use `elain` to accomplish this
struct Mock {
t: [u8; SIZE],
}
We've basically created a type with arbitrary size and alignment, since we can change the values of the SIZE
and ALIGN
constants. And, since align_of<[u8; N]> == align_of<u8> == 1
, we know that align_of::<Mock> == ALIGN
since ALIGN is always at least 1.
Is there a way to omit
#[named]
? Is it possible to fall back on to being named if the type is otherwise unrecognized?
There might be some cases where users might want to keep the type definition inline rather than reference a named type. I think for that reason, always falling back to the Rust name as a default might be too opinionated.
Thanks for the additional feedback. I already mentioned this in Slack, but don't worry about assert_eq
ing types against each other; so long as the procedural macro is able to generate valid API code that gets checked by cargo when we build, I'll be happy with it.
Bikeshedding on an attribute name is definitely welcome here, as well as for the #[pointer_width] attribute (that's not a very good name imo).
Personally I'm fine with name
, named
, and pointer_width
. If you wanna merge name
and named
, I'm more partial to named
since it has to do with named_types
.
There might be some cases where users might want to keep the type definition inline rather than reference a named type. I think for that reason, always falling back to the Rust name as a default might be too opinionated.
What about the opposite then? Add a #[inline]
/#[inline_type]
What about the opposite then? Add a
#[inline]
/#[inline_type]
IMO this feels slightly awkward, since then the default behavior with no attributes would be to create a named type using the Rust type's name, and then providing #[name = "..."]
would add a custom name whereas #[inline_type]
would remove the name. So basically we have two switches controlling a single property.
Personally I'm fine with
name
,named
, andpointer_width
. If you wanna mergename
andnamed
, I'm more partial tonamed
since it has to do withnamed_types
.
For some reason, I think if we wrap all three in a #[binja(...)]
attribute it might actually look fine. So, it would be #[binja(named)]
, #[binja(name = "...")]
, and #[binja(pointer_width = "u64")]
. Then, it's kind of like those attributes belong to the binja
namespace, whereas when they nakedly decorate a field they feel a bit out of place. Is binja
alright for this or would binaryninja
be preferred? I don't know how hard y'all stick to not using the Binja nickname in user-facing code/docs.
Btw, should the value of pointer_width
be changed to an int literal instead of a string literal? That way the "width" can be interpreted more literally, and it seems that Binja supports pointers of arbitrary width.
Alright, I think all the behavioral changes are taken care of. To briefly demonstrate: a type that derives AbstractType
might now look like this:
#[derive(AbstractType)]
#[repr(u64)]
enum E {
One = 1,
Two = 2,
}
#[derive(AbstractType)]
#[repr(C)]
#[binja(pointer_width = 8)]
struct S {
first: u64,
#[binja(named)]
second: *const E,
}
What remains to be done is adding documentation and tests. Ideally I would like #5435 to be resolved and then rebase on top of it, however that might take some time so in the meantime I'll go with the original suggestion of adding to examples/
since those aren't run in CI.
@KyleMiles This PR should be ready for re-review now. In total, the new changes since you last reviewed are the commit range 98fde41 through 9a23a8e (comparison).
In summary:
- Added support for 128-bit reprs on enums
- Switched field attributes to use the
#[binja(...)]
format instead of raw paths - Added documentation/comments to macro implementation
- Added a dev guide for using the macro (rendered)
- Added tests in the form of an
abstract-type
example
The only thing missing at this point is support for fields of type void *
, however I'm not sure how to support it given that raw void
is not a valid type for a field. Additionally, the documentation for std::ffi::c_void
explains that void
corresponds to ()
but void *
corresponds to*{const,mut} c_void
rather than *{const,mut} ()
.
I have discussed this with @mkrasnitski and the plan forward for this is as a separate crate. We need to actually version the binaryninja
crate with an attached ABI version (i.e. 0.1.0-69
) before this can be split off.
We should aim to have a myriad of helper crates instead of maintaining large effort abstractions/helpers in this repository. This will allow greater flexibility and a clear line between where the API ends and the helpers begin, something that is not clear within our python bindings.
We are going to keep this PR open until the above is at-least partially completed.