binaryninja-api icon indicating copy to clipboard operation
binaryninja-api copied to clipboard

Rust: refactor MLIL and HLIL instructions

Open mkrasnitski opened this issue 1 year ago • 4 comments

This took like a week but the end result I think is a lot more idiomatic and concise.

Here's a summary of changes:

  1. Change MediumLevelILInstruction into a struct with function, address, and kind fields. This factors the function and address fields out of the enum variants. The same thing applies to HighLevelILInstruction.
  2. Implement lifting inside the {Medium,High}LevelILInstruction::lift method rather than deferring to lift methods on each variant.
  3. Remove field getters on each variant and make their fields public. The motivation here is that these getters would perform lifting of their corresponding field, which means using them invokes a kind of "partial" lifting which feels weird. Better to just go through the {Medium,High}LevelILLiftedInstruction structs.
    • Some of the fields were tuples that I expanded out into named fields. For example, list metadata fields included the index of the first element as well as the length. I did my best to give some good names, sticking to the num_* and first_* template, but probably some could be named something better.
  4. Move the operands method onto the lifted instructions, as it called the now-removed field getters. Also, it returns an owned Vec of operands, each of which also own their data. This introduces some extra clones but seems a bit easier to handle for users, and avoids exposing the custom OperandIter type publicly.
  5. Instantiate instructions directly from the list of raw operands rather than using new methods. Then, delete the constructors. In the end, this lets operation.rs just be a big collection of struct definitions.
  6. The OperandList type was duplicated between MLIL and HLIL, so I made a generic version which can hold either type of function. Also, the implementation of ExactSizeIterator for the type was wrong because the value of self.remaining was being decreased 4 at a time and would be out of sync with the actual number of values given out.

Some miscellaneous changes:

  • Rename the ConstantData instruction variants to just ConstData to avoid confusion with the types::ConstantData struct.
  • Add LiftedJump and LiftedLabel types for HLIL. These were removed halfway through the original HLIL PR but I don't know why.

mkrasnitski avatar Feb 12 '24 05:02 mkrasnitski

One solution is to remove the non-lifted versions and just have a single type of instruction. Having a single Kind enum then won't be much of an issue. It seemed in the original PR there was a want for an easy and fast version of the IL instructions (hence the non-lifted versions, which also implement Copy), so it's worth considering how much of a performance tradeoff working with just lifted instructions really is. Some of the variants are recursive and store whole instructions within them, and I don't know how large a single instruction can get, or how many sub-nodes it can contain. However, it's worth noting that the Python bindings only deal with fully-lifted instructions.

As for your first concern, the HighLevelILInstruction::new constructor is pub(crate) and so the change to take Ref directly is purely internal. Is this what you're referring to, or are there places where calling .to_owned() is now necessary?

EDIT: Also worth noting that HighLevelILOperand no longer exists, it's only HighLevelILLiftedOperand now (likewise for MLIL). I debated keeping the name the same, but felt there should be a clear separation between the "raw" unlifted instructions and the fully lifted ones.

mkrasnitski avatar Feb 12 '24 20:02 mkrasnitski

I missed that HighLevelILInstruction::new is crate-internal. You can disregard that comment. As for removing Lifted or non-lifted, perhaps we should reconsider that later. This PR should be good as-is regardless if we decide to cull one of those later.

ElykDeer avatar Feb 12 '24 21:02 ElykDeer

I'll leave it to @plafosse to decide if we should merge this before code freeze.

ElykDeer avatar Feb 12 '24 21:02 ElykDeer

Let me know if I should squash the extra bugfix commits.

mkrasnitski avatar Feb 23 '24 04:02 mkrasnitski

Thank you again for this awesome PR! Sorry for the delay on getting it merged.

ElykDeer avatar Mar 18 '24 21:03 ElykDeer