binaryninja-api icon indicating copy to clipboard operation
binaryninja-api copied to clipboard

IL String Rendering and Auto-detection Disabled for Read+Write Sections/Segments

Open 0bs3n opened this issue 2 years ago • 4 comments

Version and Platform (required):

  • Binary Ninja Version: 3.0.3233-stable, 3.0.3255-dev
  • OS: Manjaro Linux
  • OS Version: Up to date

Bug Description: The detection and IL rendering of ascii strings is dependent on the read/write flags of the segment and section. As far as I can tell, if a segment is loaded with SegmentFlag.SegmentWritable from a view plugin, no string detection is performed, and no string rendering is seen in IL referencing those strings. This is not desirable for many file formats which do not feature a defined distinction between writable and read-only data, such as ELF's .data/.rodata distinction. For example, the bFLT (Binary Flat) format just has one big .data section, with writable and read-only data intermixed, and no markers indicating which is which. This behavior is maintained even if the string data variables are manually defined as char/const char */[].

Confusingly, this behavior regarding segments does not occur if the segment is not mapped at all in the view plugin, and instead is mapped manually from the console once the view (with all the other segments) has been loaded into the binaryninja UI. In this case, issuing a bv.add_user_segment or bv.add_auto_segment with SegmentFlag.SegmentWritable does result in ascii strings being rendered in IL, though the data variable types are not automatically set to const char[] as they would if the segment was loaded read-only from the view.

Furthermore, regardless of segment flags, if a section is defined such that is contains the ascii strings being referenced/rendered in the IL which has ReadWriteDataSectionSemantics, the rendering of the strings in IL is removed and the variables are instead displayed as &data_12345. This is true both when defining sections from a view plugin, as well as manually from the UI console. Also, this behavior holds true even when the ascii strings in question have been manually defined as char/const char */[].

Steps To Reproduce: Reproduction of this issue is a little unwieldy, as it requires fiddling with segments/sections before/after the associated view plugin has complete initialization. The most easily reproducible PoC will be to use two plugins I've developed for the bFLT file format and Blackfin architecture, which can be found here (bFLT) and here (blackfin, and opening the binary file seen in these screenshots, which I can provide on request (github won't allow me to attach it to this issue)

  1. Open the binary which has the .data section and associated segment loaded read-write
  2. Navigate to the main function (0x10000134) and observe that calls to puts (0x10000170) take char * as arguments, but they are not rendered
  3. Manually set the types of the data variables being passed to puts to const char [], observe no change to IL even after reanalysis
  4. Remove the .data section with bv.remove_auto_section(".data")
  5. redefine the .data section with bv.add_user_section(".data", 0x100011a0, 0x1e0, SectionSemantics.ReadOnlyDataSectionSemantics)
  6. Observe that strings are now rendered properly, but control flow analysis has become incorrect due to interpreting data in that section as read-only when it is not.

Alternatively, for a generic binary/view:

  1. Open a binary in mapped view, and define segments and sections such that ascii strings being referenced by code (such as a call to puts) are within a read/write data section.
  2. Observe that no string rendering takes place.

Expected Behavior: The same behavior which applies to read-only segments and sections for the detection and rendering of ascii strings in IL should apply to read/write segments and sections as well. Alternatively, if there is some reason this is not desired by the BN team, it should be possible to specify something like SectionSemantics.MixedDataSectionSemantics when defining a section to indicate to Binary Ninja that the section is writable and thus control flow decisions should not be made based on the contents of that section, but may still contain constant values for the sake of string rendering and similar functionality.

A scenario in which an ascii string is passed to a call to puts where that string is actually writable and cannot be determined at analysis time is obviously a possibility (though the practice of setting a char buffer to some valid string before changing it is uncommon), and in that case the rendering of the string in the function call would be incorrect. So perhaps this behavior as the default for ReadWriteSectionSemantics isn't the way to go, but there are plenty of file formats which do require this behavior and it should be possible to specify.

Screenshots: Main function where a global in .data is compared to a constant, and one of two strings (also in .data) are printed depending on the result 20220215_11h36m48s_grim Setting the .data section as ReadOnly does fix the string rendering, but now control flow is broken since the Binary Ninja interprets the value to compare as constant 20220215_11h36m15s_grim

Additional Information: Possibly related to https://github.com/Vector35/binaryninja-api/issues/2969

0bs3n avatar Feb 15 '22 21:02 0bs3n

This should be resolved in builds >= 3.1.3617

0cyn avatar Aug 10 '22 16:08 0cyn

Could you explain the resolution so I can test? I've just updated to 3.1.3617-dev and I don't see a change in behavior. Thanks!

0bs3n avatar Aug 10 '22 17:08 0bs3n

Interesting, we now convert constants to constant pointers in MLIL if they point to any allocated address, which should be inlining it here; could you send the binary? (you should be able to upload it here if it's wrapped in a .zip file; if not, something like https://bashupload.com/ also works fine)

0cyn avatar Aug 10 '22 17:08 0cyn

Sure, I've attached a zip including the binary in question and also the required plugins for analyzing the file in BN. libarch-blackfin.so and binaryninja-bflt are the plugins, and lighthouse the binary. I'm assuming here that the architecture shouldn't matter, but let me know if this isn't a suitable PoC.

In the binary, 0x10000fdc is main(), and the calls to sub_10000ce4 you can find there all take a char * as their first argument and are a good example of the behavior.

(If you'd rather not load an .so from a stranger you can pull and build the blackfin plugin from source here)

string_render_poc.zip

0bs3n avatar Aug 10 '22 17:08 0bs3n

Alright, this should be actually fixed in builds >= 3.1.3624 😊

0cyn avatar Aug 12 '22 18:08 0cyn

Thanks! I just want to note that after this change (which did effectively force the rendering of probable strings for read/write sections/segments) made it so that the bug described in https://github.com/Vector35/binaryninja-api/issues/2969 (which originally only affected read-only sections/segments) now also affects read-write sections/segments, breaking rendering of structures or other types which happen to have a string at offset zero.

I've attached a bndb of the same "lighthouse" binary supplied previously which has structures defined to demonstrate the issue, as well as a plugin built for the most recent API version. bndb_and_plugin.zip

Here are some images as well:

Before the fix, see address 0x10000ff8 20220815_10h06m38s_grim

After the fix, see address 0x10000ff8 20220815_10h07m58s_grim

The fix here did address this issue so I think it should remain closed, and thanks very much for taking care of it!

0bs3n avatar Aug 15 '22 17:08 0bs3n