Bitfield support in decompiler

Open nik0sc opened this issue 6 years ago • 22 comments
Is your feature request related to a problem? Please describe.

Right now the decompiler shows bitfield access simply as shift and mask (in other words, it is unaware of bitfields).

For example, consider:

  • a big-endian bitfield that is a byte long, and
  • a member 3 bits long starting at the 2nd bit.

A member read might look like bitfield >> 3 & 0x7, and a member write like bitfield = (bitfield & 0xc7) | (member << 3 & 0x38). This makes understanding decompiler output difficult.
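To make the two expressions concrete, here is a minimal C sketch of the read and write above. The function and macro names are hypothetical, and the layout (member in bits 3..5 counted from the LSB) is simply what the masks 0x38 and 0xc7 imply.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout taken from the masks in the example:
   a 3-bit member in bits 3..5 (LSB = bit 0) of a one-byte unit. */
#define MEMBER_SHIFT 3u
#define MEMBER_MASK  0x7u

/* Read: what the decompiler shows as bitfield >> 3 & 0x7. */
uint8_t member_read(uint8_t bitfield)
{
    return (uint8_t)((bitfield >> MEMBER_SHIFT) & MEMBER_MASK);
}

/* Write: clear the member's bits (& 0xc7), then OR in the new value. */
uint8_t member_write(uint8_t bitfield, uint8_t member)
{
    uint8_t cleared = (uint8_t)(bitfield & 0xc7u);
    uint8_t placed = (uint8_t)((member << MEMBER_SHIFT) & 0x38u);
    return (uint8_t)(cleared | placed);
}

/* Small self-check: 0xa8 is 1010 1000, so bits 3..5 hold 101 = 5. */
void check_member_access(void)
{
    assert(member_read(0xa8) == 5);
    assert(member_write(0xff, 0) == 0xc7);
    assert(member_write(0x00, 7) == 0x38);
    assert(member_read(member_write(0x81, 6)) == 6);
}
```

Every access to every member repeats this pattern with different constants, which is why the output gets hard to follow.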

The data type manager allows the declaration of bitfields only by importing them through the "Parse C Source" menu item (great if you have a header file for your platform); however, the decompiler does not make use of this information.

Describe the solution you'd like

  • Ability to declare bitfields in the data type manager
  • Control over implementation-specific details like member allocation order
  • Decompiler recognizes data/variables typed as a bitfield + shift-and-mask pcode matching defined offsets and lengths as a bitfield member access, and shows the member access instead of the shift and mask

The above example would then look like var1 = bitfield.member and bitfield.member = var1 for the read and write cases.
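For comparison, this is roughly what the requested output corresponds to in C source. The struct and field names are made up for illustration, and the exact bit positions a compiler assigns to each member are implementation-defined, so this sketch only shows the member-access form, not a guaranteed layout.

```c
#include <assert.h>

/* Hypothetical one-byte register described as a C bitfield.
   Member order within the storage unit is implementation-defined. */
struct status_reg {
    unsigned int low    : 3;
    unsigned int member : 3;
    unsigned int high   : 2;
};

/* The read case: var1 = bitfield.member */
unsigned int get_member(const struct status_reg *bitfield)
{
    return bitfield->member;
}

/* The write case: bitfield.member = var1 (value is reduced modulo 8) */
void set_member(struct status_reg *bitfield, unsigned int var1)
{
    bitfield->member = var1;
}

/* Self-check: writing member leaves the neighbouring fields alone. */
void check_struct_access(void)
{
    struct status_reg r = {0};
    r.low = 5;
    r.high = 2;
    set_member(&r, 6);
    assert(get_member(&r) == 6);
    assert(r.low == 5 && r.high == 2);
    set_member(&r, 9);              /* 9 does not fit in 3 bits */
    assert(get_member(&r) == 1);    /* 9 mod 8 */
}
```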

Describe alternatives you've considered

No real alternative besides the current situation of consulting datasheets and my own notes for bitfield layout.

  • Bitfield layout will depend on architecture and endianness
  • There is no definitive way for a function to access a bitfield member. It could shift first then mask, or mask then shift. Recognizing member access, even by pcode, might not be trivial.
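The second point can be illustrated with a short C sketch (names are hypothetical): the same 3-bit member in bits 3..5 can be extracted shift-first or mask-first, and a comparison against a constant may not shift at all. A recognizer would have to treat all of these as the same member access.

```c
#include <assert.h>
#include <stdint.h>

/* Shift first, then mask. */
uint8_t extract_shift_first(uint8_t x)
{
    return (uint8_t)((x >> 3) & 0x7u);
}

/* Mask first, then shift. */
uint8_t extract_mask_first(uint8_t x)
{
    return (uint8_t)((x & 0x38u) >> 3);
}

/* Compare in place: "member == 5" without any shift in the code. */
int member_is_five(uint8_t x)
{
    return (x & 0x38u) == (5u << 3);
}

/* Self-check over every possible byte value. */
void check_extract_forms(void)
{
    for (unsigned v = 0; v < 256; v++) {
        uint8_t x = (uint8_t)v;
        assert(extract_shift_first(x) == extract_mask_first(x));
        assert(member_is_five(x) == (extract_shift_first(x) == 5));
    }
}
```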

Additional context

This is mainly for embedded systems that pack many short parameters into registers.

nik0sc avatar Jun 01 '19 14:06 nik0sc

This is something I'd really like to see implemented, both in the decompiler and in the disassembly listing view. I feel like a lot of good additions could be made to the enumerations feature. In addition to this, the ability to specify values within bitmasks within the enum would be great. Systems that use their own flag registers may group multiple independent sets into a single register, each with a different mask.

Separating enumerations from the overall "data types" in some way would make navigating them easier as well.

dkatzdev avatar Jun 06 '19 17:06 dkatzdev

Are you by any chance trying to decompile MIPS binaries? In recent ISAs (r2 and above) there are specific instructions for accessing fields, which could be decompiled as a C bitfield operation if the type is set up correctly.

nihilus avatar Jun 10 '19 21:06 nihilus

@nihilus Don't know about mips but I'm working on a powerpc binary right now. Most bitfield access is done with the rlwinm and rlwimi instructions which make it very clear which range of a register is being read and written. But of course this doesn't translate into decompiled output.
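As a rough C model of what those two instructions compute (a sketch of the documented rotate-and-mask semantics, not Ghidra's pcode; helper names are made up): rlwinm rotates the source left and ANDs it with a mask running from bit MB to bit ME, and rlwimi inserts the rotated bits under that mask into the destination. PowerPC numbers bits from the MSB, so bit 0 is the most significant.

```c
#include <assert.h>
#include <stdint.h>

static uint32_t rotl32(uint32_t x, unsigned n)
{
    n &= 31u;
    return n ? (x << n) | (x >> (32u - n)) : x;
}

/* Mask with ones from bit mb through bit me (MSB = bit 0), wrapping
   around when mb > me. Both arguments must be in 0..31. */
static uint32_t ppc_mask(unsigned mb, unsigned me)
{
    uint32_t from_mb = 0xffffffffu >> mb;
    uint32_t to_me = 0xffffffffu << (31u - me);
    return (mb <= me) ? (from_mb & to_me) : (from_mb | to_me);
}

/* rlwinm rA,rS,SH,MB,ME: rotate left, then AND with mask. */
uint32_t rlwinm(uint32_t rs, unsigned sh, unsigned mb, unsigned me)
{
    return rotl32(rs, sh) & ppc_mask(mb, me);
}

/* rlwimi rA,rS,SH,MB,ME: insert rotated bits into rA under the mask. */
uint32_t rlwimi(uint32_t ra, uint32_t rs, unsigned sh, unsigned mb, unsigned me)
{
    uint32_t m = ppc_mask(mb, me);
    return (rotl32(rs, sh) & m) | (ra & ~m);
}

void check_rlw(void)
{
    /* Read bits 3..5 (LSB numbering): same as (x >> 3) & 0x7. */
    assert(rlwinm(0xa8u, 29, 29, 31) == 5u);
    /* Write 6 into bits 3..5: same as (x & ~0x38) | (6 << 3). */
    assert(rlwimi(0xffu, 6u, 3, 26, 28) == 0xf7u);
}
```

The field position and width are encoded directly in SH, MB and ME, which is why the range being accessed is easy to read off the disassembly.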

nik0sc avatar Jun 11 '19 09:06 nik0sc

On x86 it's a mess of shifting and masking.

MrSapps avatar Jun 11 '19 11:06 MrSapps

I'm on ARM currently, and there are a ton of processor-specific SFRs, as well as flags within the user firmware, that would drastically benefit from this.

dkatzdev avatar Jun 12 '19 14:06 dkatzdev

The ability to represent bitfields within Structures has just been added to the master branch. Support for bitfields has been added to the CParser, PDB parser and DWARF. The PDB XML file format has changed for bitfields: any retained PDB XML files will need to be regenerated to benefit from the bitfield improvements (bitfield bit-offset information was missing from the XML). Note that "aligned" bitfield packing support is currently limited to msb-filled-first for big-endian and lsb-filled-first for little-endian data. These bitfield component definitions are currently not conveyed to the decompiler, and there is currently no bitfield reference mechanism. Structure Data instances in memory will reflect bitfield data. See the Structure Editor help content for some additional information.
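To illustrate what the two fill orders mean for the resulting shift, here is a small C sketch. It is not Ghidra's implementation; the function names and the idea of a running bit offset within one storage unit are assumptions made for the example.

```c
#include <assert.h>

/* lsb-filled-first: the first declared member starts at bit 0 (LSB),
   so the right-shift equals the running bit offset. */
unsigned shift_lsb_first(unsigned bit_offset)
{
    return bit_offset;
}

/* msb-filled-first: the first declared member starts at the MSB,
   so the right-shift is measured from the other end of the unit. */
unsigned shift_msb_first(unsigned unit_bits, unsigned bit_offset, unsigned width)
{
    return unit_bits - bit_offset - width;
}

void check_fill_order(void)
{
    /* 8-bit unit, 3-bit member after 2 bits of earlier members. */
    assert(shift_lsb_first(2) == 2);
    assert(shift_msb_first(8, 2, 3) == 3);
}
```

With msb-first filling, a 3-bit member at offset 2 in a byte ends up with shift 3 and mask 0x38, which matches the constants in the original example if "the 2nd bit" is read as zero-based offset 2.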

ghidra1 avatar Jul 18 '19 22:07 ghidra1

I am closing this ticket since no immediate action is required. We are investigating bitfield support for the decompiler.

ghidra1 avatar Jul 22 '19 23:07 ghidra1

@ghidra1 What's the prognosis here??? We can currently define bitfields, but the decompiler support is still missing!!

Wall-AF avatar Aug 24 '22 13:08 Wall-AF

This is a feature request that has neither been implemented nor rejected for future support. I will reopen it and put it through our triage and prioritization process.

ryanmkurtz avatar Aug 24 '22 14:08 ryanmkurtz

Support for bitfields in the decompiler is planned, but we have no timeline yet.

caheckman avatar Aug 24 '22 16:08 caheckman

Don't forget, a bitfield may span more than one register. E.g., in early x86 assembly a long (being 32 bits) has to use two 16-bit registers! This is currently an issue, as we end up treating the result as two 16-bit values.

Wall-AF avatar Aug 25 '22 17:08 Wall-AF

@Wall-AF there are all kinds of conventions when you consider all processors/compilers and the resulting pcode for bitfield manipulations. Reversing this in the decompiler is what makes it so hard. It can also be ambiguous pcode.

ghidra1 avatar Aug 25 '22 17:08 ghidra1

Understood. Just being hopeful! Maybe there could be a manual way to tell the decompiler to treat 2 registers as one longer register (at some future point).

Wall-AF avatar Aug 25 '22 17:08 Wall-AF

treat 2 registers as one longer register

This is a double-edged sword and is done only with adjacent registers in the language implementation. Doing this can encourage the decompiler to always treat them as a single varnode, even in cases where they should be separate.

ghidra1 avatar Aug 26 '22 15:08 ghidra1

treat 2 registers as one longer register

Only by manual say so.

Wall-AF avatar Aug 26 '22 17:08 Wall-AF

treat 2 registers as one longer register

The reason behind this is twofold:

  1. In 16-bit processors/compilers, 32-bit numeric values are (90+% of the time in my app) manipulated through two 16-bit registers using different register combinations (sometimes DX:AX or AX:DX, or other combinations that may include BX and CX). (I'm sure this will be similar for 64-bit processors needing to represent 128- or 256-bit numbers.) In these cases, provided the stack variable or (pointed-at) structure has the location/member defined as a 32-bit type, the register loads respect the endianness of that single location, using a +2 on the named variable/member for the high word (in little-endian). I believe this should enable the decompiler to understand the concept.
  2. There already exists similar functionality for defining custom calling conventions as demonstrated in the x86-16.cspec file that ensure 32-bit returns populate the DX:AX register combination.
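The first point can be sketched in C as follows. The helper names are hypothetical; the sketch only shows how a little-endian 32-bit variable is loaded as two 16-bit words (low word at +0, high word at +2) and what single value the DX:AX pair represents.

```c
#include <assert.h>
#include <stdint.h>

/* Load one 16-bit little-endian word, as a 16-bit register load would. */
uint16_t load_word_le(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* The 32-bit value that the register pair DX:AX stands for. */
uint32_t join_dx_ax(uint16_t dx, uint16_t ax)
{
    return ((uint32_t)dx << 16) | ax;
}

void check_dx_ax(void)
{
    /* 0x12345678 stored little-endian at some variable "var". */
    const uint8_t var[4] = { 0x78, 0x56, 0x34, 0x12 };
    uint16_t ax = load_word_le(var);        /* low word at var + 0  */
    uint16_t dx = load_word_le(var + 2);    /* high word at var + 2 */
    assert(ax == 0x5678);
    assert(dx == 0x1234);
    assert(join_dx_ax(dx, ax) == 0x12345678u);
}
```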

Wall-AF avatar Aug 26 '22 18:08 Wall-AF

Is there a provisional workaround to get a prettier/more information rich decompilation for bitfields?

redfast00 avatar Sep 05 '23 15:09 redfast00

I hope this is something that is still being worked on.

theclub654 avatar Feb 12 '24 14:02 theclub654

What parts of the decompiler would need to be modified and/or what work would need to be done to support this?

Varriount avatar Apr 21 '24 17:04 Varriount

Is there a provisional workaround to get a prettier/more information rich decompilation for bitfields?

Something you can do is create an enum datatype for each bitfield value and then continuously add bitfield permutations to the enum as you come across them in the decompilation.

For example, you could start out with the following enum:

1 = READ
2 = WRITE
4 = EXECUTE

Then add the following after you come across certain permutations:

3 = READ_and_WRITE
5 = READ_and_EXECUTE
7 = READ_and_WRITE_and_EXECUTE

This is tedious, but it's way better than setting one-time equates. At least you'll only have to define each permutation once.
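Written out as a C enum, the workaround above looks roughly like this. The type name is made up, and the combined entries are simply the ORs of the individual flags.

```c
#include <assert.h>

/* Hypothetical permission flags, one bit each, plus the combinations
   added by hand as they turn up in the decompilation. */
enum perm_flags {
    READ    = 1,
    WRITE   = 2,
    EXECUTE = 4,

    READ_and_WRITE             = READ | WRITE,            /* 3 */
    READ_and_EXECUTE           = READ | EXECUTE,          /* 5 */
    READ_and_WRITE_and_EXECUTE = READ | WRITE | EXECUTE   /* 7 */
};

void check_perm_flags(void)
{
    assert(READ_and_WRITE == 3);
    assert(READ_and_EXECUTE == 5);
    assert(READ_and_WRITE_and_EXECUTE == 7);
}
```

With n independent flags there can be up to 2^n - 1 non-zero combinations, so the list grows quickly for wider flag sets.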

spicydll avatar May 23 '24 17:05 spicydll

@spicydll I think I've seen recent versions of Ghidra automatically take care of the permutations, e.g. showing 3 as READ|WRITE, so you should only need to define the individual flags.

sollyucko avatar May 23 '24 17:05 sollyucko

This is useful, though it runs into limitations when you pit it against something like masking out bits. Something like foo = foo & ~(SOME|BITS) will end up being shown as a bitwise & with every bit that's being kept ORed together. If automatic permuting could use ~ appropriately, that'd be a huge usability boon.
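A small C sketch of the two equivalent forms, using a made-up 8-bit flag set (all names here are hypothetical): the compact form clears two bits with ~, while the expanded form spells out every bit that is kept.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 8-bit flag register. */
enum {
    SOME   = 0x01,
    BITS   = 0x02,
    FLAG_2 = 0x04,
    FLAG_3 = 0x08,
    FLAG_4 = 0x10,
    FLAG_5 = 0x20,
    FLAG_6 = 0x40,
    FLAG_7 = 0x80
};

/* What the source most likely said: clear SOME and BITS. */
uint8_t clear_compact(uint8_t foo)
{
    return (uint8_t)(foo & (uint8_t)~(SOME | BITS));
}

/* The same operation with the mask 0xfc rendered as an OR of
   every flag that is being kept. */
uint8_t clear_expanded(uint8_t foo)
{
    return (uint8_t)(foo & (FLAG_2 | FLAG_3 | FLAG_4 | FLAG_5 | FLAG_6 | FLAG_7));
}

/* Self-check: both forms agree for every byte value. */
void check_clear_forms(void)
{
    for (unsigned v = 0; v < 256; v++) {
        assert(clear_compact((uint8_t)v) == clear_expanded((uint8_t)v));
    }
}
```

The wider the register and the fewer bits being cleared, the longer the expanded form gets, which is where a ~ rendering would help most.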

Altazimuth avatar Jun 19 '24 06:06 Altazimuth