ghidra icon indicating copy to clipboard operation
ghidra copied to clipboard

(Proof of concept) Width 4 boolean

Open sad-dev opened this issue 1 year ago • 2 comments

It is not uncommon for a program to clear the high bits of registers for booleans. For example, the following are (separate instructions) from a MIPS (32 bit, big endian) binary which precede a return, illustrating different ways the function returns true and false

image

image

Callers can access the full register and treat it as a boolean value, which the decompiler does not know about. image This leads to decompilation like: image

This PR introduces a bool4 (Bool4DataType extends BooleanDataType) which tries to behave as much like a width4 boolean as possible.

After: (changing return type to bool4) image

It doesn't decompile better in the base function and might not completely capture the meaning of a "width 4 boolean" , especially where the decompiler is concerned (for example, the if (bVar3 != false) {...} line); This PR is intended to be a proof of concept rather than used as is image

sad-dev avatar Dec 28 '23 04:12 sad-dev

I dabbled with modifying the decompiler (Varnode::isBooleanValue, ActionFuncLink::funcLinkOutput and FuncCallSpecs::commitNewOutputs) but with little success. For example, I removed the following check in ActionFuncLink::funcLinkOutput

    ProtoParameter *outparam = fc->getOutput();
    Datatype *outtype = outparam->getType();
    if (outtype->getMetatype() != TYPE_VOID) {
      int4 sz = outparam->getSize();
      if (sz == 1 && outtype->getMetatype() == TYPE_BOOL && data.isTypeRecoveryOn())
	data.opMarkCalculatedBool(callop);
    ProtoParameter *outparam = fc->getOutput();
    Datatype *outtype = outparam->getType();
    if (outtype->getMetatype() != TYPE_VOID) {
      int4 sz = outparam->getSize();
      if (outtype->getMetatype() == TYPE_BOOL && data.isTypeRecoveryOn())
	data.opMarkCalculatedBool(callop);

but obtained results like:

bool4 bVar3;
...
bVar3 = bar(...);
...
if ((BADTYPE) bVar3)) {...}

sad-dev avatar Dec 28 '23 06:12 sad-dev

Try adding <aggressivetrim signext="true"/> to the mips32be.cspec (as seen in mips64_32_n32.cspec) and then adding extension="sign"/> to the v0 output pentry. I think this will work assuming a bool is still treated as a number. This problem exists in almost every compiler spec (specifically bools) I've seen. It's either that compilers are inconsistent about it or it's the old stupid #define TRUE 1 #define FALSE 0

astrelsky avatar Dec 29 '23 13:12 astrelsky

This should not be done by adding a larger datatype. It is not uncommon for smaller types (e.g., short) to move in/out of larger registers. We will look into the decompiler's handling of this situation, but we cannot accept this PR. The concern should be entered as an Issue and not a PR. The datatype size is a function of the ABI/Compiler and not the register size.

ghidra1 avatar Jan 02 '24 15:01 ghidra1

OK, noted. I will give the aggressivetrim suggestion a go too. I have some ideas for a different datatype, specifically, an int4 that displays like a string (useful for functions like the Windows ExAllocatePoolWithTag API) but i'll leave that to a different PR, if that sounds viable

sad-dev avatar Jan 03 '24 02:01 sad-dev

We do have instance settings that can be applied which will influence rendering within the listing - although decompiler will not respect. There are also a selective convert actions within the listing and decompiler that than can be used to render a numeric constant in different formats (although there appears to be a disrepency in how the char sequence case is handled).

I would try to avoid adding a datatype ince you are likely to run into issues with the decompiler. Trying to have it behave as an integer and render as a string. In the end all primitive types are conveyed to the decompiler as a core metatype.

ghidra1 avatar Jan 03 '24 17:01 ghidra1