HighCodeSymbol DataType is undefined for typed global

Open mumbel opened this issue 1 year ago • 4 comments

Describe the bug: I'm not sure whether I'm using the API incorrectly, but I'm getting an undefined datatype for a global. The same thing shows up in the UI: Ctrl+L on foo_thing returns undefined. (I'm not certain whether these are the same issue; see additional context.)

decodeMapSym() doesn't seem able to create a HighCodeSymbol to decode with, since it only has a HighFunction, and HighFunction cannot decode() into a HighCodeSymbol correctly because it has no datatype.

Given the following C, I am iterating over the callers of setup_this_thing. When I decompile init_foo_thing, I can't get type information about foo_thing.

typedef struct _thing {
    bool_t toggle;
    char *name;
} thing_t;

thing_t foo_thing[4] = {
    { 0, "fooA"},
    { 0, "fooB"},
    { 0, "fooC"},
    {0, NULL}
};

extern void setup_this_thing(thing_t *foo_thing, bool_t toggle);

void init_foo_thing(bool_t toggle)
{
    setup_this_thing(foo_thing, toggle);
    return;
}
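    # Ghidra Jython fragment; IFC is assumed to be an initialized DecompInterface
    dr = IFC.decompileFunction(func, 200, monitor)
    hf = dr.getHighFunction()
    gsm = hf.getGlobalSymbolMap()
    for sym in gsm.getSymbols():
        # getDataType() comes back as undefined here, even for the typed global
        print "\tSYM: %r %r %r" % (sym, sym.name, sym.getDataType())
        return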

Expected behavior: getDataType() should return a type for a typed symbol.

Environment (please complete the following information):

  • OS: 20.04
  • Java Version: 17.0.4
  • Ghidra Version: 6fad151b5440de3dfdd619df7b5dc070b2b4dc6c
  • Ghidra Origin: eclipse

Additional context: This does not solve the Ctrl+L case, but it would at least help in the script. The code linked below could look up the Data at the address and, if it exists and has a DataType, use that instead of undefined with size 1.

https://github.com/NationalSecurityAgency/ghidra/blob/03b42fc6e4efae32210112e0f6a73fc11cf3c413/Ghidra/Framework/SoftwareModeling/src/main/java/ghidra/program/model/pcode/HighConstant.java#L114
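On the script side, the fallback suggested above could be sketched as follows. This is a workaround sketch, not the decompiler's own fix: `resolve_data_type` is a hypothetical helper, the name-based "undefined" test is a heuristic, and the only Ghidra calls assumed are `HighSymbol.getDataType()`, `HighSymbol.getStorage()`, `VariableStorage.getMinAddress()`, `Listing.getDataAt()` and `Data.getDataType()`.

```python
def resolve_data_type(high_sym, listing):
    """Return the symbol's type, falling back to the Data defined at its address.

    Hypothetical helper. `high_sym` is a HighSymbol taken from
    HighFunction.getGlobalSymbolMap(); `listing` is currentProgram.getListing().
    """
    dt = high_sym.getDataType()
    # Heuristic: placeholder types are named undefined, undefined1, undefined4, ...
    if dt is not None and not dt.getName().startswith("undefined"):
        return dt

    storage = high_sym.getStorage()
    addr = storage.getMinAddress() if storage is not None else None
    if addr is None:
        return dt

    data = listing.getDataAt(addr)  # Data starting exactly at addr, or None
    if data is not None and data.getDataType() is not None:
        return data.getDataType()
    return dt
```

In the loop from the report, `resolve_data_type(sym, currentProgram.getListing())` would take the place of `sym.getDataType()`.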

mumbel avatar Aug 06 '22 20:08 mumbel

I think I know what's going on. Basically, you have a function foo whose signature has been committed to the program database. At one of the call sites of foo, a global variable g is passed as an argument to foo, so Ghidra should know and apply the appropriate datatype to g.

At the moment, Ghidra does not automatically apply types to global variables from function signatures. There is some discussion in #4281 - this issue also explains why in some cases format strings passed to printf are not defined as strings in Ghidra. We're considering how to address this.

Please let me know if I've misunderstood the issue.

ghidracadabra avatar Aug 09 '22 13:08 ghidracadabra

Just to clarify, the global is already typed at this point and there is a committed function signature, so it's surprising that reading the symbol with my script above, or with Ctrl+L "Retype Variable", returns undefined.

Ghidra does type some things from signatures (or is this not really the same?). With allocator functions, for example, I've always had pretty decent luck. Given newthing = create_new_thing(4), where newthing is a global, doing Ctrl+L there picks up the DataType thing_t * where it's currently untyped/undefined4 (so not automatic, but it does let me type it easily).

I did a little more debugging. DecompileCallback::encodeData() does have the right info. I added prints to all the callers that make HighSymbol-like things and saw the type come through there, but my sym.getDataType() call went down another path that only had the HighFunction information, leading to that HighConstant without type info.

(I forgot to check the Ctrl+L logic with this setup, and I'm still not certain what code path that is.)

mumbel avatar Aug 09 '22 14:08 mumbel

Ok, thanks for the additional info. I will take a closer look.

ghidracadabra avatar Aug 09 '22 14:08 ghidracadabra

I've been working in a larger binary, and the example above is just the stripped down setup. I can try to get a byte string/ELF together that shows this behavior if repro is an issue.

mumbel avatar Aug 09 '22 15:08 mumbel

That would be helpful; I'm having difficulty reproducing this. If I compile your example with -g, all the data types get created and applied and everything seems to be working.

As an experiment, what happens if you manually change the type of foo_thing to int or string?

ghidracadabra avatar Aug 10 '22 14:08 ghidracadabra

I'll play around with getting a repro sample tonight.

In my big binary, if I currently have both signature and global typed correctly, I see setup_this_thing(foo_thing, toggle) in the decompiler and foo_thing is undefined with ctrl+l/script.

If I change foo_thing to, say, db[36], the only noticeable change is setup_this_thing((thing_t *)foo_thing, toggle), with Ctrl+L and getDataType() still returning undefined.

If I take thing_t out completely and just use something like int * for everything (signature and foo_thing), it's still undefined for both.

mumbel avatar Aug 10 '22 15:08 mumbel

00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 94 21 ff f8 7c 64 1b 78 3c 60 04 00 7c 08 02 a6 38 63 00 70 90 01 00 0c 42 80 00 15 80 01 00 0c 38 21 00 08 7c 08 03 a6 4e 80 00 20 94 21 ff f8 7c 08 02 a6 90 01 00 0c 80 01 00 0c 38 21 00 08 7c 08 03 a6 4e 80 00 20 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 04 00 00 90 00 00 00 00 04 00 00 98 00 00 00 00 04 00 00 a0 00 00 00 00 00 00 00 00 68 65 6c 6c 6f 00 00 00 77 6f 72 6c 64 00 00 00 66 6f 6f 20 62 61 72 00 00 00 00 00 00 00 00 00

Load at 0x04000000 as PowerPC:BE:32:QUICC.

  • init_foo_thing at 04000010
  • setup_this_thing at 0400003c
  • foo_thing[4] at 04000070
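To sanity-check the sample before importing it, the dump can be turned back into bytes and the array at foo_thing decoded. The sketch below assumes the layout the dump suggests: each thing_t is a 4-byte big-endian toggle followed by a 4-byte big-endian pointer to a NUL-terminated name. The helper names are hypothetical.

```python
import struct


def load_image(hex_text):
    """Turn a whitespace-separated hex dump into raw bytes."""
    return bytes(bytearray.fromhex("".join(hex_text.split())))


def read_cstring(image, base, addr):
    """Read a NUL-terminated ASCII string at absolute address addr."""
    start = addr - base
    end = image.index(b"\x00", start)
    return image[start:end].decode("ascii")


def decode_things(image, base, addr, count):
    """Decode `count` entries of (toggle, name); assumed 8-byte big-endian layout."""
    entries = []
    for i in range(count):
        offset = addr - base + 8 * i
        toggle, name_ptr = struct.unpack_from(">II", image, offset)
        # A zero pointer marks the terminating {0, NULL} entry
        name = read_cstring(image, base, name_ptr) if name_ptr else None
        entries.append((toggle, name))
    return entries
```

With the dump above passed through `load_image`, `decode_things(image, 0x04000000, 0x04000070, 4)` walks the four entries. The same bytes can be written to a file and imported into Ghidra as a raw binary at the base address given.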

mumbel avatar Aug 11 '22 02:08 mumbel

Thanks for the example; I'm seeing some weirdness there.

In your example, does the global have a name and the correct type or just the correct type?

ghidracadabra avatar Aug 11 '22 15:08 ghidracadabra

No, I don't think I've named them anywhere in my larger image. It's always just been thing_t_ARRAY_ADDRADDR.

And... yeah, naming it fixes it.

edit: I guess that's why I've had mixed results with other globals (setting globals from allocators) since I do name a lot of the cases.
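Until this is fixed, the naming workaround could be scripted. The sketch below is only an illustration: `name_default_globals` is a hypothetical helper, and it assumes the auto-generated labels (such as thing_t_ARRAY_ADDR) are reported as dynamic by `Symbol.isDynamic()`. Inside Ghidra, `source` would be `SourceType.USER_DEFINED` and `symbol_table` would be `currentProgram.getSymbolTable()`.

```python
def name_default_globals(symbol_table, addresses, make_name, source):
    """Give a real label to every address that only has a default symbol.

    Hypothetical helper. Returns the (address, name) pairs that were labeled.
    """
    labeled = []
    for addr in addresses:
        sym = symbol_table.getPrimarySymbol(addr)
        # Skip addresses that already carry a non-dynamic (real) name
        if sym is not None and not sym.isDynamic():
            continue
        name = make_name(addr)
        symbol_table.createLabel(addr, name, source)
        labeled.append((addr, name))
    return labeled
```

`make_name` is any callable that maps an address to a label, for example `lambda a: "g_%s" % a`.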

mumbel avatar Aug 11 '22 17:08 mumbel