HighCodeSymbol DataType is undefined for typed global
Describe the bug
Not sure if I'm using the API incorrectly, but I'm getting an undefined datatype for a global, and even in the UI this can be seen with ctrl+l on foo_thing returning undefined. (I'm not certain if these are the same issue or not; see additional context.)
decodeMapSym() doesn't seem to be able to create a HighCodeSymbol to decode with as it only has a HighFunction. HighFunction cannot decode() into a HighCodeSymbol correctly as it has no datatype.
Given the following C, I am iterating over the callers of setup_this_thing. If I try to decompile init_foo_thing, I have trouble getting information about foo_thing.
typedef struct _thing {
    bool_t toggle;
    char *name;
} thing_t;

thing_t foo_thing[4] = {
    { 0, "fooA" },
    { 0, "fooB" },
    { 0, "fooC" },
    { 0, NULL }
};

extern void setup_this_thing(thing_t *foo_thing, bool_t toggle);

void init_foo_thing(bool_t toggle)
{
    setup_this_thing(foo_thing, toggle);
    return;
}
dr = IFC.decompileFunction(func, 200, monitor)
hf = dr.getHighFunction()
gsm = hf.getGlobalSymbolMap()
for sym in gsm.getSymbols():
    print "\tSYM: %r %r %r" % (sym, sym.name, sym.getDataType())
return
Expected behavior
getDataType() should return a type for a typed symbol.
Environment (please complete the following information):
- OS: 20.04
- Java Version: 17.0.4
- Ghidra Version: 6fad151b5440de3dfdd619df7b5dc070b2b4dc6c
- Ghidra Origin: eclipse
Additional context
This does not solve the ctrl+l case, but it at least helps in a script: you could look up the Data at the symbol's address and, if it exists and has a DataType, use that instead of undefined and 1.
https://github.com/NationalSecurityAgency/ghidra/blob/03b42fc6e4efae32210112e0f6a73fc11cf3c413/Ghidra/Framework/SoftwareModeling/src/main/java/ghidra/program/model/pcode/HighConstant.java#L114
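The script-side fallback described above could look roughly like the sketch below. It is not a fix for the underlying issue. It assumes `sym` is one of the HighSymbol objects from the global symbol map and `listing` is `currentProgram.getListing()`. The helper names `is_undefined_type` and `resolve_global_type` are hypothetical, and the name-prefix check for placeholder types is a heuristic, not an official API.

```python
def is_undefined_type(dt):
    # Heuristic (assumption): Ghidra's placeholder types are named
    # "undefined", "undefined1", "undefined4", and so on.
    return dt is None or dt.getName().startswith("undefined")


def resolve_global_type(sym, listing):
    """Return sym's data type, falling back to the Data defined at its address."""
    dt = sym.getDataType()
    if not is_undefined_type(dt):
        return dt  # the decompiler already supplied a real type

    storage = sym.getStorage()
    addr = storage.getMinAddress() if storage is not None else None
    if addr is None:
        return dt

    # Ask the program listing what is actually defined at that address.
    data = listing.getDefinedDataAt(addr)
    if data is None:
        return dt

    data_dt = data.getDataType()
    return dt if is_undefined_type(data_dt) else data_dt
```

Inside the loop from the report, `sym.getDataType()` would then be replaced by `resolve_global_type(sym, currentProgram.getListing())`.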
I think I know what's going on. Basically, you have a function foo whose signature has been committed to the program database. At one of the call sites of foo, a global variable g is passed as an argument to foo, so Ghidra should know and apply the appropriate datatype to g.
At the moment, Ghidra does not automatically apply types to global variables from function signatures. There is some discussion in #4281; that issue also explains why, in some cases, format strings passed to printf are not defined as strings in Ghidra. We're considering how to address this.
Please let me know if I've misunderstood the issue.
Just to clarify, the global is already typed at this point and there is a committed function signature, so it is surprising that reading the symbol with my script above, or using ctrl+l "Retype Variable", returns undefined.
Ghidra does type some things from signatures (or is this not really the same thing?). For example, I've always had pretty decent luck with allocator functions. Given newthing = create_new_thing(4), where newthing is a global, doing ctrl+l there will pick up the DataType thing_t * where it's currently untyped/undefined4 (so not automatic, but it does let me easily type it).
I did a little more debugging. DecompileCallback::encodeData() does have the right info. I added prints to all the callers that would create HighSymbol-like objects and saw the right info come through, but my sym.getDataType() call took another path that only had the HighFunction information, leading to that HighConstant without type info.
(I forgot to check the ctrl+l logic with this setup; I'm still not certain which code path that is.)
Ok, thanks for the additional info. I will take a closer look.
I've been working in a larger binary, and the example above is just the stripped down setup. I can try to get a byte string/ELF together that shows this behavior if repro is an issue.
That would be helpful; I'm having difficulty reproducing this. If I compile your example with -g, all the data types get created and applied and everything seems to be working.
As an experiment, what happens if you manually change the type of foo_thing to int or string?
I'll play around with getting a repro sample tonight.
In my big binary, with both the signature and the global typed correctly, I see setup_this_thing(foo_thing, toggle) in the decompiler, and foo_thing is undefined with ctrl+l and in the script.
If I change foo_thing to, say, db[36], the only noticeable change is that the call becomes setup_this_thing((thing_t *)foo_thing, toggle), with ctrl+l and getDataType() still returning undefined.
If I take thing_t out completely and just use something like int * for everything (signature and foo_thing), it is still undefined for both.
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 94 21 ff f8 7c 64 1b 78 3c 60 04 00 7c 08 02 a6 38 63 00 70 90 01 00 0c 42 80 00 15 80 01 00 0c 38 21 00 08 7c 08 03 a6 4e 80 00 20 94 21 ff f8 7c 08 02 a6 90 01 00 0c 80 01 00 0c 38 21 00 08 7c 08 03 a6 4e 80 00 20 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 04 00 00 90 00 00 00 00 04 00 00 98 00 00 00 00 04 00 00 a0 00 00 00 00 00 00 00 00 68 65 6c 6c 6f 00 00 00 77 6f 72 6c 64 00 00 00 66 6f 6f 20 62 61 72 00 00 00 00 00 00 00 00 00
Load at 0x04000000 as PowerPC:BE:32:QUICC.
init_foo_thing is at 04000010, setup_this_thing is at 0400003c, and foo_thing[4] is at 04000070.
Thanks for the example; I'm seeing some weirdness there.
In your example, does the global have a name and the correct type or just the correct type?
No, I don't think I've named them anywhere in my larger image. It's always just been thing_t_ARRAY_ADDRADDR.
And... yeah, naming it fixes it.
edit: I guess that's why I've had mixed results with other globals (setting globals from allocators) since I do name a lot of the cases.
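Since naming the global made the type show up, a script could apply that workaround in bulk. The sketch below is only an illustration and rests on several assumptions. It takes `listing` to be `currentProgram.getListing()`, `symbol_table` to be `currentProgram.getSymbolTable()`, and `source_type` to be `SourceType.USER_DEFINED`. It also assumes default labels such as thing_t_ARRAY_ADDRADDR are reported as dynamic symbols. The function name `name_unnamed_globals` and the `g_` prefix are hypothetical.

```python
def name_unnamed_globals(listing, symbol_table, source_type, prefix="g_"):
    """Give a user label to defined data that only has a default (dynamic) name."""
    renamed = []
    for data in listing.getDefinedData(True):  # True = iterate forward
        sym = data.getPrimarySymbol()
        if sym is not None and not sym.isDynamic():
            continue  # already carries a real, user-visible name
        addr = data.getMinAddress()
        name = "%s%s" % (prefix, addr)  # hypothetical naming scheme
        symbol_table.createLabel(addr, name, source_type)
        renamed.append(name)
    return renamed
```

This labels every piece of defined data that lacks a real name, which may be more than wanted in a large binary; restricting it to addresses seen in the global symbol map would be a narrower variant.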