pharos
pharos copied to clipboard
Identical function names for different code sections
I have the situation that two different addresses have been given the same function names after applying ooanlyzer in Ghidra.
For example, at address 0x00437c66 the decompilation gives a function of
cls_0x8b02e8 * __thiscall OOAnalyzer::cls_0x8b02e8::cls_0x8b02e8(cls_0x8b02e8 *this)
and at address 0x004222e0
cls_0x8b02e8 * __thiscall OOAnalyzer::cls_0x8b02e8::cls_0x8b02e8(cls_0x8b02e8 *this)
with different code contents.
Maybe these are virtual methods from classes that inherit from a top level class? But should they then not get their own class name? address 0x8b02e8 is just data according to ghidra:
DAT_008b02e8 XREF[1]: cls_0x8b02e8:004fcc32(*)
008b02e8 43 ?? 43h C ? -> 006c5143
008b02e9 51 ?? 51h Q
008b02ea 6c ?? 6Ch l
008b02eb 00 ?? 00h
but there is a reference to 0x004fcc32, which is another instance of the same function:
cls_0x8b02e8 * __thiscall OOAnalyzer::cls_0x8b02e8::cls_0x8b02e8(cls_0x8b02e8 *this)
with yet other code contents.
My c++-fu may be weak. I'll gladly accept some education.
Thanks.
I'm adding some screen shots in the hope that they add clarity
I think there are two concerns here.
- All constructors on a class are named the same. Is this a bug? What does Ghidra do for overloaded functions?
- Are these methods really on the same class? If the decompiler is right about the parameters, I don't think they can be. This could possibly be a new sanity rule: you can only have one constructor on a class for a given set of parameters.
Can you share the exe and ooanalyzer outputs, either publicly or privately?
Sent from Workspace ONE Boxer
On Jul 18, 2020 8:07 PM, Jan Beck [email protected] wrote:
I have the situation that two different addresses have been given the same function names after applying ooanlyzer in Ghidra.
For example, at address 0x00437c66 the decompilation gives a function of
cls_0x8b02e8 * __thiscall OOAnalyzer::cls_0x8b02e8::cls_0x8b02e8(cls_0x8b02e8 *this)
and at address 0x004222e0
cls_0x8b02e8 * __thiscall OOAnalyzer::cls_0x8b02e8::cls_0x8b02e8(cls_0x8b02e8 *this)
with different code contents.
Maybe these are virtual methods from classes that inherit from a top level class? But should they then not get their own class name? address 0x8b02e8 is just data according to ghidra:
DAT_008b02e8 XREF[1]: cls_0x8b02e8:004fcc32(*)
008b02e8 43 ?? 43h C ? -> 006c5143
008b02e9 51 ?? 51h Q
008b02ea 6c ?? 6Ch l
008b02eb 00 ?? 00h
but there is a reference to 0x004fcc32, which is another instance of the same function:
cls_0x8b02e8 * __thiscall OOAnalyzer::cls_0x8b02e8::cls_0x8b02e8(cls_0x8b02e8 *this)
with yet other code contents.
My c++-fu may be weak. I'll gladly accept some education.
Thanks.
I'm adding some screen shots in the hope that they add clarity
[Screenshot from 2020-07-18 19-46-29]https://user-images.githubusercontent.com/11450961/87864092-cc83ed80-c931-11ea-95d6-a39460deaec9.png
[Screenshot from 2020-07-18 20-04-56]https://user-images.githubusercontent.com/11450961/87864103-0d7c0200-c932-11ea-89cc-863297a2294c.png
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHubhttps://github.com/cmu-sei/pharos/issues/123, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AL6ZAVGXU2ZXICLXHEPLZ53R4I2LNANCNFSM4PARQ7AQ.
Same exe I have been using all along to test ooanalyzer. Glad to share the files ooprog-results_V9.pl.zip V9.ooanalyzer.json.zip V9.exe.zip
The ooprog.log file is very large, even compressed, and github wont let me upload it. But if you feel it's important I can put it up somewhere else.
Thanks
Thanks. I'll let you know if we need the prolog log, but probably not.
Sent from Workspace ONE Boxer
On Jul 19, 2020 11:11 AM, Jan Beck [email protected] wrote:
Same exe I have been using all along to test ooanalyzer. Glad to share the files ooprog-results_V9.pl.ziphttps://github.com/cmu-sei/pharos/files/4943873/ooprog-results_V9.pl.zip V9.ooanalyzer.json.ziphttps://github.com/cmu-sei/pharos/files/4943875/V9.ooanalyzer.json.zip V9.exe.ziphttps://github.com/cmu-sei/pharos/files/4943876/V9.exe.zip
The ooprog.log file is very large, even compressed, and github wont let me upload it. But if you feel it's important I can put it up somewhere else.
Thanks
— You are receiving this because you commented. Reply to this email directly, view it on GitHubhttps://github.com/cmu-sei/pharos/issues/123#issuecomment-660660527, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AL6ZAVHHHVDFYWZKCSK4WGLR4MELPANCNFSM4PARQ7AQ.
I am looking at this some more. According to OOAnalyzer, 0x8b02e8 is a VFTable, which is then used as a class identifier.
@janbbeck Just to clarify, is your complaint that there are multiple functions in Ghidra with the same name? If there are multiple constructors identified on the same class, how would you like them to be named?
There are in fact many more than three constructors identified for class 0x8b02e8. I don't see how this could be right. I will let @sei-ccohen weigh in. But my gut feeling is that there are a lot of derived classes that were incorrectly merged together.
There are some other odd things as well. OOAnalyzer did not seem to set 0x8b02e8 and other vftables as vftables in Ghidra (though OOAnalyzer itself did detect them).
Here is why the VFTable is not set correctly:
2020-06-11 10:44:55 WARN (OOAnalyzer) Could not create virtual function table at 0x008b02e8: ghidra.program.model.util.CodeUnitInsertionException: Conflicting data exists at address 008b02ec to 008b02ef
Ghidra already created a pointer there. We should probably delete it.
Ah, we remove any data at the start of the vftable. But don't remove data "inside" the vftable.
I am looking at this some more. According to OOAnalyzer, 0x8b02e8 is a VFTable, which is then used as a class identifier.
@janbbeck Just to clarify, is your complaint that there are multiple functions in Ghidra with the same name? If there are multiple constructors identified on the same class, how would you like them to be named?
There are in fact many more than three constructors identified for class 0x8b02e8. I don't see how this could be right. I will let @sei-ccohen weigh in. But my gut feeling is that there are a lot of derived classes that were incorrectly merged together.
There are some other odd things as well. OOAnalyzer did not seem to set 0x8b02e8 and other vftables as vftables in Ghidra (though OOAnalyzer itself did detect them).
My concern is less that there are identical function names, but more that there are identical constructors (identical function names + arguments) for a single class
Sorry, I should have weighed in (publicly) on this issue earlier. My theory is that it's entirely possible that the parameters are not in fact identical. The decompiler could for example easily confuse these cases:
Class::Class(OtherClassA x), Class::Class(OtherClassB x), Class::Class(OtherClassC x)
and so on... It's even possible that there's a very large number of such constructors because there was a template such as:
template <T> Class::Class(T x) { ... }
I'm not saying that's what happened here. Just that I'd want to see more evidence of a problem before assuming that having lots of constructors implies a logic problem in OOAnalyzer. I also haven't looked in much detail at the details you have provided since sei-eschwartz was handling most of the issue. I see now that you did upload results and exes, so I'll look more at the actual facts tomorrow. ;-)
Sorry, I should have weighed in (publicly) on this issue earlier. My theory is that it's entirely possible that the parameters are not in fact identical. The decompiler could for example easily confuse these cases:
Class::Class(OtherClassA x), Class::Class(OtherClassB x), Class::Class(OtherClassC x)
and so on... It's even possible that there's a very large number of such constructors because there was a template such as:
template Class::Class(T x) { ... }
I'm not saying that's what happened here. Just that I'd want to see more evidence of a problem before assuming that having lots of constructors implies a logic problem in OOAnalyzer. I also haven't looked in much detail at the details you have provided since sei-eschwartz was handling most of the issue. I see now that you did upload results and exes, so I'll look more at the actual facts tomorrow. ;-)
Your examples
Class::Class(OtherClassA x), Class::Class(OtherClassB x), Class::Class(OtherClassC x)
are all valid, of course. But what I see in the namespace now is several instances of (equivalently)
Class::Class(OtherClassA x), Class::Class(OtherClassA x), Class::Class(OtherClassA x), i.e.
several functions with different code contents having this exact name:
cls_0x8b02e8 * __thiscall OOAnalyzer::cls_0x8b02e8::cls_0x8b02e8(cls_0x8b02e8 *this)
Can this be intended?
Just to be complete, I can think of at least four ways we could legitimately end up with two things that look like constructors for the same object with identical parameters:
- Constructor A takes an object pointer, O, and constructor B takes an object pointer, P, where P derives from O. If only O behavior is used on P by B, then we can't necessarily tell the difference in the binary.
- Constructors for objects A and B, where B inherits from A, they have no virtual methods, and B's constructor does nothing that could not be done in an A.
- Two constructors on an object where at least one of them takes a second "tag" parameter, where a tag is an empty class. (Not uncommon in C++; see tag dispatch.) The second parameter disappears in the compilation since it's just used to choose between otherwise identical constructors.
- A function or static method that calls a constructor for an object and returns a pointer to it. If the constructor is inlined, there may be no way to distinguish this function or method from a constructor.
So... I've actually looked at this a little now (instead of just reading the commentary). I think it's pretty clear that 0x8b02e8 is in fact a virtual function table. There are several references to external CWnd::xxx methods in MFC42.DLL In there. So that's probably a user-derived class with CWnd as a base. I think the legitimate constructor is at 0x4fcc00, which properly installs that VFTable into the object.
But it's unclear to me right now how these other methods got associated with that class. They do appear to be constructors since they appear to be installing different VFTables into their respective objects. They also appear to be derivatives of CWnd, and possibly 0x8b02e8, which would explain why got confused and merged them all together. That's probably why that class is gigantic. :-( What definitely signals that something is wrong here (and we should try to write a new rule for this) is that these methods can't be constructors on a class that has a VFTable (0x8b02e8) with actually installing that VFTable into the object. Installing a different VFTable (as these methods do) necessitates that they are constructors on a different class. I'll see if I can write a new rule and test it, but V9 is obviously a big file so it'll take a while.
Hmm. We've already got this rule. :-(
https://github.com/cmu-sei/pharos/blob/master/share/prolog/oorules/rules.pl#L2466
So the question really is: Why didn't it trigger properly, and prevent these classes from being merged? For example,
Instruction 0x422312 installs 0x8a22f0 in method 0x4222e0 Instruction 0x437c98 installs 0x8a3dc8 in method 0x437c66 Instruction 0x4fcc32 installs 0x8b02e8 in method 0x4fcc00
And while I can't easily prove that we knew each of those were VFTableWrites without the full log, I can prove that we knew that they were VFtables by the end of the analysis:
$ grep finalVFTable ooprog-results_V9.pl | grep -e 0x8a22f0 -e 0x8a3dc8 -e 0x8b02e8 finalVFTable(0x8a22f0, 0xf0, 0xf0, 0, ''). finalVFTable(0x8a3dc8, 0xf4, 0xf4, 0, ''). finalVFTable(0x8b02e8, 0xe8, 0xe8, 0, '').
And yet they all ended up on the same class (0x8b02e8).
So that rule should have blocked the merger... Something more complicated must be going on. Perhaps it's related to ordering and the triggers? Do we need a sanity check rule to prevent this condition post merger? How did we fail to detect the condition before the merger?
More curious results... @sei-eschwartz ran this file to completion on Jul 3 & 4, and that run did NOT merge these methods, instead assigning them to separate classes as OOAnalyzer should have. Your ticket is more recent than that though. What day did you run the OOAnalyzer on, and were you running the latest version OOAnalyzer at that time to your knowledge?
@cfcohen It's very likely the version of OOAnalyzer I used on July 4th was not publicly released at that time.
Anyway, it seems very likely that we have fixed the problem in one of our recent releases. I am attaching my V9.exe.results
file. I don't have the json file handy because we're about to change the format, but you can just rerun oojson
.
I am going to close this issue for now under the hope that the new file fixes all your concerns. Please re-open if it turns out there is still something that doesn't look right.
Also, we just pushed a new release with the new json format. Here's the json in that new format. V9.exe.json.zip
Thank you for that json file - and all the comments in this thread. This looks much more sane to me.
Upon looking at the namespace further, I still have some questions. There are similarly many functions in the namespace for CWnd. for example, at
0x00444ab1 there is CWnd * __thiscall CWnd(CWnd *this,int param_1) and at 0x004cd518 there is CWnd * __thiscall CWnd(CWnd *this,int param_1)
Again I see two functions with the same name and parameters. Also, the constructors for CWnd are certainly not inside the program code. And if any class inherits from CWnd, then it should have a different name shouldn't it?
Or am I misunderstanding something?
We will take a look
I think you're right @janbbeck. CWnd::CWnd
does not look to me like it is in the binary. Instead, we seem to have incorrectly merged a method from CWnd
into the class.
Possible solution:
- If we have the name from import symbols for one method, all methods on the class should have imported names? Could be a not NOTMergeClasses rule or sanity checking rule.
What do you think @sei-ccohen?
I'm pretty sure that the CWnd methods are correctly merged into the class. My theory here is that I think we have a heuristic to name a class based on the class names of any methods that it calls out to. Normally, a class with a method named "Cls::m()" would be named "Cls". Normally, this does well at naming classes that are completely (or nearly completely) external.
In this particular case that results in a bunch of classes all named "CWnd", which is confusing. I think we should probably name them "CWnd1", "Cwnd2", "CWnd3", etc. @sei-eschwartz and I will discuss.
So we haven't forgot about this, but haven't been updating the github issue very well.
So our conclusion is that we are incorrectly merging the methods into the class because the class inherits from an imported class. This is important, because we don't see some things in an imported class (like the constructor installing a vftable), which changes the results that fire. So we modified a rule to help us detect inheritance in that situation.
We also added the rule that I suggested above, which is to not allow classes with a mix of imported and non-imported methods.
I just got the updated results.
V9.exe.json.zip
I haven't been able to investigate much at all yet. Cwnd no longer appears in the class list under OOAnalyzer
. I'm not 100% clear on what that means yet.
Disregard for now. Something has clearly gone wrong with the generation of my facts file.
I forgot to include mfc42.json
:-(
Thanks for the update! Good to know I wasn't crazy :P
Just so know this has been running for several days now. I can't remember if it took this long last time or not.
including the defs for mfc did drastically increase run time for me as well.
Thanks!
Sadly it doesn't look like we figured out the inheritance correctly for 0x444ab1. I'll look at that more.