kaiju
Adopt object-oriented facts with the help of Ghidra (VFTables, VBTables, ...)
Is your feature request related to a problem? Please describe.
Up to now, the initial facts in Pharos have been generated exclusively with the ROSE framework. So far this works quite well, with a few limitations.
Recently, Ghidra has gained powerful scripts and classes that recover additional inheritance information (VBTables in particular) from RTTI data. They also encode details about how the RTTI structures relate to the different inheritance types; until now I was not aware of all of these values myself. The information appears to come from extensive research into how the data correlate. Among other things, the VBTables are determined on this basis, using the decompiler API, which of course operates at a much higher level than raw assembly code.
Describe the solution you'd like
My suggestion: give us a way to create the initial facts with Ghidra. This could improve their quality significantly. Ghidra also has functions for discovering constructors and destructors, as well as functions that build an inheritance tree from the RTTI data.
I could imagine that this information would let you create many more initial facts for Pharos. You might only have to solve minor problems, which could speed up the overall process significantly.
Here you can find the large class in the Ghidra source code that does the main work using the versatile decompiler API.
Describe alternatives you've considered
In my opinion, you should put much more effort into using the Ghidra decompiler API. It could be very helpful for analysis, even of ordinary classes without RTTI data. Some research work may be worth investing here.
Additional context
See the descriptions above; there is nothing else important to add.
Ghidra 10 (beta just came out a few days ago!) has a decompiler update with some new type/class recovery capabilities from RTTI built-in. We'll need to take a look at what it does and compare its results with Pharos.
@sei-eschwartz any particular thoughts?
@sei-ccohen and I have talked about using Ghidra to produce initial facts for OOAnalyzer. As you noted, there are capabilities in Ghidra that would be helpful and do not currently have a corresponding implementation in Pharos/ROSE.
I don't currently believe RTTI is one of those areas, though. @sei-ccohen has put a lot of work into parsing the RTTI data from VC and we use this to locate tables, find inheritance relationships, and so on.
I have not tried the new functionality in Ghidra, but I will do that now. We may have to think about how to integrate OOAnalyzer's results with Ghidra's new built-in RTTI analysis.
RecoverClassesFromRTTIScript.java> Running...
RecoverClassesFromRTTIScript.java> Checking for missing RTTI information and undefined constructor/destructor functions and creating if possible to find entry point...
RecoverClassesFromRTTIScript.java> analyzing program changes ...
RecoverClassesFromRTTIScript.java> Recovering classes using RTTI...
RecoverClassesFromRTTIScript.java> Identified 26 classes to process and 49 class member functions to assign.
RecoverClassesFromRTTIScript.java> See Bookmark Manager for a list of functions by type.
RecoverClassesFromRTTIScript.java> Total number of constructors: 26
RecoverClassesFromRTTIScript.java> Total number of inlined constructors: 2
RecoverClassesFromRTTIScript.java> Total number of destructors: 18
RecoverClassesFromRTTIScript.java> Total number of inlined destructors: 9
RecoverClassesFromRTTIScript.java> Total number of virtual functions: 112
RecoverClassesFromRTTIScript.java> Total number of virtual functions that are deleting destructors: 26
RecoverClassesFromRTTIScript.java> Total number of virtual functions that are clone functions: 0
RecoverClassesFromRTTIScript.java> Total number of virtual functions that are vbase_destructors: 0
RecoverClassesFromRTTIScript.java> Total number of indetermined constructor/destructors: 3
RecoverClassesFromRTTIScript.java> Finished!
Looking at the script more closely, it does a lot of analysis in addition to parsing the RTTI information. It would be interesting to compare OOAnalyzer's conclusions with the script's.
Thanks for the feedback! My original intention was also for you to compare the two. I can imagine you might discover one or two interesting things. I didn't mean to criticize your work in any way; I appreciate it very much.
It was just a suggestion: the new capabilities should only be used if they actually improve the overall result.
No offense taken :) I was not sure if you were aware that we used RTTI.
I'll put some sort of comparison on my to-do list. Otherwise, it's hard to say whether there are opportunities we are missing.
I did some analysis, and here is a list of constructors that Ghidra reports, that are actually constructors according to our ground truth, and that OOAnalyzer did not report as constructors.
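The comparison described above boils down to set arithmetic over function addresses: keep what Ghidra reports that is also in the ground truth, then drop whatever OOAnalyzer already found. The sketch below is illustrative only. The function name `ghidra_only_constructors` and the sample addresses are hypothetical and are not taken from the actual comparison scripts or the mysqlpump results.

```python
from typing import Iterable, List


def ghidra_only_constructors(
    ghidra: Iterable[int],
    ground_truth: Iterable[int],
    ooanalyzer: Iterable[int],
) -> List[int]:
    """Hypothetical helper: addresses Ghidra reports as constructors that are
    true constructors per the ground truth but were missed by OOAnalyzer."""
    # Ghidra's correct constructors: reported by Ghidra AND in the ground truth.
    ghidra_correct = set(ghidra) & set(ground_truth)
    # Remove the ones OOAnalyzer already reported; sort for stable output.
    return sorted(ghidra_correct - set(ooanalyzer))


if __name__ == "__main__":
    # Made-up addresses, for illustration only.
    ghidra = {0x1000, 0x1010, 0x1020, 0x1030}
    truth = {0x1000, 0x1010, 0x1020}
    ooa = {0x1000}
    for addr in ghidra_only_constructors(ghidra, truth, ooa):
        print(hex(addr))  # prints 0x1010, then 0x1020
```

Note that 0x1030 is dropped because it is a Ghidra false positive with respect to the ground truth, not because OOAnalyzer found it.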
I just spot-checked three of these, and they did not appear in the results file at all. I have to regenerate the facts file, and then I will see whether they are referenced there.
The majority of the constructors have no facts at all :(
~/D/ghidra-rtti-comparison (master|…) $ for line in (cat mysqlpump.ghidra-only.constructors); echo $line; fgrep -R $line *.facts >/dev/null; or echo no facts; end
0x40e8b0
no facts
0x40eb10
no facts
0x4429b0
0x447970
0x44fea0
no facts
0x488200
no facts
0x488230
no facts
0x488260
no facts
0x488290
no facts
0x494ec0
no facts
0x494f30
no facts
0x495030
no facts
0x4950f0
no facts
0x495150
0x495170
no facts
0x495320
no facts
0x4953e0
no facts
0x49b410
no facts
0x49b440
no facts
0x49b470
no facts
0x49b4a0
no facts
0x49fbd0
no facts
0x4a6250
no facts
0x4a6280
no facts
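The fish loop above can be approximated in Python. It prints every address from the address file that does not appear in any `*.facts` file. This is a sketch under assumptions: the function name `addresses_without_facts` is hypothetical, and, like `fgrep`, it does a plain case-sensitive substring match, so a short address string could also match inside a longer one.

```python
import glob
import sys
from pathlib import Path
from typing import List


def addresses_without_facts(addresses: List[str], facts_glob: str = "*.facts") -> List[str]:
    """Hypothetical helper: return the addresses that do not occur as a
    substring in any file matching facts_glob (rough analogue of fgrep)."""
    # Read each facts file once rather than re-scanning it for every address.
    contents = [Path(p).read_text(errors="replace") for p in glob.glob(facts_glob)]
    return [a for a in addresses if not any(a in text for text in contents)]


if __name__ == "__main__":
    # Usage: python check_facts.py ADDRESS_FILE  (one hex address per line)
    if len(sys.argv) > 1:
        addrs = Path(sys.argv[1]).read_text().split()
        for addr in addresses_without_facts(addrs):
            print(addr, "no facts")
```

Unlike the fish loop, this prints only the addresses that lack facts, which makes the list of fact-generation gaps easier to read.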
0x495150 and 0x447970 don't have possibleConstructor facts, and I think this propagates to 0x4429b0. So these are all fact-generation problems :-(
@sei-ccohen pointed out that the internal scripts we were using to process mysqlpump had --partition=rose, which is probably why we missed a bunch of those methods completely.
Edit: (Completely unrelated to this issue!)
Oops. I put the Jan Gray note in the wrong issue. It's completely unrelated to this.