ghidra
ghidra copied to clipboard
RTTI RecoverClassesFromRTTIScript optimization and a bug
Hello. I was running the RecoverClassesFromRTTIScript.java script over a program with several thousand identified classes, and I have some notes on how that may be improved.
TL;DR:
- There is a code path that can be improved, performance wise
- There is a code path that is quite expensive and I'm not sure how that can be improved.
- There is a code part that can result in an "infinite" loop, but I'm not sure how that is triggered.
Let's begin.
First note
The function processDeletingDestructor() at https://github.com/NationalSecurityAgency/ghidra/blob/65b290624ece634e3dc8a228a7cf9715f75ecd9f/Ghidra/Features/Decompiler/ghidra_scripts/classrecovery/RecoveredClassHelper.java#L6520 goes on and calls
https://github.com/NationalSecurityAgency/ghidra/blob/65b290624ece634e3dc8a228a7cf9715f75ecd9f/Ghidra/Features/Decompiler/ghidra_scripts/classrecovery/RecoveredClassHelper.java#L6529 .
Since the processDeletingDestructor() is called multiple times inside a nested loop at https://github.com/NationalSecurityAgency/ghidra/blob/65b290624ece634e3dc8a228a7cf9715f75ecd9f/Ghidra/Features/Decompiler/ghidra_scripts/classrecovery/RecoveredClassHelper.java#L6893 , it is important for it to be efficient. However the getAllClassFunctionsWithVtableRef() is expensive each time, so it would be nice if it was not there at all. This is easy, as it does not produce different results each time (correct me if I'm wrong) and can be called only once in the beginning. Even better, it is actually called at https://github.com/NationalSecurityAgency/ghidra/blob/65b290624ece634e3dc8a228a7cf9715f75ecd9f/Ghidra/Features/Decompiler/ghidra_scripts/classrecovery/RecoveredClassHelper.java#L6842
So, the allFunctionsThatRefVftables variable can be provided as a parameter to processDeletingDestructor() and it can be queried efficiently, each time.
This optimization, in the program I was testing, reduced the related segment from upwards of half an hour to some seconds, in processing time.
Second note
Now regarding the second part, I'm not sure what can be done about it. It required several hours in my analysis, because it seems to trigger an entire re-decompilation phase.
The problem lies inside the function runFillOutStructureHelper() at https://github.com/NationalSecurityAgency/ghidra/blob/65b290624ece634e3dc8a228a7cf9715f75ecd9f/Ghidra/Features/Decompiler/ghidra_scripts/classrecovery/RecoveredClassHelper.java#L6046 . It calls
https://github.com/NationalSecurityAgency/ghidra/blob/65b290624ece634e3dc8a228a7cf9715f75ecd9f/Ghidra/Features/Decompiler/ghidra_scripts/classrecovery/RecoveredClassHelper.java#L6056
This can take hours as it seems to trigger an entire re-decompilation phase. I don't know what to do with that and whether it can be skipped.
Third note
Now, the infinite loop. In case of Windows, the function extendedFlatAPI.createShortenedTemplateNamesForClasses(recoveredClasses) at https://github.com/NationalSecurityAgency/ghidra/blob/65b290624ece634e3dc8a228a7cf9715f75ecd9f/Ghidra/Features/Decompiler/ghidra_scripts/classrecovery/RTTIWindowsClassRecoverer.java#L173C1-L173C77 goes into the while loop at https://github.com/NationalSecurityAgency/ghidra/blob/65b290624ece634e3dc8a228a7cf9715f75ecd9f/Ghidra/Features/Decompiler/ghidra_scripts/classrecovery/ExtendedFlatProgramAPI.java#L1285-L1329
I noticed, but can't figure out why, that in the program I was analyzing the commaIndex variable kept increasing to infinity. I solved it by breaking out of the loop when it reached values larger that 30.
Anyway, that's all. The problem is that I don't know whether my suggestions are semantically correct. They caused the script to be faster and to "work" , but do they work every time? I don't know for sure. Thanks a lot, and apologies for my lack of certainty.
Environment
Ghidra 11.3.2 release
@CarrotCultivator Thanks for pointing out how the script can be improved. I'll take a look at 1 and 3 and see what can be done about them. I already know that 2 is an issue and have been exploring ways to speed it up. If you are running on Windows the upcoming 11.4 windows build contains a decompiler that has been compiled to run a bit faster so that may help. You could also try to change your decompiler timeout settings to a shorter time and see if that helps. I could add an option to disable using this but you would then miss out on the class structure building that using the decompiler would get you.
In my case the script goes into an infinite loop, because classesWithSameShortenedName contains two classes with identical names. These classes are in different namespaces, so it is perfectly legal, but the code responsible for shortening names considers them as identical even when comparing full names. There is no check if the call getNewShortenedTemplateName(currentClassWithSameShortName, commaIndex) returns an actually shortened name or the full one, so at some point incrementing commaIndex does nothing, because it is already greater than the number of commas separating template arguments.
I think this could be solved by considering namespaces when checking name uniqueness and/or checking if the new shortened name is not already the full name.
In my case the script goes into an infinite loop, because
classesWithSameShortenedNamecontains two classes with identical names. These classes are in different namespaces, so it is perfectly legal, but the code responsible for shortening names considers them as identical even when comparing full names. There is no check if the callgetNewShortenedTemplateName(currentClassWithSameShortName, commaIndex)returns an actually shortened name or the full one, so at some point incrementingcommaIndexdoes nothing, because it is already greater than the number of commas separating template arguments.I think this could be solved by considering namespaces when checking name uniqueness and/or checking if the new shortened name is not already the full name.
Thanks for the detailed info.
@CarrotCultivator Part of this was completed. Reopened ticket until the rest are completed.