ghidra
Improve function argument type inference with runtime information.
Currently, Ghidra uses nothing but static analysis to decompile executables.
However, the assembly is not the only source of valuable information. Another is the runtime behavior of the executable itself.
As an example, records of function calls could be used to improve the inference of argument types of each function.
Analysis would then have 3 phases:
- Run the static analysis.
- Run the executable under a debugger and exercise it for a while as Ghidra gathers data.
- Have Ghidra use this data to improve the decompiled pseudocode.
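To make the third phase a bit more concrete, here is a minimal sketch of how recorded calls might be turned into coarse argument-type guesses. It does not use any Ghidra API: `CallRecord`, `classify`, `merge`, and `infer_argument_kinds` are hypothetical names invented for this illustration, and the "value falls inside a mapped memory range" test is only one assumed heuristic for spotting pointers.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CallRecord:
    """One observed call: the callee's entry address and its raw argument values.

    Hypothetical structure; a real recording would come from a debugger trace.
    """
    function: int
    args: tuple


def classify(value, mapped_ranges):
    """Guess a coarse kind for a single raw argument value."""
    if value == 0:
        return "null"  # could be NULL, 0, or false: no evidence either way
    if any(lo <= value < hi for lo, hi in mapped_ranges):
        return "pointer"  # value lands inside memory the process had mapped
    return "int"


def merge(kinds):
    """Combine the guesses from all observed calls for one argument slot."""
    if "pointer" in kinds and "int" in kinds:
        return "conflict"  # observations disagree; leave it to the analyst
    if "pointer" in kinds:
        return "pointer"  # pointers, possibly mixed with NULL
    if "int" in kinds:
        return "int"
    return "unknown"  # only zeros were ever seen


def infer_argument_kinds(records, mapped_ranges):
    """Return {function address: [kind of arg 0, kind of arg 1, ...]}."""
    seen = defaultdict(lambda: defaultdict(set))
    for rec in records:
        for index, value in enumerate(rec.args):
            seen[rec.function][index].add(classify(value, mapped_ranges))
    return {
        function: [merge(by_index[i]) for i in sorted(by_index)]
        for function, by_index in seen.items()
    }


# Example data, made up for illustration.
mapped = [(0x7FFD0000, 0x7FFE0000)]
records = [
    CallRecord(0x401000, (0x7FFD1000, 5)),
    CallRecord(0x401000, (0, 7)),
    CallRecord(0x402000, (0,)),
]
kinds = infer_argument_kinds(records, mapped)
```

In this sketch the guesses only ever narrow what static analysis already proposed. A "conflict" or "unknown" result would leave the statically inferred type untouched.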
A good bit of framework for this is already in place, specifically our trace database, but we have not yet gotten to automating this sort of analysis ourselves. To my knowledge, two big pieces are missing:
- The means to record and/or import a "dense" trace. There are plenty of tools and formats out there that can do this, e.g., perfetto, Common Trace Format, WinDbg TTD. You can create "dense" traces in Ghidra by single-stepping its debugger, but that will get old very fast for more than a few hundred instructions :/ . Otherwise, the Debugger makes "sparse" traces during normal operation. Every pause is recorded. You might also get some mileage by breaking only at certain "critical points" and using the p-code emulator to interpolate later. One idea would be to let an existing tool perform the trace, and then import that into a Ghidra trace (again, perhaps ignoring those parts that can be recovered via emulation). Scripts can already do this to some extent. See PopulateDemoTrace, PopulateTraceLocal, and PopulateTraceRemote. But it'd take development work to bring those scripts up to snuff and build out importers for other formats. Ideally, we'd eventually develop a "trace import" framework, but that'd take some time.
- The "trace analyzers" that read observations in a trace and place the appropriate annotations in the corresponding static images (Ghidra program databases). In theory, this can currently be done with scripting. In particular, see FlatDebuggerAPI#translateDynamicToStatic and its inverse for methods that would be helpful in correlating dynamic and static databases. While possible, it'd take a good bit of work to write such scripts. It would also be neat to build out a "trace analysis" framework, but again, we're not there.
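For anyone unfamiliar with what "correlating dynamic and static databases" involves, the core idea can be sketched as rebasing an address between where a module was loaded at runtime and where its static image sits. The snippet below is a standalone illustration only, not how Ghidra implements it and not the FlatDebuggerAPI signatures. `ModuleMapping`, `dynamic_to_static`, and `static_to_dynamic` are hypothetical names, and it assumes each module is mapped as one contiguous block.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModuleMapping:
    """Hypothetical record tying a loaded module to its static image."""
    name: str
    dynamic_base: int  # load address observed in the traced process
    static_base: int   # image base of the static program database
    size: int          # length of the mapped region in bytes


def dynamic_to_static(address, mappings):
    """Translate a runtime address to (module name, static address), or None."""
    for m in mappings:
        offset = address - m.dynamic_base
        if 0 <= offset < m.size:
            return m.name, m.static_base + offset
    return None  # address is outside every known module (heap, stack, ...)


def static_to_dynamic(name, address, mappings):
    """Inverse translation: static address in a named module to runtime address."""
    for m in mappings:
        offset = address - m.static_base
        if m.name == name and 0 <= offset < m.size:
            return m.dynamic_base + offset
    return None


# Example mapping, made up for illustration.
mappings = [ModuleMapping("demo", 0x55550000, 0x400000, 0x10000)]
static_hit = dynamic_to_static(0x55551234, mappings)
round_trip = static_to_dynamic("demo", 0x401234, mappings)
```

A trace analyzer script would do this kind of translation for every observation before placing an annotation in the static program, which is why the real translation methods are a natural starting point.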
In any case, the biggest pieces, the trace database (DBTrace) and a decent API (FlatDebuggerAPI), are there for you to experiment with toward your suggestion. If you do experiment, we'd like to hear how it goes.