[ghidra2cpg] Add support for loading Ghidra projects
Fixes #2534
This PR adds support for loading existing Ghidra projects. The linked issue explains why this feature is useful. If the input file is a non-empty Ghidra project (.gpr), ghidra2cpg loads the first program (domain file) it contains and creates the CPG from that program.
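On disk, a Ghidra project is a `<name>.gpr` marker file with a `<name>.rep` directory next to it. Ghidra's project APIs (e.g. `GhidraProject.openProject`) take the project location and the project name, not the .gpr path itself. The helper below is a minimal sketch of that mapping. It is hypothetical and not part of joern or ghidra2cpg: `ghidra_project_parts` splits a .gpr path into location and name, and checks that the `.rep` directory exists. It does not check whether the project actually contains a program.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of joern/ghidra2cpg): split a Ghidra project
# marker file path into its location and name and verify the .rep directory.
# Prints "<location><TAB><name>" on success; returns 1 otherwise.
ghidra_project_parts() {
  local gpr="$1"
  # Only .gpr files are treated as Ghidra projects.
  case "$gpr" in
    *.gpr) ;;
    *) echo "not a .gpr file: $gpr" >&2; return 1 ;;
  esac
  local location name
  location="$(dirname "$gpr")"
  name="$(basename "$gpr" .gpr)"
  # Ghidra keeps the project data in <name>.rep next to the .gpr file.
  if [ ! -d "$location/$name.rep" ]; then
    echo "missing project directory: $location/$name.rep" >&2
    return 1
  fi
  # A tab separator keeps paths containing spaces unambiguous.
  printf '%s\t%s\n' "$location" "$name"
}
```

For the project from the test run in this PR, `ghidra_project_parts ~/hello-arm64.gpr` would print the home directory and `hello-arm64` separated by a tab, provided `~/hello-arm64.rep` exists.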
Test binary:
Edit:
Note: I first tried to upload the .gpr file, but it turns out that Ghidra projects cannot easily be shared in the .gpr + .rep format, because a project is locked to the username of the user who created it. For testing, you therefore have to create a project manually in Ghidra, import the test binary (after unzipping it), and run auto-analysis on it. Afterwards, save the project and close Ghidra completely. Otherwise Ghidra keeps the project locked via a lockfile and joern-parse will fail.
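Because a project that is still open in Ghidra makes joern-parse fail, a small pre-flight check can catch the problem before parsing starts. The function below is a hypothetical sketch, not part of joern. It assumes that Ghidra's lock file is named `<name>.lock` and sits next to `<name>.gpr`. Treat that file name as an assumption and verify it against your Ghidra version.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of joern): refuse to continue while
# Ghidra still holds the project lock.
# Assumption: Ghidra's lock file is <name>.lock in the same directory as <name>.gpr.
ghidra_project_unlocked() {
  local gpr="$1"
  local lock="${gpr%.gpr}.lock"
  if [ -e "$lock" ]; then
    echo "project appears to be open in Ghidra (found $lock); close Ghidra first" >&2
    return 1
  fi
  return 0
}

# Example usage with the project path from the test run in this PR:
# ghidra_project_unlocked ~/hello-arm64.gpr && joern-parse ~/hello-arm64.gpr --language GHIDRA
```

The check only looks for the lock file and does not validate the project contents. A stale lock left behind by a crashed Ghidra session would also make it fail, which is arguably the safer behavior.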
Test run:
```console
$ joern-parse ~/hello-arm64.gpr --language GHIDRA
Parsing code at: /home/gemesa/hello-arm64.gpr - language: `GHIDRA`
[+] Running language frontend
=======================================================================================================
Invoking CPG generator in a separate process. Note that the new process will consume additional memory.
If you are importing a large codebase (and/or running into memory issues), please try the following:
1) exit joern
2) invoke the frontend: /home/gemesa/git-repos/joern/joern-cli/target/universal/stage/ghidra2cpg -J-Xmx3472m /home/gemesa/hello-arm64.gpr --output cpg.bin
3) start joern, import the cpg: `importCpg("path/to/cpg")`
=======================================================================================================
[+] Applying default overlays
Successfully wrote graph to: /home/gemesa/git-repos/tmp/cpg.bin
To load the graph, type `joern /home/gemesa/git-repos/tmp/cpg.bin`
$ joern /home/gemesa/git-repos/tmp/cpg.bin
Creating project `cpg.bin` for CPG at `/home/gemesa/git-repos/tmp/cpg.bin`
Project with name cpg.bin already exists - overwriting
Creating working copy of CPG to be safe
Loading base CPG from: /home/gemesa/git-repos/tmp/workspace/cpg.bin/cpg.bin.tmp
Overlay dataflowOss already exists - skipping
The graph has been modified. You may want to use the `save` command to persist changes to disk. All changes will also be saved collectively on exit
```
```console
Version: 0.0.0+3813-9478888c
Type `help` to begin
joern> cpg.method("main").code.l
val res1: List[String] = List(
"""
/* WARNING: Unknown calling convention -- yet parameter storage is locked */
int main(void)
{
puts("hello world");
return 0;
}
"""
)
joern>
```
Still, thank you for the PR!
Any update on this PR @itsacoderepo?
I have been using these changes for a while now and think others would find them valuable too. Joern currently can't load Ghidra projects in which analysts have manually improved the decompiled code (fixed types, calling conventions, etc.).
Let me know if you need any clarification or changes. Thanks!