[ghidra2cpg] Add support for loading Ghidra projects

Open gemesa opened this issue 8 months ago • 3 comments

Fixes #2534

Added support for loading existing Ghidra projects. The linked issue describes why this is a useful feature. If the input file is a non-empty Ghidra project (.gpr), we load the first program (domain file) from it and create the CPG.
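The input check described above can be sketched roughly as follows. This is an illustration in Python, not the code from this PR: the helper name `split_ghidra_project` is hypothetical, and it only assumes the usual layout where `<name>.gpr` sits next to a `<name>.rep` directory. Whether the project actually contains a program can only be determined through Ghidra itself, so that part is not covered here.

```python
from pathlib import Path
from typing import Optional, Tuple


def split_ghidra_project(input_path: str) -> Optional[Tuple[Path, str]]:
    """Return (project_dir, project_name) if input_path looks like a Ghidra project.

    Hypothetical helper: it only checks the file layout (<name>.gpr next to a
    <name>.rep directory) and returns None for anything else, e.g. a raw binary.
    """
    path = Path(input_path)
    if path.suffix != ".gpr" or not path.is_file():
        return None
    # Assumption: the project data lives in a sibling directory named <name>.rep
    rep_dir = path.with_suffix(".rep")
    if not rep_dir.is_dir():
        return None
    return path.parent, path.stem
```

A frontend could fall back to its regular binary import whenever this returns `None`.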

Test binary:

hello-arm64.zip

Edit: I first tried to upload the .gpr file, but it turns out Ghidra projects cannot easily be shared in the .gpr + .rep format because a project is locked to the username of the user who created it. For testing, manually create a project in Ghidra, import the unzipped test binary, and run auto-analysis on it. Then save the project and close Ghidra completely. Otherwise Ghidra keeps the project locked via a lockfile and joern-parse will fail.
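A small pre-flight check can catch the locked-project case before running joern-parse. The sketch below is Python and rests on an assumption: that Ghidra marks an open project with a `<name>.lock` (and sometimes `<name>.lock~`) file next to the `.gpr` file. The function name is hypothetical, and the file names should be verified against your Ghidra version.

```python
from pathlib import Path
from typing import List


def find_project_lock_files(gpr_path: str) -> List[Path]:
    """List lock files that sit next to a .gpr file (assumed naming)."""
    gpr = Path(gpr_path)
    # Assumption: lock files are named after the project, e.g. hello-arm64.lock
    candidates = [gpr.with_suffix(".lock"), gpr.with_suffix(".lock~")]
    return [p for p in candidates if p.exists()]
```

If the returned list is non-empty, the project is probably still open in Ghidra and should be closed before parsing.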

Test run:

$ joern-parse ~/hello-arm64.gpr --language GHIDRA
Parsing code at: /home/gemesa/hello-arm64.gpr - language: `GHIDRA`
[+] Running language frontend
=======================================================================================================
Invoking CPG generator in a separate process. Note that the new process will consume additional memory.
If you are importing a large codebase (and/or running into memory issues), please try the following:
1) exit joern
2) invoke the frontend: /home/gemesa/git-repos/joern/joern-cli/target/universal/stage/ghidra2cpg -J-Xmx3472m /home/gemesa/hello-arm64.gpr --output cpg.bin
3) start joern, import the cpg: `importCpg("path/to/cpg")`
=======================================================================================================

[+] Applying default overlays
Successfully wrote graph to: /home/gemesa/git-repos/tmp/cpg.bin
To load the graph, type `joern /home/gemesa/git-repos/tmp/cpg.bin`
                                                                                                                      
$ joern /home/gemesa/git-repos/tmp/cpg.bin
Creating project `cpg.bin` for CPG at `/home/gemesa/git-repos/tmp/cpg.bin`
Project with name cpg.bin already exists - overwriting
Creating working copy of CPG to be safe
Loading base CPG from: /home/gemesa/git-repos/tmp/workspace/cpg.bin/cpg.bin.tmp
Overlay dataflowOss already exists - skipping
The graph has been modified. You may want to use the `save` command to persist changes to disk.  All changes will also be saved collectively on exit

     ██╗ ██████╗ ███████╗██████╗ ███╗   ██╗
     ██║██╔═══██╗██╔════╝██╔══██╗████╗  ██║
     ██║██║   ██║█████╗  ██████╔╝██╔██╗ ██║
██   ██║██║   ██║██╔══╝  ██╔══██╗██║╚██╗██║
╚█████╔╝╚██████╔╝███████╗██║  ██║██║ ╚████║
 ╚════╝  ╚═════╝ ╚══════╝╚═╝  ╚═╝╚═╝  ╚═══╝
Version: 0.0.0+3813-9478888c
Type `help` to begin
      
                                                                                                                      
joern> cpg.method("main").code.l
val res1: List[String] = List(
  """
/* WARNING: Unknown calling convention -- yet parameter storage is locked */

int main(void)

{
  puts("hello world");
  return 0;
}

"""
)
                                                                                                                      
joern>

gemesa • May 02 '25 14:05

Still, thank you for the PR!

itsacoderepo • May 02 '25 16:05

Any update on this PR @itsacoderepo?

max-leuthaeuser • May 27 '25 14:05

I have been using these changes for a while now and think others would find them valuable too. Joern currently can't load Ghidra projects in which analysts have manually improved the decompiled code (fixed types, calling conventions, etc.).

Let me know if you need any clarification or changes. Thanks!

gemesa • Aug 14 '25 18:08