
Suggested improvements to Haruspex: headless mode, remove problematic decorators, rename to .cpp where appropriate

Open • 0xdea opened this issue 2 months ago • 1 comment

Suggested improvements from https://cc-sw.com/semgrep-guide-for-a-security-engineer-part-5-of-6/:

If you want a more streamlined method than running the Ghidra Scripts GUI for each binary, I would recommend modifying the script to take the export folder location as an argument instead:

```java
@Override
public void run() throws Exception
{
  printf("\nHaruspex.java - Extract Ghidra decompiler's pseudo-code\n");
  printf("Copyright (c) 2022 Marco Ivaldi <[email protected]>\n\n");

  //Change: use a CLI argument (headless mode) instead of the askXxx() method,
  //falling back to the original prompt when no argument is supplied
  String[] args = getScriptArgs();
  if (args.length > 0) {
    outputPath = args[0];
  } else {
    //Original: ask for output directory path
    try {
      outputPath = askString("Output directory path", "Enter the path of the output directory:");
    } catch (Exception e) {
      printf("Output directory not supplied, using default \"%s\".\n", outputPath);
    }
  }
  ...
}
```

Then, using Ghidra's headless analyzer, we can export the decompiled code of analyzed binaries from the command line:

```sh
/opt/ghidra/support/analyzeHeadless <GHIDRA_PROJECT_LOCATION> <GHIDRA_PROJECT_NAME> -process <BINARY_NAME> -scriptPath <GHIDRA_HARUSPEX_SCRIPT_LOCATION> -postScript Haruspex.java <OUTPUT_FOLDER_PATH>
```

Ghidra emits some decorators (calling-convention and function-attribute keywords) that we want to remove from our code files. Specifically, I have a Python script that looks for the following strings and deletes them [check how to do this via the decompiler API]: `__thiscall`, `__cdecl`, `__noreturn`, `__fastcall`.
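For reference, the string-deletion approach described above can be sketched in Java, the language Haruspex itself is written in. This is a minimal sketch, not the Python script mentioned in the quote: the class name `DecoratorCleaner` is hypothetical, and it assumes the four keywords appear as standalone words in the pseudocode text.

```java
import java.util.regex.Pattern;

// Hypothetical helper (not part of Haruspex): deletes the Ghidra keywords listed
// above from a chunk of decompiled pseudocode.
final class DecoratorCleaner {
    // Whole-word match of the four keywords plus any spaces/tabs that follow them.
    // [ \t]* rather than \s* so that newlines are never swallowed and lines never merge.
    private static final Pattern DECORATORS =
            Pattern.compile("\\b__(?:thiscall|cdecl|noreturn|fastcall)\\b[ \\t]*");

    static String clean(String pseudocode) {
        return DECORATORS.matcher(pseudocode).replaceAll("");
    }

    public static void main(String[] args) {
        // Prints: void main(void)
        System.out.println(clean("void __cdecl main(void)"));
    }
}
```

Matching on word boundaries leaves identifiers that merely contain one of these strings (for example `my__cdecl_var`) untouched.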

I will also rename the files to either .c or .cpp as appropriate [to improve handling by Semgrep].

0xdea, Oct 31 '25 08:10

It might be impractical to act at the PCode/pseudocode layer using the Ghidra API to remove annotations that confuse Semgrep, although I still have to investigate this more deeply.

The most pragmatic thing to do is likely to build a cleanup script to apply to the pseudocode before scanning it with Semgrep (see https://github.com/0xdea/semgrep-rules/issues/12).
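A cleanup pass of this kind might look like the following Java sketch, which rewrites matching files under an output directory in place. The class `PseudocodeCleanup`, its `cleanDirectory` method, and the assumption that Haruspex writes its pseudocode to `.c` files are illustrative only; the suffix and keyword list should be adjusted to whatever the real output contains.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical pre-Semgrep cleanup pass (not part of Haruspex): rewrites, in place,
// every file under a directory whose name ends with the given suffix, deleting the
// Ghidra keywords listed in the issue.
final class PseudocodeCleanup {
    // Same keyword list as the issue; trailing spaces/tabs are removed, newlines kept.
    private static final Pattern DECORATORS =
            Pattern.compile("\\b__(?:thiscall|cdecl|noreturn|fastcall)\\b[ \\t]*");

    // Returns the number of files that were actually modified.
    static int cleanDirectory(Path dir, String suffix) throws IOException {
        List<Path> files;
        // Collect first, so no file is rewritten while the directory walk is still open.
        try (Stream<Path> walk = Files.walk(dir)) {
            files = walk.filter(Files::isRegularFile)
                        .filter(p -> p.getFileName().toString().endsWith(suffix))
                        .collect(Collectors.toList());
        }
        int modified = 0;
        for (Path file : files) {
            String original = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            String cleaned = DECORATORS.matcher(original).replaceAll("");
            if (!cleaned.equals(original)) {
                Files.write(file, cleaned.getBytes(StandardCharsets.UTF_8));
                modified++;
            }
        }
        return modified;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: Haruspex output files end in ".c"; pass the output folder as args[0].
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        int n = cleanDirectory(dir, ".c");
        System.out.println("Cleaned " + n + " file(s) under " + dir);
    }
}
```

With Java 11 or later it can be launched directly from source, e.g. `java PseudocodeCleanup.java <OUTPUT_FOLDER_PATH>`, after the headless export and before running Semgrep. Since it edits files in place, running it on a copy of the output is the safer choice.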

Finally, I must investigate whether Ghidra generates C++-like pseudocode (see https://dogbolt.org/?id=2ccdcce2-7fb8-4e20-96f9-2fe54365c9fb for instance) to determine whether some decompiled files actually need to be renamed to .cpp.
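If C++-like output does turn up, one cheap way to choose between `.c` and `.cpp` would be a text heuristic over each decompiled function. The sketch below is hypothetical: `ExtensionGuesser` and its marker list are assumptions, not documented Ghidra behavior. It flags a function as C++ when the pseudocode contains a `::` scope qualifier or the `__thiscall` convention.

```java
import java.util.regex.Pattern;

// Hypothetical heuristic (not part of Haruspex or Ghidra): guess a file extension
// for one decompiled function from C++-style markers in its pseudocode.
final class ExtensionGuesser {
    // Assumed markers: a "::" scope qualifier, or the __thiscall calling convention.
    private static final Pattern CPP_MARKERS = Pattern.compile("::|\\b__thiscall\\b");

    static String guessExtension(String pseudocode) {
        return CPP_MARKERS.matcher(pseudocode).find() ? ".cpp" : ".c";
    }

    public static void main(String[] args) {
        // Prints: .cpp
        System.out.println(guessExtension("undefined4 __thiscall Foo::bar(Foo *this)"));
        // Prints: .c
        System.out.println(guessExtension("int main(int argc,char **argv)"));
    }
}
```

Because `__thiscall` is one of the keywords slated for removal, a heuristic like this would have to run before any decorator cleanup. It can also misfire, for example on `::` inside a string literal, so comparing real output such as the dogbolt example remains the actual test.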

0xdea, Nov 14 '25 08:11