Question about using Joern API

I'm writing an application on top of Joern and want to obtain the intermediate representations, including the CFG, CDG, and DDG. Part of the application code is:

import io.joern.c2cpg.{C2Cpg, Config}
import io.joern.c2cpg.passes.AstCreationPass
import io.joern.x2cpg.X2Cpg.newEmptyCpg
import io.joern.dataflowengineoss.layers.dataflows.{OssDataFlow, OssDataFlowOptions}
import io.joern.x2cpg.layers.{Base, CallGraph, ControlFlow, TypeRelations}
import io.shiftleft.codepropertygraph.Cpg
import io.shiftleft.codepropertygraph.generated.edges.{Cdg, ReachingDef, Ref}
import io.shiftleft.semanticcpg.layers.LayerCreatorContext
import overflowdb.Edge


val inputDir: String = "src/test/testdatas/testcase1"
val outputPath: String = inputDir + "/AST.txt"
val testFile: String = "test.c"
val config: Config = Config(inputPath = inputDir, outputPath = outputPath)

val c2cpg: C2Cpg = new C2Cpg()
val cpg: Cpg = c2cpg.createCpg(config).get

val context = new LayerCreatorContext(cpg)
new Base().run(context)
new TypeRelations().run(context)
new ControlFlow().run(context)
new CallGraph().run(context)
val options = new OssDataFlowOptions()
new OssDataFlow(options).run(context)

I then print the CDG and DDG edges like this:

cpg.graph.edges().forEachRemaining((edge: Edge) => {
  val label: String = edge.label()
  label match {
    // control dependence
    case "CDG" =>
      val cdEdge: Cdg = edge.asInstanceOf[Cdg]
      val srcCode = cdEdge.outNode().property("CODE", "<empty>") // code of the source node
      val dstCode = cdEdge.inNode().property("CODE", "<empty>") // code of the destination node
      println("control dependence: " + srcCode + " ------> " + dstCode)

    // data dependence
    case "REACHING_DEF" =>
      val dfEdge: ReachingDef = edge.asInstanceOf[ReachingDef]
      val srcCode = dfEdge.outNode().property("CODE", "<empty>") // code of the source node
      val dstCode = dfEdge.inNode().property("CODE", "<empty>") // code of the destination node
      val srcType = dfEdge.outNode().getClass.getSimpleName
      val dstType = dfEdge.inNode().getClass.getSimpleName
      val srcLine = dfEdge.outNode().property("LINE_NUMBER", -1)
      val dstLine = dfEdge.inNode().property("LINE_NUMBER", -1)

      val variable: String = dfEdge.variable
      println(s"data flow edge about ${variable}: (${srcType}, ${srcLine}, ${srcCode})" +
        s" ------> (${dstType}, ${dstLine}, ${dstCode})")

    // any other edge type: print only its label
    case _ =>
      println(label)
  }
})

However, the PDG produced by my application is quite different from the one produced by Joern-Parse. The one produced by Joern-Parse is much more fine-grained. For example, given the following test code:

#include<stdio.h>

int func(int a, int b) {
    int c = a + b, d;
    if (c == 1)
        d = 0;
    else
        d = 1;
    return d;
}

int main() {
   int a, b;
   scanf("%d, %d", &a, &b);
   int d = a + b;

   while (true);
   return 0;
}

In the PDG output by Joern-Parse for the function func:

My application finds a DDG edge (Method, int func (int a,int b)) ------> (Identifier, a), where Method and Identifier are the node types, and int func (int a,int b) and a are the corresponding code. This edge does not appear in the dot file output by Joern-Parse and Joern-Export.

There are more cases like this. How can I make the output CPG more fine-grained using the Joern API?
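
To make the comparison with the dot file easier, one option is to collect the REACHING_DEF edges into plain values and set aside those whose source is the Method node itself. The following is only a minimal sketch of that post-filtering step: DdgEdge and DdgFilter are hypothetical names that mirror the fields printed by the loop above, they are not Joern types, and the second sample edge is made up for illustration. Whether the dot export applies a comparable filter is an assumption that has not been confirmed here.

```scala
// Hypothetical stand-in for the values printed by the loop above; not a Joern type.
final case class DdgEdge(srcType: String, srcCode: String, dstType: String, dstCode: String)

object DdgFilter {
  // Keep only edges whose source node is not a Method node.
  def withoutMethodSources(edges: Seq[DdgEdge]): Seq[DdgEdge] =
    edges.filterNot(_.srcType == "Method")

  def main(args: Array[String]): Unit = {
    val edges = Seq(
      // The edge reported in the question.
      DdgEdge("Method", "int func (int a,int b)", "Identifier", "a"),
      // Illustrative edge only, not taken from real Joern output.
      DdgEdge("Call", "c = a + b", "Identifier", "c")
    )
    // Print the edges that remain after dropping Method-sourced ones.
    withoutMethodSources(edges).foreach(println)
  }
}
```

The remaining edges can then be diffed against the edges in the dot file to see which other differences are left.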

Mar 22 '23 11:03 for-just-we

@for-just-we have you sorted this out?

May 16 '24 01:05 ramsey-coding

Can Joern actually do this without spending hours and hours on it? Can you point to any document about how to use this tool correctly?

May 16 '24 01:05 ramsey-coding
