joern
Question about using Joern API
I'm writing an application on top of Joern and want to extract intermediate representations, including the CFG, CDG, and DDG. Part of the application code is:
import io.joern.c2cpg.{C2Cpg, Config}
import io.joern.c2cpg.passes.AstCreationPass
import io.joern.x2cpg.X2Cpg.newEmptyCpg
import io.joern.dataflowengineoss.layers.dataflows.{OssDataFlow, OssDataFlowOptions}
import io.joern.x2cpg.layers.{Base, CallGraph, ControlFlow, TypeRelations}
import io.shiftleft.codepropertygraph.Cpg
import io.shiftleft.codepropertygraph.generated.edges.{Cdg, ReachingDef, Ref}
import io.shiftleft.semanticcpg.layers.LayerCreatorContext
import overflowdb.Edge
val inputDir: String = "src/test/testdatas/testcase1"
val outputPath: String = inputDir + "/AST.txt"
val testFile: String = "test.c"
val config: Config = Config(inputPath = inputDir, outputPath = outputPath)
val c2cpg: C2Cpg = new C2Cpg()
val cpg: Cpg = c2cpg.createCpg(config).get
val context = new LayerCreatorContext(cpg)
new Base().run(context)
new TypeRelations().run(context)
new ControlFlow().run(context)
new CallGraph().run(context)
val options = new OssDataFlowOptions()
new OssDataFlow(options).run(context)
I then print the CDG and DDG edges like this:
cpg.graph.edges().forEachRemaining((edge: Edge) => {
  val label: String = edge.label()
  label match {
    // control dependence
    case "CDG" =>
      val cdEdge: Cdg = edge.asInstanceOf[Cdg]
      val srcCode = cdEdge.outNode().property("CODE", "<empty>") // code of the source node
      val dstCode = cdEdge.inNode().property("CODE", "<empty>") // code of the destination node
      println("control dependence: " + srcCode + " ------> " + dstCode)
    // data dependence
    case "REACHING_DEF" =>
      val dfEdge: ReachingDef = edge.asInstanceOf[ReachingDef]
      val srcCode = dfEdge.outNode().property("CODE", "<empty>") // code of the source node
      val dstCode = dfEdge.inNode().property("CODE", "<empty>") // code of the destination node
      val srcType = dfEdge.outNode().getClass.getSimpleName
      val dstType = dfEdge.inNode().getClass.getSimpleName
      val srcLine = dfEdge.outNode().property("LINE_NUMBER", -1)
      val dstLine = dfEdge.inNode().property("LINE_NUMBER", -1)
      val variable: String = dfEdge.variable
      println(f"data flow edge about ${variable}: (${srcType}, ${srcLine}, ${srcCode})" +
        f" ------> (${dstType}, ${dstLine}, ${dstCode})")
    // any other edge type: print only its label
    case _ =>
      println(label)
  }
})
However, the PDG produced by my application is quite different from the one produced by Joern-Parse, which is much more fine-grained. For example, given the following test code:
#include<stdio.h>
int func(int a, int b) {
    int c = a + b, d;
    if (c == 1)
        d = 0;
    else
        d = 1;
    return d;
}
int main() {
    int a, b;
    scanf("%d, %d", &a, &b);
    int d = a + b;
    while (true);
    return 0;
}
Compared with the PDG that Joern-Parse outputs for the function `func`:
- my application produces a DDG edge `(Method, int func (int a,int b)) ------> (Identifier, a)`, where `Method` and `Identifier` are node types, and `int func (int a,int b)` and `a` are the corresponding code. This edge does not appear in the DOT files output by Joern-Parse and Joern-Export.
There are more cases like this. How can I make the output CPG more fine-grained using the Joern API?
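To make the comparison concrete, here is a minimal sketch of how the two edge sets could be diffed once both are collected into plain values. It does not depend on Joern: `EdgeView`, `EdgeDiff`, `countByShape`, and `onlyInMine` are hypothetical names introduced only for illustration, and it assumes each edge can be reduced to its label, node types, and code strings, as in the printing loop above.

```scala
// Hypothetical, Joern-independent record of one edge (not part of any Joern API).
final case class EdgeView(
  label: String,   // e.g. "REACHING_DEF" or "CDG"
  srcType: String, // e.g. "Method"
  srcCode: String,
  dstType: String, // e.g. "Identifier"
  dstCode: String
)

object EdgeDiff {
  // Count edges per (label, source node type, destination node type).
  def countByShape(edges: Seq[EdgeView]): Map[(String, String, String), Int] =
    edges
      .groupBy(e => (e.label, e.srcType, e.dstType))
      .view
      .mapValues(_.size)
      .toMap

  // Edges found in `mine` but not in `exported`, matched on label and code strings only
  // (assumption: node types may not be recoverable from the exported DOT file).
  def onlyInMine(mine: Seq[EdgeView], exported: Seq[EdgeView]): Seq[EdgeView] = {
    val seen = exported.map(e => (e.label, e.srcCode, e.dstCode)).toSet
    mine.filterNot(e => seen.contains((e.label, e.srcCode, e.dstCode)))
  }
}
```

Grouping by shape should show quickly whether the extra edges all belong to one category, such as `Method` to `Identifier`, which would narrow down where the two outputs diverge.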
@for-just-we have you sorted this out?
Can Joern actually do this without spending hours and hours on it? Can you point to any documentation on how to use this tool correctly?