
How to get line information and file path of source code after processing?

Open for-just-we opened this issue 3 years ago • 1 comments

Hello, author

I ran joern-parse /src/directory and then joern-export --repr cpg14 --out outdir to dump the CPGs of all source files in /src/directory to outdir. The text of each .dot file looks like this:

digraph "main" {  
"3074457345618258674" [label = "(METHOD,main)" ]
"3074457345618258700" [label = "(METHOD_RETURN,int)" ]
"3074457345618258675" [label = "(PARAM,int argc)" ]
"3074457345618258676" [label = "(PARAM,char *argv[])" ]
"3074457345618258696" [label = "(printf,printf(\"What is the meaning of life?\n\"))" ]
"3074457345618258698" [label = "(exit,exit(0))" ]
"3074457345618258679" [label = "(<operator>.logicalAnd,argc > 1 && strcmp(argv[1], \"42\") == 0)" ]
"3074457345618258691" [label = "(fprintf,fprintf(stderr, \"It depends!\n\"))" ]
"3074457345618258694" [label = "(exit,exit(42))" ]
"3074457345618258680" [label = "(<operator>.greaterThan,argc > 1)" ]
"3074457345618258683" [label = "(<operator>.equals,strcmp(argv[1], \"42\") == 0)" ]
"3074457345618258684" [label = "(strcmp,strcmp(argv[1], \"42\"))" ]
"3074457345618258685" [label = "(<operator>.indirectIndexAccess,argv[1])" ]
  "3074457345618258675" -> "3074457345618258700"  [ label = "DDG: argc"] 
  "3074457345618258676" -> "3074457345618258700"  [ label = "DDG: argv"] 
  "3074457345618258680" -> "3074457345618258700"  [ label = "DDG: argc"] 
  "3074457345618258679" -> "3074457345618258700"  [ label = "DDG: argc > 1"] 
  "3074457345618258684" -> "3074457345618258700"  [ label = "DDG: argv[1]"] 
  "3074457345618258683" -> "3074457345618258700"  [ label = "DDG: strcmp(argv[1], \"42\")"] 
  "3074457345618258679" -> "3074457345618258700"  [ label = "DDG: strcmp(argv[1], \"42\") == 0"] 
  "3074457345618258679" -> "3074457345618258700"  [ label = "DDG: argc > 1 && strcmp(argv[1], \"42\") == 0"] 
  "3074457345618258691" -> "3074457345618258700"  [ label = "DDG: fprintf(stderr, \"It depends!\n\")"] 
  "3074457345618258694" -> "3074457345618258700"  [ label = "DDG: exit(42)"] 
  "3074457345618258696" -> "3074457345618258700"  [ label = "DDG: printf(\"What is the meaning of life?\n\")"] 
  "3074457345618258698" -> "3074457345618258700"  [ label = "DDG: exit(0)"] 
  "3074457345618258691" -> "3074457345618258700"  [ label = "DDG: stderr"] 
  "3074457345618258674" -> "3074457345618258675"  [ label = "DDG: "] 
  "3074457345618258674" -> "3074457345618258676"  [ label = "DDG: "] 
  "3074457345618258674" -> "3074457345618258696"  [ label = "DDG: "] 
  "3074457345618258674" -> "3074457345618258698"  [ label = "DDG: "] 
  "3074457345618258680" -> "3074457345618258679"  [ label = "DDG: 1"] 
  "3074457345618258680" -> "3074457345618258679"  [ label = "DDG: argc"] 
  "3074457345618258683" -> "3074457345618258679"  [ label = "DDG: 0"] 
  "3074457345618258683" -> "3074457345618258679"  [ label = "DDG: strcmp(argv[1], \"42\")"] 
  "3074457345618258675" -> "3074457345618258680"  [ label = "DDG: argc"] 
  "3074457345618258674" -> "3074457345618258680"  [ label = "DDG: "] 
  "3074457345618258676" -> "3074457345618258683"  [ label = "DDG: argv"] 
  "3074457345618258674" -> "3074457345618258683"  [ label = "DDG: "] 
  "3074457345618258674" -> "3074457345618258691"  [ label = "DDG: "] 
  "3074457345618258674" -> "3074457345618258694"  [ label = "DDG: "] 
  "3074457345618258676" -> "3074457345618258684"  [ label = "DDG: argv"] 
  "3074457345618258674" -> "3074457345618258684"  [ label = "DDG: "] 
  "3074457345618258679" -> "3074457345618258694"  [ label = "CDG: "] 
  "3074457345618258679" -> "3074457345618258691"  [ label = "CDG: "] 
  "3074457345618258680" -> "3074457345618258685"  [ label = "CDG: "] 
  "3074457345618258680" -> "3074457345618258683"  [ label = "CDG: "] 
  "3074457345618258680" -> "3074457345618258684"  [ label = "CDG: "] 
}

So I wonder: given a .dot file, can I get the path of the source file it corresponds to, and the line number corresponding to each node in the CPG? The line number information and the file path are quite important for my task.

for-just-we avatar Dec 09 '21 09:12 for-just-we

This is currently not directly supported (as an API call or something similar).

The actual implementation for the String representation of nodes for printing to .dot is here: https://github.com/ShiftLeftSecurity/codepropertygraph/blob/master/semanticcpg/src/main/scala/io/shiftleft/semanticcpg/dotgenerator/DotSerializer.scala#L41

The file name of the method you are exporting the .dot file for (method.file.name) could be retrieved here: https://github.com/ShiftLeftSecurity/codepropertygraph/blob/master/semanticcpg/src/main/scala/io/shiftleft/semanticcpg/dotgenerator/DotSerializer.scala#L33

So you could adapt those particular lines of code, build codepropertygraph locally (sbt publishLocal), and then build joern locally against that version.
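
Whichever way the extra information ends up in the output, the exported files still need to be read back in. Below is a minimal sketch in Python of parsing a .dot file, assuming the label layout shown in the question: "(KIND,code)" for nodes and "DDG: ..." or "CDG: ..." for edges. The names parse_dot, Node and Edge are hypothetical helpers, not part of joern. The labels as exported above carry no line numbers or file paths, so this only recovers node ids, kinds, code strings and edges; a locally adapted serializer would need a matching change to the node pattern.

```python
import re
from typing import NamedTuple


class Node(NamedTuple):
    node_id: str
    kind: str  # e.g. METHOD, PARAM, or a call name such as printf
    code: str  # everything after the first comma inside the label


class Edge(NamedTuple):
    src: str
    dst: str
    label: str  # e.g. "DDG: argc" or "CDG: "


# Node line (assumed layout):  "<id>" [label = "(<kind>,<code>)" ]
NODE_RE = re.compile(r'^\s*"(\d+)"\s*\[\s*label\s*=\s*"\((.*)\)"\s*\]\s*$')
# Edge line (assumed layout):  "<src>" -> "<dst>"  [ label = "<text>"]
EDGE_RE = re.compile(r'^\s*"(\d+)"\s*->\s*"(\d+)"\s*\[\s*label\s*=\s*"(.*)"\s*\]\s*$')


def parse_dot(text):
    """Return (nodes keyed by id, list of edges) for one exported .dot file."""
    nodes, edges = {}, []
    for line in text.splitlines():
        m = EDGE_RE.match(line)
        if m:
            edges.append(Edge(m.group(1), m.group(2), m.group(3)))
            continue
        m = NODE_RE.match(line)
        if m:
            # Split "KIND,code" at the first comma only; code may contain commas.
            kind, _, code = m.group(2).partition(",")
            nodes[m.group(1)] = Node(m.group(1), kind, code)
    return nodes, edges


# A shortened excerpt of the file from the question, used as sample input.
SAMPLE = r'''digraph "main" {
"3074457345618258674" [label = "(METHOD,main)" ]
"3074457345618258675" [label = "(PARAM,int argc)" ]
"3074457345618258680" [label = "(<operator>.greaterThan,argc > 1)" ]
  "3074457345618258675" -> "3074457345618258680"  [ label = "DDG: argc"]
  "3074457345618258674" -> "3074457345618258680"  [ label = "DDG: "]
}
'''

if __name__ == "__main__":
    nodes, edges = parse_dot(SAMPLE)
    for node in nodes.values():
        print(node)
    for edge in edges:
        print(edge)
```

Keying everything on the quoted ids means any line or file information obtained separately can be attached to the parsed nodes by id.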

max-leuthaeuser avatar Dec 09 '21 15:12 max-leuthaeuser

The dot generation has been rewritten a few times, so this issue could be outdated. I am closing it for now; please re-open it if you still have questions.

itsacoderepo avatar Jan 15 '23 21:01 itsacoderepo