How to get line information and file path of source code after processing?
Hello, author
After I use the commands joern-parse /src/directory and joern-export --repr cpg14 --out outdir to dump the CPGs of all source files in /src/directory to outdir, each .dot file contains text like this:
digraph "main" {
"3074457345618258674" [label = "(METHOD,main)" ]
"3074457345618258700" [label = "(METHOD_RETURN,int)" ]
"3074457345618258675" [label = "(PARAM,int argc)" ]
"3074457345618258676" [label = "(PARAM,char *argv[])" ]
"3074457345618258696" [label = "(printf,printf(\"What is the meaning of life?\n\"))" ]
"3074457345618258698" [label = "(exit,exit(0))" ]
"3074457345618258679" [label = "(<operator>.logicalAnd,argc > 1 && strcmp(argv[1], \"42\") == 0)" ]
"3074457345618258691" [label = "(fprintf,fprintf(stderr, \"It depends!\n\"))" ]
"3074457345618258694" [label = "(exit,exit(42))" ]
"3074457345618258680" [label = "(<operator>.greaterThan,argc > 1)" ]
"3074457345618258683" [label = "(<operator>.equals,strcmp(argv[1], \"42\") == 0)" ]
"3074457345618258684" [label = "(strcmp,strcmp(argv[1], \"42\"))" ]
"3074457345618258685" [label = "(<operator>.indirectIndexAccess,argv[1])" ]
"3074457345618258675" -> "3074457345618258700" [ label = "DDG: argc"]
"3074457345618258676" -> "3074457345618258700" [ label = "DDG: argv"]
"3074457345618258680" -> "3074457345618258700" [ label = "DDG: argc"]
"3074457345618258679" -> "3074457345618258700" [ label = "DDG: argc > 1"]
"3074457345618258684" -> "3074457345618258700" [ label = "DDG: argv[1]"]
"3074457345618258683" -> "3074457345618258700" [ label = "DDG: strcmp(argv[1], \"42\")"]
"3074457345618258679" -> "3074457345618258700" [ label = "DDG: strcmp(argv[1], \"42\") == 0"]
"3074457345618258679" -> "3074457345618258700" [ label = "DDG: argc > 1 && strcmp(argv[1], \"42\") == 0"]
"3074457345618258691" -> "3074457345618258700" [ label = "DDG: fprintf(stderr, \"It depends!\n\")"]
"3074457345618258694" -> "3074457345618258700" [ label = "DDG: exit(42)"]
"3074457345618258696" -> "3074457345618258700" [ label = "DDG: printf(\"What is the meaning of life?\n\")"]
"3074457345618258698" -> "3074457345618258700" [ label = "DDG: exit(0)"]
"3074457345618258691" -> "3074457345618258700" [ label = "DDG: stderr"]
"3074457345618258674" -> "3074457345618258675" [ label = "DDG: "]
"3074457345618258674" -> "3074457345618258676" [ label = "DDG: "]
"3074457345618258674" -> "3074457345618258696" [ label = "DDG: "]
"3074457345618258674" -> "3074457345618258698" [ label = "DDG: "]
"3074457345618258680" -> "3074457345618258679" [ label = "DDG: 1"]
"3074457345618258680" -> "3074457345618258679" [ label = "DDG: argc"]
"3074457345618258683" -> "3074457345618258679" [ label = "DDG: 0"]
"3074457345618258683" -> "3074457345618258679" [ label = "DDG: strcmp(argv[1], \"42\")"]
"3074457345618258675" -> "3074457345618258680" [ label = "DDG: argc"]
"3074457345618258674" -> "3074457345618258680" [ label = "DDG: "]
"3074457345618258676" -> "3074457345618258683" [ label = "DDG: argv"]
"3074457345618258674" -> "3074457345618258683" [ label = "DDG: "]
"3074457345618258674" -> "3074457345618258691" [ label = "DDG: "]
"3074457345618258674" -> "3074457345618258694" [ label = "DDG: "]
"3074457345618258676" -> "3074457345618258684" [ label = "DDG: argv"]
"3074457345618258674" -> "3074457345618258684" [ label = "DDG: "]
"3074457345618258679" -> "3074457345618258694" [ label = "CDG: "]
"3074457345618258679" -> "3074457345618258691" [ label = "CDG: "]
"3074457345618258680" -> "3074457345618258685" [ label = "CDG: "]
"3074457345618258680" -> "3074457345618258683" [ label = "CDG: "]
"3074457345618258680" -> "3074457345618258684" [ label = "CDG: "]
}
So I wonder: given a .dot file, could I get the path of the source file it corresponds to, and the line number corresponding to each node in the CPG? The line number information and file path are quite important to my task.
This is currently not directly supported (as an API call or something similar).
The actual implementation for the String representation of nodes for printing to .dot is here: https://github.com/ShiftLeftSecurity/codepropertygraph/blob/master/semanticcpg/src/main/scala/io/shiftleft/semanticcpg/dotgenerator/DotSerializer.scala#L41
The file name of the current method (method.file.name) whose .dot file you want to print could be retrieved here:
https://github.com/ShiftLeftSecurity/codepropertygraph/blob/master/semanticcpg/src/main/scala/io/shiftleft/semanticcpg/dotgenerator/DotSerializer.scala#L33
So you could adapt these particular lines of code, build codepropertygraph locally (sbt publishLocal), and use that version to build joern locally as well.
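On the consuming side, a minimal Python sketch of reading the node lines from the .dot text shown in the question might look like the following. It relies only on the "(TYPE,code)" label layout visible in that output. The optional "@<line>" suffix is purely hypothetical: it stands in for whatever format a locally adapted serializer would emit, and stock output does not contain it. The names NODE_RE, DotNode and parse_nodes are made up for illustration.

```python
import re
from typing import NamedTuple, Optional

# Matches node lines such as:
#   "3074457345618258674" [label = "(METHOD,main)" ]
# The trailing "@<line>" group is a HYPOTHETICAL suffix that a locally
# adapted serializer could append; the stock output does not have it.
NODE_RE = re.compile(
    r'^"(?P<id>\d+)" \[label = "\((?P<type>[^,]+),(?P<code>.*)\)'
    r'(?:@(?P<line>\d+))?" \]$'
)


class DotNode(NamedTuple):
    node_id: str
    node_type: str
    code: str
    line: Optional[int]  # None when the label carries no line suffix


def parse_nodes(dot_text: str) -> dict:
    """Return a dict mapping node id to DotNode for every node line."""
    nodes = {}
    for raw in dot_text.splitlines():
        match = NODE_RE.match(raw.strip())
        if match is None:
            continue  # edge lines, the "digraph" header and "}" are skipped
        line = int(match.group("line")) if match.group("line") else None
        nodes[match.group("id")] = DotNode(
            match.group("id"), match.group("type"), match.group("code"), line
        )
    return nodes
```

Calling parse_nodes on the contents of one exported file gives a lookup table from node id to node type and code, which the edge lines can then be resolved against. The source file path would need a similar convention of your own choosing, for example emitting it once per graph, since the output shown above does not carry it either.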
The dot generation has been rewritten a few times, so this issue could be outdated. I am closing it for now; please re-open it if you still have questions.