How to get line information and file path of source code after processing?
Hello, author
I ran joern-parse /src/directory and then joern-export --repr cpg14 --out outdir to dump the CPGs of all source files in /src/directory to outdir. The text of each .dot file looks like this:
digraph "main" {
"3074457345618258674" [label = "(METHOD,main)" ]
"3074457345618258700" [label = "(METHOD_RETURN,int)" ]
"3074457345618258675" [label = "(PARAM,int argc)" ]
"3074457345618258676" [label = "(PARAM,char *argv[])" ]
"3074457345618258696" [label = "(printf,printf(\"What is the meaning of life?\n\"))" ]
"3074457345618258698" [label = "(exit,exit(0))" ]
"3074457345618258679" [label = "(<operator>.logicalAnd,argc > 1 && strcmp(argv[1], \"42\") == 0)" ]
"3074457345618258691" [label = "(fprintf,fprintf(stderr, \"It depends!\n\"))" ]
"3074457345618258694" [label = "(exit,exit(42))" ]
"3074457345618258680" [label = "(<operator>.greaterThan,argc > 1)" ]
"3074457345618258683" [label = "(<operator>.equals,strcmp(argv[1], \"42\") == 0)" ]
"3074457345618258684" [label = "(strcmp,strcmp(argv[1], \"42\"))" ]
"3074457345618258685" [label = "(<operator>.indirectIndexAccess,argv[1])" ]
"3074457345618258675" -> "3074457345618258700" [ label = "DDG: argc"]
"3074457345618258676" -> "3074457345618258700" [ label = "DDG: argv"]
"3074457345618258680" -> "3074457345618258700" [ label = "DDG: argc"]
"3074457345618258679" -> "3074457345618258700" [ label = "DDG: argc > 1"]
"3074457345618258684" -> "3074457345618258700" [ label = "DDG: argv[1]"]
"3074457345618258683" -> "3074457345618258700" [ label = "DDG: strcmp(argv[1], \"42\")"]
"3074457345618258679" -> "3074457345618258700" [ label = "DDG: strcmp(argv[1], \"42\") == 0"]
"3074457345618258679" -> "3074457345618258700" [ label = "DDG: argc > 1 && strcmp(argv[1], \"42\") == 0"]
"3074457345618258691" -> "3074457345618258700" [ label = "DDG: fprintf(stderr, \"It depends!\n\")"]
"3074457345618258694" -> "3074457345618258700" [ label = "DDG: exit(42)"]
"3074457345618258696" -> "3074457345618258700" [ label = "DDG: printf(\"What is the meaning of life?\n\")"]
"3074457345618258698" -> "3074457345618258700" [ label = "DDG: exit(0)"]
"3074457345618258691" -> "3074457345618258700" [ label = "DDG: stderr"]
"3074457345618258674" -> "3074457345618258675" [ label = "DDG: "]
"3074457345618258674" -> "3074457345618258676" [ label = "DDG: "]
"3074457345618258674" -> "3074457345618258696" [ label = "DDG: "]
"3074457345618258674" -> "3074457345618258698" [ label = "DDG: "]
"3074457345618258680" -> "3074457345618258679" [ label = "DDG: 1"]
"3074457345618258680" -> "3074457345618258679" [ label = "DDG: argc"]
"3074457345618258683" -> "3074457345618258679" [ label = "DDG: 0"]
"3074457345618258683" -> "3074457345618258679" [ label = "DDG: strcmp(argv[1], \"42\")"]
"3074457345618258675" -> "3074457345618258680" [ label = "DDG: argc"]
"3074457345618258674" -> "3074457345618258680" [ label = "DDG: "]
"3074457345618258676" -> "3074457345618258683" [ label = "DDG: argv"]
"3074457345618258674" -> "3074457345618258683" [ label = "DDG: "]
"3074457345618258674" -> "3074457345618258691" [ label = "DDG: "]
"3074457345618258674" -> "3074457345618258694" [ label = "DDG: "]
"3074457345618258676" -> "3074457345618258684" [ label = "DDG: argv"]
"3074457345618258674" -> "3074457345618258684" [ label = "DDG: "]
"3074457345618258679" -> "3074457345618258694" [ label = "CDG: "]
"3074457345618258679" -> "3074457345618258691" [ label = "CDG: "]
"3074457345618258680" -> "3074457345618258685" [ label = "CDG: "]
"3074457345618258680" -> "3074457345618258683" [ label = "CDG: "]
"3074457345618258680" -> "3074457345618258684" [ label = "CDG: "]
}
So I wonder: given a .dot file, can I get the path of the source file it corresponds to, and the line number of each node in the CPG? The line number information and file path are quite important for my task.
This is currently not directly supported (as an API call or something similar).
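To see what the exported file does carry, here is a minimal Python sketch that reads node and edge statements back out of a .dot file. It assumes the exact label layout shown in the sample above ("(TYPE,code)" for nodes, "KIND: variable" for edges); other Joern versions may print something different. The names parse_dot, NODE_RE and EDGE_RE are made up for this illustration and are not part of Joern.

```python
import re

# Assumed node statement:  "<id>" [label = "(<TYPE>,<code>)" ]
NODE_RE = re.compile(r'^\s*"(\d+)"\s*\[\s*label\s*=\s*"\((.*)\)"\s*\]\s*$')
# Assumed edge statement:  "<src>" -> "<dst>" [ label = "<KIND>: <variable>"]
EDGE_RE = re.compile(
    r'^\s*"(\d+)"\s*->\s*"(\d+)"\s*\[\s*label\s*=\s*"(\w+): ?(.*)"\s*\]\s*$'
)


def parse_dot(text):
    """Return ({node_id: (node_type, code)}, [(src, dst, kind, variable)])."""
    nodes = {}
    edges = []
    for line in text.splitlines():
        edge = EDGE_RE.match(line)
        if edge:
            edges.append(edge.groups())
            continue
        node = NODE_RE.match(line)
        if node:
            node_id, label = node.groups()
            # The node type ends at the first comma; the rest is the code string.
            node_type, _, code = label.partition(",")
            nodes[node_id] = (node_type, code)
    return nodes, edges


# A few lines copied from the sample above.
SAMPLE = r'''digraph "main" {
"3074457345618258674" [label = "(METHOD,main)" ]
"3074457345618258698" [label = "(exit,exit(0))" ]
"3074457345618258674" -> "3074457345618258698" [ label = "DDG: "]
}'''

nodes, edges = parse_dot(SAMPLE)
for node_id, (node_type, code) in nodes.items():
    print(node_id, node_type, code)
```

Each node yields only an id, a type and a code string, so the line number and file path cannot be recovered from the .dot text alone.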
The actual implementation for the String representation of nodes for printing to .dot is here: https://github.com/ShiftLeftSecurity/codepropertygraph/blob/master/semanticcpg/src/main/scala/io/shiftleft/semanticcpg/dotgenerator/DotSerializer.scala#L41
The file name of the method you are exporting the .dot file for (method.file.name) could be retrieved here:
https://github.com/ShiftLeftSecurity/codepropertygraph/blob/master/semanticcpg/src/main/scala/io/shiftleft/semanticcpg/dotgenerator/DotSerializer.scala#L33
So you could adapt these particular lines of code, build codepropertygraph locally (sbt publishLocal), and use that version to build joern locally as well.
The dot generation has been rewritten a few times since, so this issue could be outdated. I am closing it for now; please re-open it if you still have questions.