joern
joern copied to clipboard
(C/C++) A design mistake makes it hard to recognize which argument makes CALL node connected by the REACHING_DEF edge
Here is the code:
int f(int a, int b, int c)
{
return g(a, b + 1, c);
}
And its ast is like: Its ddg is like:
Firstly, there should be an REACHING_DEF edge from b + 1
to g(a, b + 1, c)
, but there isn't. This should be fixed.
By traversing the sub-AST of CALL node, I can know that it has 3 arguments (a
,b + 1
,c
), the second one is of CALL type, while the others are of IDENTIFIER type.
Here is the attribute of the 3 nodes:
{
"dynamicTypeHintFullName":[
],
"name":"a",
"code":"a",
"typeFullName":"int",
"columnNumber":14,
"order":1,
"_label":"IDENTIFIER",
"argumentIndex":1,
"lineNumber":63,
"id":27
},
{
"name":"<operator>.addition",
"signature":"",
"code":"b + 1",
"typeFullName":"<empty>",
"columnNumber":17,
"order":2,
"methodFullName":"<operator>.addition",
"_label":"CALL",
"argumentIndex":2,
"dynamicTypeHintFullName":[
],
"dispatchType":"STATIC_DISPATCH",
"lineNumber":63,
"id":28
},
{
"dynamicTypeHintFullName":[
],
"name":"c",
"code":"c",
"typeFullName":"int",
"columnNumber":24,
"order":3,
"_label":"IDENTIFIER",
"argumentIndex":3,
"lineNumber":63,
"id":31
},
I can easily get the REACHING_DEF edge related to CALL node b + 1
, but for a
and c
, it is hard to distinguish them.
Maybe I am interested in argument a, but I can hardly know if there is a REACHING_DEF edge links to the CALL node because of a
.
It is able to solve the problem by using the information on REACHING_DEF edge, but in my case, I need to simplify the ddg so I don't want to preserve the information on edges at all.
Here is a better solution for that:
IDENTIFIER nodes don't exist in ddg, I think it is necessary to add two nodes respectively for a
and b
, the two nodes are of CALL type. Its method is a new one, representing using an identifier itself. These two nodes are in ddg now, and they can help me to get relationship between REACHING_DEF edges and CALL node arguments.
I found another way to solve this problem, we can add a new type of node named "ARGUMENT", every argument of "CALL" node has its corresponding "ARGUMENT" node, This kind of node will show in both AST and DDG. However, this may add too many nodes.
The two ways may need to make big changes, an easier way to solve this problem is to let "IDENTIFIER" node exist in DDG. In this case, just link (PARAM, a) to (IDENTIFIER, a) and link (IDENTIFIER, a) to (CALL, g(a, b + 1, c) ), not to link (PARAM, a) to (CALL, g(a, b + 1, c) ) directly. But this may break the consistance of ddg generating rules.
I think this is just an effect that is produced by the dot-plotting code, which aims to simplify the representation. In the underlying graph, we in fact create REACHING_DEF edges between arguments, so, I don't think you're seeing a design issue here. I'll look into the potentially missing reaching-def edge though.
I think this is just an effect that is produced by the dot-plotting code, which aims to simplify the representation. In the underlying graph, we in fact create REACHING_DEF edges between arguments, so, I don't think you're seeing a design issue here. I'll look into the potentially missing reaching-def edge though.
Could you please teach me how to obtain the underlying DDG?
I tried to use cpg.graph.E
, but I don't know how to filter the REACHING_DEF edges.
And I found that cpg.graph.E.map(e => List(e.inV, e.outV))
did not work:
So weired.
Are these output wrong? Why the outE's outVs of a node are the node itself?
The outV
traversal seems broken.
The outV traversal seems broken indeed, I'll look into that. To be specific: the traversal doesn't make sense and should result in a compiler error, rather than a ClassCastException at runtime.
Re outE.outV
- the background is that we have some tinkerpop history, and kept some of the fundamental concepts even after our rewrite.
Here's the traversal in gremlin-groovy:
gremlin> graph = TinkerFactory.createModern()
==>tinkergraph[vertices:6 edges:6]
gremlin> g = graph.traversal()
==>graphtraversalsource[tinkergraph[vertices:6 edges:6], standard]
gremlin> g.V(1).outE().outV()
==>v[1]
==>v[1]
==>v[1]
gremlin> g.V(1).outE().inV()
==>v[3]
==>v[2]
==>v[4]
This has caused confusion before I'm happy to rethink that decision, but we need to make sure that we change the terminology at most once. I.e. we need to be 100% sure that we're following the most widely accepted norm. If you can collect a few links/docs where a different standard is used, that'd be very helpful.
Thanks for answering me. @mpollmeier
So if there is an edge a -> b
and I am at a
now, I should use a.outE.inV
to traverse to b
, is this right?
I tried to use
cpg.graph.E
, but I don't know how to filter the REACHING_DEF edges.
I found 'label' to filter the REACHING_DEF edges.
But still, edge travesal in map seems doesn't work:
Is this normal?
So if there is an edge
a -> b
and I am ata
now, I should usea.outE.inV
to traverse tob
, is this right?
This is just because of the impliment of graph traversal standard.
That's correct. This is the tinkerpop naming convention, and I think the rationale is somewhat like "you follow the outgoing edge, to the incoming vertex". You're probably aware already, but to go from a->b
you typically just use a.out
.
Much appreciated!
but to go from
a->b
you typically just usea.out
.
I'll be looking forward to the fixing of outV
traversal and simply use it in the above-mentioned case when it's available.
@NemoTR the fix for cpg.id(14).outV
is now merged. The compiler will now detect that it is an illegal traversal and not let it slip through, therefor avoiding the ClassCastException at runtime.