joern (C/C++) A design mistake makes it hard to recognize which argument makes CALL node connected by the REACHING

(C/C++) A design mistake makes it hard to recognize which argument makes CALL node connected by the REACHING_DEF edge

Open NemoTR opened this issue 2 years ago • 12 comments

Here is the code:

int f(int a, int b, int c)
{
    return g(a, b + 1, c);
}

And its ast is like: Its ddg is like:

Firstly, there should be an REACHING_DEF edge from b + 1 to g(a, b + 1, c), but there isn't. This should be fixed.

By traversing the sub-AST of CALL node, I can know that it has 3 arguments (a,b + 1,c), the second one is of CALL type, while the others are of IDENTIFIER type.

Here is the attribute of the 3 nodes:

{
        "dynamicTypeHintFullName":[
          
        ],
        "name":"a",
        "code":"a",
        "typeFullName":"int",
        "columnNumber":14,
        "order":1,
        "_label":"IDENTIFIER",
        "argumentIndex":1,
        "lineNumber":63,
        "id":27
      },
      {
        "name":"<operator>.addition",
        "signature":"",
        "code":"b + 1",
        "typeFullName":"<empty>",
        "columnNumber":17,
        "order":2,
        "methodFullName":"<operator>.addition",
        "_label":"CALL",
        "argumentIndex":2,
        "dynamicTypeHintFullName":[
          
        ],
        "dispatchType":"STATIC_DISPATCH",
        "lineNumber":63,
        "id":28
      },
      {
        "dynamicTypeHintFullName":[
          
        ],
        "name":"c",
        "code":"c",
        "typeFullName":"int",
        "columnNumber":24,
        "order":3,
        "_label":"IDENTIFIER",
        "argumentIndex":3,
        "lineNumber":63,
        "id":31
      },

I can easily get the REACHING_DEF edge related to CALL node b + 1, but for a and c, it is hard to distinguish them. Maybe I am interested in argument a, but I can hardly know if there is a REACHING_DEF edge links to the CALL node because of a. It is able to solve the problem by using the information on REACHING_DEF edge, but in my case, I need to simplify the ddg so I don't want to preserve the information on edges at all.

Here is a better solution for that:

IDENTIFIER nodes don't exist in ddg, I think it is necessary to add two nodes respectively for a and b, the two nodes are of CALL type. Its method is a new one, representing using an identifier itself. These two nodes are in ddg now, and they can help me to get relationship between REACHING_DEF edges and CALL node arguments.

May 08 '22 12:05 NemoTR

I found another way to solve this problem, we can add a new type of node named "ARGUMENT", every argument of "CALL" node has its corresponding "ARGUMENT" node, This kind of node will show in both AST and DDG. However, this may add too many nodes.

May 08 '22 17:05 NemoTR

The two ways may need to make big changes, an easier way to solve this problem is to let "IDENTIFIER" node exist in DDG. In this case, just link (PARAM, a) to (IDENTIFIER, a) and link (IDENTIFIER, a) to (CALL, g(a, b + 1, c) ), not to link (PARAM, a) to (CALL, g(a, b + 1, c) ) directly. But this may break the consistance of ddg generating rules.

May 08 '22 17:05 NemoTR

I think this is just an effect that is produced by the dot-plotting code, which aims to simplify the representation. In the underlying graph, we in fact create REACHING_DEF edges between arguments, so, I don't think you're seeing a design issue here. I'll look into the potentially missing reaching-def edge though.

May 08 '22 18:05 fabsx00

I think this is just an effect that is produced by the dot-plotting code, which aims to simplify the representation. In the underlying graph, we in fact create REACHING_DEF edges between arguments, so, I don't think you're seeing a design issue here. I'll look into the potentially missing reaching-def edge though.

Could you please teach me how to obtain the underlying DDG?

I tried to use cpg.graph.E, but I don't know how to filter the REACHING_DEF edges.

And I found that cpg.graph.E.map(e => List(e.inV, e.outV)) did not work:

May 09 '22 03:05 NemoTR

So weired.

Are these output wrong? Why the outE's outVs of a node are the node itself?

May 09 '22 04:05 NemoTR

The outV traversal seems broken.

May 09 '22 04:05 NemoTR

The outV traversal seems broken indeed, I'll look into that. To be specific: the traversal doesn't make sense and should result in a compiler error, rather than a ClassCastException at runtime.

Re outE.outV - the background is that we have some tinkerpop history, and kept some of the fundamental concepts even after our rewrite.

Here's the traversal in gremlin-groovy:

gremlin> graph = TinkerFactory.createModern()
==>tinkergraph[vertices:6 edges:6]
gremlin> g = graph.traversal()
==>graphtraversalsource[tinkergraph[vertices:6 edges:6], standard]
gremlin> g.V(1).outE().outV()
==>v[1]
==>v[1]
==>v[1]
gremlin> g.V(1).outE().inV()
==>v[3]
==>v[2]
==>v[4]

This has caused confusion before I'm happy to rethink that decision, but we need to make sure that we change the terminology at most once. I.e. we need to be 100% sure that we're following the most widely accepted norm. If you can collect a few links/docs where a different standard is used, that'd be very helpful.

May 09 '22 08:05 mpollmeier

Thanks for answering me. @mpollmeier

So if there is an edge a -> b and I am at a now, I should use a.outE.inV to traverse to b , is this right?

I tried to use cpg.graph.E, but I don't know how to filter the REACHING_DEF edges.

I found 'label' to filter the REACHING_DEF edges.

But still, edge travesal in map seems doesn't work:

Is this normal?

May 09 '22 10:05 NemoTR

So if there is an edge a -> b and I am at a now, I should use a.outE.inV to traverse to b , is this right?

This is just because of the impliment of graph traversal standard.

May 09 '22 11:05 NemoTR

That's correct. This is the tinkerpop naming convention, and I think the rationale is somewhat like "you follow the outgoing edge, to the incoming vertex". You're probably aware already, but to go from a->b you typically just use a.out.

May 09 '22 15:05 mpollmeier

Much appreciated!

but to go from a->b you typically just use a.out.

I'll be looking forward to the fixing of outV traversal and simply use it in the above-mentioned case when it's available.

May 09 '22 15:05 NemoTR

@NemoTR the fix for cpg.id(14).outV is now merged. The compiler will now detect that it is an illegal traversal and not let it slip through, therefor avoiding the ClassCastException at runtime.

May 27 '22 13:05 mpollmeier

joern joern copied to clipboard

(C/C++) A design mistake makes it hard to recognize which argument makes CALL node connected by the REACHING_DEF edge

joern
joern copied to clipboard