gumtree-spoon-ast-diff icon indicating copy to clipboard operation
gumtree-spoon-ast-diff copied to clipboard

[bug] Potential Incorrect behaviour of ActionClassifier when computing roots actions

Open martinezmatias opened this issue 5 years ago • 12 comments

Summary of the problem

The ActionClassifier (probably, IMO) produces an incorrect outputs when Updates operations are the root operations.

Description of the problem:

There are two mains problems.

First, when we have two operations: 1) update on node X and 2) an Insert of Y on X (i.e., Y is a new child of X), the roots operation only shows 1 action: the update.
IMO, That is incorrect: as we discussed in a previous issue (roots of multiple updates), updates should not be grouped (in this case with Insert)

One example is:

- int xxxxxx1 = foo001(0) ;
+ int xxxxxx1 = foo002(0,1)

Roots is only one: [Update Invocation at X:1 foo001(0) to foo002(0, 1)] All actions are 2: [Update Invocation at X:1 foo001(0) to foo002(0, 1), Insert Literal at X:1 1]

Second related problem. In the previous case, the inserted node (literal 1) is a child of the updated node (foo001). However, if we consider a slightly different example, the ActionClassifier's behaviour is different.

- int xxxxxx1 = foo001(0) ;
+ int xxxxxx2 = foo001(0,1);

Here, we update the var name instead of the invoked method name (and we always insert a new parameter).

The root operation there are 2: Update LocalVariable at X:1 int xxxxxx1 = foo001(0) to int xxxxxx2 = foo001(0, 1) , Insert Literal at X:1 1 ]

(in the previous example the roots is only 1).

The difference of behaviour is caused by in this line : the classifier checks if the parent of an action was updated. In this case, the parent of the Insert was not updated (but the grandparent (xxxxxx1) was).

(I found the problem in a real diff, but it's easier to use a fake example to explain the problem)

Solution proposed

Ignore Updates in Root classification. IMH, Updates are atomic ops that only affect one single node. That would solve the two previous issues. (basically, the fix removes the term && !srcUpdTrees.contains(t.getParent() from this line and from this one)

martinezmatias avatar May 25 '20 06:05 martinezmatias

Actually, one of the problem is when a change (e.g., Insert) also produces that a label of a parent node is updated. For example: test_t_223454

  + ( new File(filePath, fileName), "UTF-8" );
 -             ( new File(filePath, fileName) );

There, the remove of the Literal produces an update of the ConstructorCall (which label is the signature of the executable). In this case, It could make sense to "hide" the Delete i.e., the Delete is not considered as Root. But I am not sure if in all cases.

martinezmatias avatar Jun 03 '20 07:06 martinezmatias

A different case would be if you have an update of Method's name and an insert on its body: There, it will show only one root: the update.

martinezmatias avatar Jun 03 '20 07:06 martinezmatias

Just for context, I'm approaching this from the perspective of applying operations from gumtree-spoon to an AST model (i.e. applying an edit script).

I agree with the idea that updates are atomic operations on labels in GumTree. But when this is translated to Spoon, it's not that simple. If I get an update operation from gumtree-spoon, the only information I get is the source node and destination node, but there's no indication of exactly what was updated as the concept of labels doesn't carry over to Spoon. Given only that information, I think it's confusing to not include update operations in root operations, because from an API perspective an update kind of looks like a root operation, if that makes sense.

A potential solution there would be to also include the role of the item that was updated. That would give sufficient information to allow for finding the updated "thing". For example, given that we update the name of a method, the update operation contains the source and destination nodes, as well as CtRole.NAME. To me, with that information it would make sense to treat update operations as orthogonal to root operations. Any idea if that would be feasible to implement?

slarse avatar Jun 02 '21 12:06 slarse

Hi @slarse

but there's no indication of exactly what was updated as the concept of labels doesn't carry over to Spoon.

Agree. Only CtNamedElements have an equivalent of "label": the simpleName. I also agree that we lost information passing from Spoon to Tree nodes.

I think it's confusing to not include update operations in root operations, because from an API perspective an update kind of looks like a root operation, if that makes sense.

If I am not wrong, the current implement considers all updates as root ops, right?

Any idea if that would be feasible to implement?

Just an idea: What about to only consider updates of CtNamedElement? This would simplify the application of partial diffs into the source tree. (However, I dont have any idea about that to do with updates of VirtualElements i.e., update of modifiers).

martinezmatias avatar Jun 02 '21 13:06 martinezmatias

What about to only consider updates of CtNamedElement?

I think I have a case which will be an exemption to this. Consider an update from int x = a + b to int x = a - b. That should qualify as an update but it won't come under CtNamedElement. So I agree with @slarse that CtRole should also be passed along with src and dst nodes. CtRole.OPERATOR_KIND can used to identify the operator and update it to -.

algomaster99 avatar Jun 02 '21 13:06 algomaster99

I think it's confusing to not include update operations in root operations, because from an API perspective an update kind of looks like a root operation, if that makes sense.

If I am not wrong, the current implement considers all updates as root ops, right?

Yes! Which is also confusing, as we then get root operations on elements that are direct descendants of other root operation elements.

Any idea if that would be feasible to implement?

Just an idea: What about to only consider updates of CtNamedElement? This would simplify the application of partial diffs into the source tree. (However, I dont have any idea about that to do with updates of VirtualElements i.e., update of modifiers).

That simplicity would be very neat from the use case of applying edit scripts, as @algomaster99 suggest I think we lose some amount of precision there.

As for virtual elements, I'm not sure. The fact that modifiers aren't proper nodes is just an eternal bother :(

I'll consider this. I think the only way for all of this to make sense is to, as you suggest, dislodge the concept of update operations from root operations. Perhaps the best way to move forward here is that @algomaster99 and I just try to work out a solution that works for applying edit scripts, and then we'll get back to you with some new thoughts.

slarse avatar Jun 02 '21 13:06 slarse

I had a go at this problem with what both of you suggested. Might be a good thing to go ahead with if we don't have a lot of roles to care about. Let me elaborate in the steps below.

  1. Removed the check in del and add trees (getRootActions) for parent node in update trees.

    - .filter(t -> !srcDelTrees.contains(t.getParent()) && !srcUpdTrees.contains(t.getParent()))
    + .filter(t -> !srcDelTrees.contains(t.getParent()))
    
    - .filter(t -> !dstAddTrees.contains(t.getParent()) && !dstUpdTrees.contains(t.getParent()))
    + .filter(t -> !dstAddTrees.contains(t.getParent()))
    

    This will ensure that insert/delete operation does not get ignored just because they are applied to a node which is a child of an updated node.

  2. Add API for returning CtRole in UpdateOperation

      public UpdateOperation(Update action) {
             super(action);
     	 destElement = (CtElement) action.getNode().getMetadata(SpoonGumTreeBuilder.SPOON_OBJECT_DEST);
    +	 if (destElement instanceof CtBinaryOperator<?>) {
    +	 	role = CtRole.OPERATOR_KIND;
    +	 } else {
    +		role = CtRole.NAME;
    +	 }
     }
    

    role is a private field in the UpdateOperation class.

The problem with this approach is that we will have to analyse many spoon elements for assigning the correct role. On the upside, there may not be a large number of elements that can be updated. Following this approach, we can use the CtRole to get the value from dstNode and set it on the srcNode using getValueByRole and setValueByRole respectively in diffmin. Following is a list of elements I have come across till now.

  • CtLiteral
    - System.out.println(4+8);
    + System.out.println(10000+8);
    
    setValueByRole(CtRole.VALUE, dstNode.getValueByRole(CtRole.VALUE))
  • CtBinaryOperator
    - System.out.println(4-4);
    + System.out.println(4+4); 
    
    setValueByRole(CtRole.OPERATOR_KIND, dstNode.getValueByRole(CtRole.OPERATOR_KIND))
  • CtInvocation
    - System.out.println(1);
    + System.exit(1);
    
    This yields three operations.
    1. Update Invocation
    2. Delete FieldRead
    3. Insert TypeAccess So I am not sure how to go about this because UpdateOperation does not make sense here. But if we really want to use it (for diffs like System.out.println() -> System.out.print()), we could probably update values of CtRole.TARGET and CtRole.EXECUTABLE_REF.
  • CtMethod
    - public void add() { }
    + public void multiply() { }
    
    setValueByRole(CtRole.NAME, dstNode.getValueByRole(CtRole.NAME)) (valid for return types too) I have a feeling that many entities will fall under CtRole.NAME.

Still not quite sure how to handle CtWrapper elements using roles.

algomaster99 avatar Jun 20 '21 22:06 algomaster99

@slarse Did you get time to go through the above comment?

algomaster99 avatar Jul 05 '21 10:07 algomaster99

It seems largely reasonable. What I can add is:

  • With this model, UpdateOperation should not at all be a root operation, so getRootOperations() should not return any update operations
  • Therefore, it would probably be most convenient to add a getUpdateOperations() method

As for the update from System.out.println() to System.exit(), isn't that just the name of the called method that is updated? I.e. println() -> exit()? The deleted field read is probably a deletion of reading System.out, replaced with a type access to System. If those are the operations, it does make sense.

System.out.println(1);
System.out.exit(1); // update println -> exit
exit(1); // delete field read
System.exit(1); // insert type access

?

slarse avatar Jul 05 '21 11:07 slarse

If those are the operations, it does make sense.

It makes sense to me now too. Instead of it making sense, I think I meant it was unnecessary because Delete FieldRead and Insert TypeAccess could have combined into one update operation and we could handle it using CtRole.TARGET.

algomaster99 avatar Jul 05 '21 14:07 algomaster99

If we follow this approach to update spoon nodes, we shall only be updating the value corresponding to a specific role. We would be skipping the update of the source position of the element since we no longer "replace" elements instead we just updated their so-called labels. I am not sure if that would affect diffmin. It would definitely fail ElementOriginTest, but we just need to modify it for update operations. Other than that, I don't think so it would affect the functionality as of now.

algomaster99 avatar Jan 10 '22 10:01 algomaster99

@martinezmatias I needed to confirm what is the expected behavior of the classifier in the examples you have proposed above.

- int xxxxxx1 = foo001(0) ;
+ int xxxxxx1 = foo002(0,1)

Should this have an update operation (foo1 -> foo2) and insertion of literal 1 as root operations?

In more general terms, any operation on any children node of an updated node should not be ignored.

algomaster99 avatar Jan 18 '22 17:01 algomaster99