codepropertygraph New traversal proposal.

This is my proposal for a new generic traversal supposed to be used as replacement for our current Traversal class to build up our DSL steps. This proposal offers three advantages when compared to the current state:

DSL steps only need to be defined once to be usable on single elements as well on collections/traversals of elements and it does so without wrapping single elements in some kind of single element collections.
It is equally fast or faster when using C2 JIT (the default one on JVM 8 and 11) and it is much faster when using Graal JIT (can be enable in JVM 11 via flags or by using GraalVM). In fact for the performance tests i made so far the new DSL with Graal JIT runs as fast as using no traversal wrapper at all.
DSL step definitions can be used for lazy iterations via Iterator as well as non lazy iterations via Iterable just buy including a different implicit PipeOps iterator.

Before i go into some of the details i dont want to spare you the drawbacks:

People writing DSL steps will be exposed to more complex generic programming than it is currently the case. E.g. higher kinded generic type parameters are used. I have come up with a strict schema for the DSL steps in order to keep complexity low.
We wont be able to swap in the new traversal without some stylistic changes in the usage pattern of steps which take implicit parameters: E.g. repeat(...)(_.emit) would need to become repeat(..., _.emit). With scala 3 we would not need to make that change.

I will now provide an example of the new DSL step definitions for the referencingIdentifiers step on LOCAL nodes:

// Implicit function which wraps a single LOCAL node into a LocalTraversalNew.
// This is quite analog to the current DSL and goes into some language package object.
// FT[_] will be explained futher below.
implicit def toLocalTraversalNew1[I <: nodes.Local, FT[_]](trav: I): LocalTraversalNew[I, Trav1, FT] = {
  new LocalTraversalNew(trav: Trav1[I])
}

// Implicit function which wraps an input traversal(IT) of LOCAL nodes into a LocalTraversalNew. 
// This is quite analog to the current DSL and goes into some language package object.
// FT[_] will be explained futher below.
implicit def toLocalTraversalNew2[I <: nodes.Local, IT[_], FT[_]](trav: IT[I]): LocalTraversalNew[I, IT, FT] = {
  new LocalTraversalNew(trav)
}

// Extension class which works on any input traversal(IT) of LOCAL nodes.
class LocalTraversalNew[I <: nodes.Local, IT[_], FT[_]](val pipe: IT[I]) extends AnyVal {
  // Extension method takes implicit traversal operation ops1 which provides the basic traversal operations
  // like map and flatMap for the input traversal type (IT). Some of the basic traversal operations like flatMap
  // have an output traversal type which is different from IT. We call it flatMap traversal(FT).
  // The seconds traversal operation ops2 is than required to again provide the basic traversal operations
  // for the result of flatMap.
  def referencingIdentifiers(implicit ops1: TravOps[IT, FT], ops2: TravOps[FT, FT]) = {
    pipe
      .flatMap(_._refIn.asScala)
      .filter(_.label == NodeTypes.IDENTIFIER)
      .cast[Identifier]
  }
}

Keep in mind this PR is just a proposal so it contains some stuff off less importance. The new traversal core definitions are under semanticcpg/src/main/scala/io/shiftleft/semanticcpg/langv2/ and MethodTraversal.scala, AstNodeTraversal.scala and LocalTraversal.scala contain some example step implementations. Please ignore semanticcpg/src/main/scala/io/shiftleft/semanticcpg/language/MySteps.scala and semanticcpg/src/main/scala/io/shiftleft/semanticcpg/language/PipeOps.scala. They contain my first attempt for a new traversal but that approach while easier for the step writer turned out to be too slow.

Nov 11 '21 15:11 ml86

From your description, the advantages clearly outweight the disadvantages. The one thing I ask myself when seeing the example code is whether we can find any ways of making the complexity less visible to the language writer. @mpollmeier what are your thoughts?

Nov 11 '21 17:11 fabsx00

Just discussed this in detail with markus, and he's looking into a few additional points (e.g. support for PathAwareTraversal and .repeat()(_.until) with type application). We'll let it sink in over the weekend.

Additional benefit that's not in the description yet: the end user (not the language designer) doesn't need to learn about Traversal - she can just continue to use Iterator or whatever collection they have at hand.

Additional downside: when the language designer adds a non-trivial step (like in LocalTraversalNew.referencingIdentifiers), she only has the methods that we defined in AnyTraversal at hand. Currently, with Traversal, we have all collection methods, including things like zipWithIndex, partition etc. We'd need to make sure we cover enough ground here.

We also discussed and agreed that it makes sense to bring this in before upgrading to Scala 3, mostly because changing the Scala version and traversal impl wold mean a hard cut for all existing Scala 2 users.

It'll be crucial to document this really really well, especially those additional generic terms like InputTraversal(IT) / FlatMapTraversal(IT) and TravNOps.

Overall I'm positive for this PR. The complexity certainly goes up, but the benefits seem to outweigh the downsides.

Nov 12 '21 09:11 mpollmeier

codepropertygraph codepropertygraph copied to clipboard

New traversal proposal.

codepropertygraph
codepropertygraph copied to clipboard