joern icon indicating copy to clipboard operation
joern copied to clipboard

fix cfgCreator complexity class

Open bbrehm opened this issue 2 years ago • 9 comments

The CfgCreator has a complexity class problem. The general algorithm is very map-reduce style, on top of immutable persistent (copy-on-write) datastructures. Nice, everything is side-effect free!

However The cfgA ++ cfgB needs to combine the edges of both cfgs; these were represented as lists, and combined via edges = this.edges ++ other.edges ++ ... (Cfg.scala, line 59).

Imagine that we have List(Cfg0, Cfg1, ....) each containing a single edge, and then .reduceOption((x, y) => x ++ y). If the underlying implementation reduces from the left, i.e. evaluates ((Cfg0 ++ Cfg1) ++ Cfg2) ++ ..., then we append an element to a list in a loop, which is O(n^2) (each edge needs to be copied n times). Doing the list-append the other way around does not help either -- freedom to reassociate is the point of this way of representing the algorithm.

What we would need to do is

case class Cfg(..., edges: Deque[CfgEdge] = Deque.empty(), ...){

def ++(other: Cfg): Cfg = {
...
val combinedEdges = if(this.edges.size < other.edges.size) this.edges.append(other.edges) else other.edges.prepend(this.edges)
...
}

If we had a scala.collection.immutable.Deque that managed to remove from front/back and append/prepend in amortized O(1), then we would get a nice O(n * log n) runtime for our map-reduce operation.

Proof: For a sequence cfgs = List(Cfg0, Cfg1, ...), define WEIGHT = doneCopies - cfgs.map{c.size}.map{s => s*math.log(s)}.sum, where doneCopies is incremented whenever we append/prepend an edge. Using pen and paper, one can verify that WEIGHT decreases whenever one merges two cfgs. Hence, the final weight is negative, giving the N log N bound.

Alas, scala.collection.immutable.Deque does not exist. Using a modified fancy-pants bitmapped-vector-trie, starting from scala.collection.immutable.Vector, it is probably possible to obtain amortized almost-O(1) (really O(log N), same as Vector) for append/prepend/popFront/popLast. Doing that would be a cool addition to the scala standard library, but that's too much work.

One standard way (as e.g. done when aggregating diffs in parallel passes) is to use a mutable ArrayDeque, and design the code in a way that val cfg = cfg1 ++ cfg2 destroys both cfg1 and cfg2. Scala is not rust, so there is nothing like a borrow-checker to help us manage prevent "moral use-after-free" (instead of corrupting memory, we can take our pick between nullpointer exception or incorrect results).

Another standard way when running on a single thread is to just accumulate into a kinda-global buffer.

Either way, we need to manage mutable state.

This PR uses the kinda-global (local to CfgCreator) approach. To make that work with minimal code changes, Cfg must become an inner class of CfgCreator, which holds an edgeBuffer that is filled as a side-effect of ++.

I admit that this change is not pretty, and I'm happy if @fabsx00 can come up with an alternative.

The PR was motivated by some performance sleuthing for a PoC, cf https://github.com/ShiftLeftSecurity/product/issues/10512

For the larger of the artifacts, the full cpg2sp on my workstation goes from

real	1m24.629s
user	3m39.784s
sys	0m6.683s

down to

real	1m19.193s
user	2m59.596s
sys	0m4.651s

In other words, about 20% of CPU cycles saved end-to-end (CfgCreator consumed a significant fraction of total CPU cycles, and got faster by a significant factor).

bbrehm avatar May 30 '22 23:05 bbrehm

Great catch! It's a pity that there's no scala.collection.immutable.Deque - simply exchanging the collection would be ideal because bringing in a global data structure makes this algorithm more complex. I'm currently using this in teaching and the simplicity of the algorithm is important in that context. As you also point out, it's not the algorithm itself that's introducing the O(n^2) behavior but rather the collection, and more specifically, the ++ operator for lists. Could we possibly use a collection from a well known Java library instead?

fabsx00 avatar May 31 '22 08:05 fabsx00

Afaiu no, there is no well-known java or scala lib. Maybe you're better with google?

I can whip up an immutable accumulator-deque with:

  1. correct complexity class if single-ownership is respected (i.e. if every instance gets used at most once as input to appended / prepended / ++, and one never appends to a deque that is output of popLasts)
  2. Correct albeit slow if single-ownership is not respected
  3. Optionally threadsafe (on java8 using sun.misc.unsafe to get atomics on array elements)

The idea would be to use the usual ring-buffer class MyDeque[+T](offset:Int, len:Int, elems:Array[Object]), with a guard value to denote "next slot is free". Then one can append/prepend by allocating a new MyDeque with adjusted offset/len and compare-and-swapping (atomically via sun.misc.unsafe if necessary) the new element against the guard value and sharing the same elems array. If the compare-and-swap fails, then one copies the relevant slice of the elems array.

This is certainly a fun data structure. However:

  1. Single-ownership violations lead to bad perf and are hard to debug!
  2. If we have single-ownership anyway then we can also use a standard mutable deque and throw a nullpointer exception on violations
  3. If we have single-ownership and have a context where we can stash semi-global state (as in the thing I wrote), then what's the point? Just put everything into the semi-global ArrayBuffer.

Philosophically, I am very opposed to functional-style datastructures in most cases. This leads either to slow-down or to extreme complexity. Just read scala's Vector implementation, and ponder that functional-style changes an algorithm you could do in C into one that almost unavoidably requires a garbage collecting runtime. Especially students must keep all that complexity in their brains all the time! And it is so easy to lose a complexity class by asking a List for its size.

There are cases where functional-style datastructures are warranted, but this should be a very rare spice (e.g. non-overwriting filesystems a la ZFS are a super cool fit for the way SSDs or SMR hard-drives work on the hardware level)

bbrehm avatar May 31 '22 10:05 bbrehm

Doesn't guava have a deque that we can use?

fabsx00 avatar May 31 '22 10:05 fabsx00

Can you link me the thing?

An alternative approach would be to bite the bullet ++ does not preserve the order of edges under re-association. Then we can use either Vector or even keep List, using a combine[T](left:List[T], right:List[T]):List[T] = if(isShorter(left, right)) left ++ right else right ++ left and def isShorter(left:List[_], right: List[T]):Boolean = {var _left = left; var _right = right; while(_left.nonEmpty && _right.nonEmpty){_left = _left.tail; _right = _right.tail}; return _right.nonEmpty}.

This code is silly, but it has the required complexity class (combining a size N and a size M object must scale like O(min(N, M)))

bbrehm avatar May 31 '22 11:05 bbrehm

Sorry for shouting in from the sideline, but how about the default java Deques? https://docs.oracle.com/javase/8/docs/api/java/util/Deque.html

All Known Implementing Classes: ArrayDeque, ConcurrentLinkedDeque, LinkedBlockingDeque, LinkedList

mpollmeier avatar May 31 '22 12:05 mpollmeier

@mpollmeier the general idea of this piece of code is to use immutable persistent (non-overwriting copy-on-write shared memory) datastructures, i.e. to have side-effect-free ++ (like scala List, and unlike scala ArrayBuffer). Using java deque that would require a copy, and that scales O(max(left.size,right.size)) instead of O(min(left.size, right.size)), which leads to the quadratic runtime.

Now we can either find a fancy persistent deque where ++ scales like O(log(max(left.size, right.size)) * min(left.size, right.size))), or we can give up on "side-effect free" or we can give up "associative ++".

If one gives up on side-effect free, then one can either go "all side effects permitted", like the above code, for total O(N); or one can limit side-effects to "single ownership"-style, for O(N*log N); if we had a fancy-pants immutable deque, then one would get something between O(N log N log log N) and O(N l(og N)^2) (would need to sit down to compute that).

If we give up associativity of ++ with respect to edge order, then we get O(N * log N) (and the const factors are not very good).

I don't particularly care about log factors for this code, but quadratic instead of linear is too much.

bbrehm avatar May 31 '22 13:05 bbrehm

Ok makes sense. Can we give up immutability within a small|contained|private area of execution, and return something immutable at the end?

mpollmeier avatar May 31 '22 13:05 mpollmeier

Unless there's a readily available container we can use from a standard library or semi-standard library such as guava, I'd like to table this for now. The gains in performance don't justify spending the time to write our own deque, especially considering that ODB isn't done yet and will bring much more far-reaching performance improvements. The current PR, in my opinion, sacrifices readability and comprehensibility of the code.

fabsx00 avatar May 31 '22 13:05 fabsx00

The tests are still red, btw, so, I think measurements only make sense once they're green.

fabsx00 avatar May 31 '22 13:05 fabsx00