multisets icon indicating copy to clipboard operation
multisets copied to clipboard

immutable.Bag.++ takes time linear in the size of both collections.

Open Blaisorblade opened this issue 10 years ago • 0 comments

While reading your code, I noticed one thing which might explain bad performance I'm seeing in some scenarios.

immutable.Bag.++ creates a completely new bag, which is very expensive. Instead, I expect it should reuse the ++ method of HashTries, which will share unchanged nodes of the trie. The implementation is in scala.collection.BagLike.++, and not overriden in scala.collection.immutable.BagLike.++. Indeed, the implementation of immutable.Bag.++ appeared to give correct results even in presence of issue #3.

In particular, this means that the cost of ++ is linear in the size of both arguments: adding 1 element to a bag with N elements takes Θ(N) time. Thus, building a bag of N elements by appending one element at a time takes time Θ(1 + 2 + 3 + ... + N) = Θ(N^2).

Blaisorblade avatar May 26 '15 23:05 Blaisorblade