kotlinx.collections.immutable

Iterable.intersect is very slow with PersistentList

Open • yuriykulikov opened this issue 5 years ago • 1 comment

Iterable<T>.intersect(other: Iterable<T>) takes a very long time to complete when a PersistentList is passed as the argument. The same call is much faster with other iterables such as List and Set: it takes minutes with a PersistentList and milliseconds with a List.

I couldn't find the exact reason, but it seems that Collection.retainAll does something with the persistent list that takes ages to complete.

Here are some examples:

    (0..147853).toList().intersect((0..147853).toList()) // takes milliseconds
    (0..147853).toList().intersect((0..147853).toPersistentList()) // takes minutes
    (0..147853).toList().intersect((0..147853).toPersistentList().toSet()) // takes milliseconds

    (0..147853).toMutableList().retainAll((0..147853).toPersistentList()) // takes minutes
    (0..147853).toMutableList().retainAll((0..147853).toPersistentList().toList()) // takes milliseconds
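
The third example above suggests a caller-side workaround: hash the argument before intersecting. Below is a minimal sketch of that idea. The helper name intersectViaHashSet is hypothetical (it is not part of the stdlib or of kotlinx.collections.immutable), and java.util.LinkedList is used only as a stand-in for PersistentList so that the snippet runs without extra dependencies.

```kotlin
import java.util.LinkedList

// Hypothetical helper, not a library API: always looks elements up in a Set,
// so each membership test is O(1) on average instead of a linear scan.
fun <T> Iterable<T>.intersectViaHashSet(other: Iterable<T>): Set<T> {
    val lookup: Set<T> = when (other) {
        is Set -> other
        else -> other.toHashSet()
    }
    // LinkedHashSet keeps the receiver's iteration order and drops duplicates.
    val result = LinkedHashSet<T>()
    for (element in this) {
        if (element in lookup) result.add(element)
    }
    return result
}

fun main() {
    // Assumption: LinkedList stands in for PersistentList (a List that is not an ArrayList).
    val other: List<Int> = LinkedList((0..147853).toList())
    val common = (0..147853).toList().intersectViaHashSet(other)
    println(common.size) // 147854
}
```

This assumes the argument's contains() agrees with equals()/hashCode(); for a collection with custom membership semantics, hashing it could change the result.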

yuriykulikov, Dec 04 '19 09:12

Hello,

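retainAll calls Collection.contains() once for each element of the receiver. The complexity of contains() is O(1) or O(log n) for sets and O(n) for lists, so retaining n elements against a list of m elements costs O(n·m). With about 148,000 elements on each side, that is on the order of 10^10 element comparisons.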
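
To make the quadratic cost concrete, here is a small sketch that counts how many elements a linear contains() scan visits during a retainAll-style pass. The helper names linearScanSteps and totalScanSteps are made up for this illustration, and the loop only models the cost; it is not the stdlib implementation.

```kotlin
// Hypothetical helper: number of elements a linear contains() scan visits
// before it finds `element` (or reaches the end of the list).
fun <T> linearScanSteps(list: List<T>, element: T): Int {
    var steps = 0
    for (item in list) {
        steps++
        if (item == element) break
    }
    return steps
}

// Total steps for a retainAll-style pass: one contains() per receiver element.
fun totalScanSteps(n: Int): Long {
    val other = (0 until n).toList()
    var total = 0L
    for (i in 0 until n) total += linearScanSteps(other, i)
    return total
}

fun main() {
    println(totalScanSteps(1_000)) // 500500, i.e. n * (n + 1) / 2
}
```

For the 147,854 elements used in the issue, the same formula gives roughly 1.1 × 10^10 steps, whereas a hash-based lookup needs about one step per element.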

So, to be honest:

  • I was surprised that (0..147853).toList().intersect((0..147853).toList()) takes only milliseconds

  • I was not surprised that (0..147853).toList().intersect((0..147853).toPersistentList()) takes minutes.

But the implementation of MutableCollection.retainAll(elements: Iterable) tries to be smart: in some cases, 'elements' is converted to a set and retainAll is applied using this set. This explains why the test case with two lists is so fast.

This behavior is handled by the following code from Iterables.kt in the Kotlin standard library:

    /** Returns true when it's safe to convert this collection to a set without changing contains method behavior. */
    private fun <T> Collection<T>.safeToConvertToSet() = size > 2 && this is ArrayList

    /** Converts this collection to a set, when it's worth so and it doesn't change contains method behavior. */
    internal fun <T> Iterable<T>.convertToSetForSetOperationWith(source: Iterable<T>): Collection<T> =
        when (this) {
            is Set -> this
            is Collection ->
                when {
                    source is Collection && source.size < 2 -> this
                    else -> if (this.safeToConvertToSet()) toHashSet() else this
                }
            else -> toHashSet()
        }

When 'this' is a persistent list, it is a Collection but not an ArrayList, so safeToConvertToSet() returns false and no conversion to a hash set takes place. retainAll then calls the list's O(n) contains() for every element.
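
The type check can be tried in isolation. The sketch below uses a local copy of the private predicate quoted above (renamed safeToConvertToSetCopy, since the original is not accessible from user code) and, as an assumption, java.util.LinkedList as a stand-in for a persistent list, because both are Lists that are not ArrayLists.

```kotlin
import java.util.LinkedList

// Local copy of the private stdlib predicate quoted above, for illustration only.
fun <T> Collection<T>.safeToConvertToSetCopy(): Boolean = size > 2 && this is ArrayList

fun main() {
    val arrayBacked: List<Int> = arrayListOf(1, 2, 3, 4)
    // Assumption: LinkedList plays the role of PersistentList here.
    val notArrayBacked: List<Int> = LinkedList(arrayBacked)

    println(arrayBacked.safeToConvertToSetCopy())    // true
    println(notArrayBacked.safeToConvertToSetCopy()) // false
}
```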

This is only an analysis; I don't have a solution for now.

GuillaumeEveillard, Mar 26 '20 10:03