gods icon indicating copy to clipboard operation
gods copied to clipboard

Speed up removal from a set

Open FrnchFrgg opened this issue 4 years ago • 13 comments

Introduce a new Filter() command to doubly linked lists that can perform several removals in a single O(n) sweep, and use that new method to make set removal O(n) instead of O(n**2)

FrnchFrgg avatar Nov 22 '19 22:11 FrnchFrgg

All tests pass and I introduced a new test for the Filter() method.

FrnchFrgg avatar Nov 22 '19 22:11 FrnchFrgg

@FrnchFrgg

how is the current implementation O(n**2) as I only see a single loop in https://github.com/emirpasic/gods/blob/master/lists/doublylinkedlist/doublylinkedlist.go#L104

also check out the enumerable functions found in also O(n) AFAICS https://github.com/emirpasic/gods/blob/master/lists/doublylinkedlist/enumerable.go#L33

emirpasic avatar Mar 31 '20 19:03 emirpasic

Indeed, I didn't see the Select method in enumerable; it can do the same as Filter (except that my proposed Filter does it in place and thus does not reallocate anything, whereas Select creates a new list).

As for the current implementation, removing a single element in a doubly linked list is O(n). But the API to remove elements from a set accepts multiple elements, and removes each one separately, in a loop. In effect, in means the complexity of set removal is O(m × n) where m is the amount of items to remove and n the set size. Worse comes to worst, m = n/2 and we get O(n**2).

My patch series make that removal O(n) regardless on the amount of elements to remove. It could build up on Select instead, but that would mean the set would reallocate its internal list for each removal, which is wasteful.

FrnchFrgg avatar Mar 31 '20 20:03 FrnchFrgg

@FrnchFrgg

It is common that these filter/select/remove/map (enumerable) functions return a new collection, AFAIK at least in Ruby/Java/Javascript, etc.

You could still use the current Select function to have O(n) removal for multiple indexes given the condition function evaluates this, right? It would create a new collection, but it'd run in O(n).

Now to think about it, I am not sure why I haven't made the Remove(i) a variadic argument function so that it can accept multiple indexes at once Remove(1,3,4,5, ...) etc.. Implementing that wouldn't even be a breaking change :)

ref. https://github.com/emirpasic/gods/blob/master/lists/doublylinkedlist/doublylinkedlist.go#L104

emirpasic avatar Mar 31 '20 20:03 emirpasic

...really strange as most of the other functions have variadic arguments :/

emirpasic avatar Mar 31 '20 20:03 emirpasic

Let me reopen this PR to have a reminder to have that implemented for all lists, I would start making this a variadic function and applying it to all three list sub-types: https://github.com/emirpasic/gods/blob/master/lists/lists.go#L20

emirpasic avatar Mar 31 '20 20:03 emirpasic

You'd have to first loop through the list to find out (and remember in a toRemove list) all indices of elements that need removal, to then call the remove function with "toRemove...". A lot of allocations that can be avoided. But making remove accept variadic arguments is a good idea nevertheless.

I don't think using Select is a very good idea: the common case is probably removing a single item from the set, and having the set re-create a new list each time is very wasteful.

I agree that Select is good API in general, especially in go where append, etc. return the new slice, but in that case it is not optimal to use it.

FrnchFrgg avatar Mar 31 '20 20:03 FrnchFrgg

The idea behind "Filter" is that it was not index-based, so it would sidestep this issue. With the updated set, checking if an element needs to be filtered out of the doubly linked list is O(1), given its value.

FrnchFrgg avatar Mar 31 '20 20:03 FrnchFrgg

@FrnchFrgg so same can be achieved with the Select function, right? ...just not in place.

emirpasic avatar Mar 31 '20 21:03 emirpasic

Indeed, the same can be achieved with the Select function. At the very least, there should be a check for the number of items to remove from the set, to avoid doing a complete reallocation for the common case.

The better way would be to change the "Filter" I introduced into an in-place "Select", i.e. reuse the exact same signature, and make it equivalent to list = list.Select(...). In most List structures it would be possible to do so without reallocation, but at worst harnessing the existing Select gives an easy implementation.

FrnchFrgg avatar Mar 31 '20 21:03 FrnchFrgg

Oh I just found out that Select is not yet part of the EnumerableWithIndex interface, it is commented out.

FrnchFrgg avatar Mar 31 '20 21:03 FrnchFrgg

@FrnchFrgg we are talking about the doubly linked list right? I think it should be, this guarantees it: https://github.com/emirpasic/gods/blob/master/lists/doublylinkedlist/enumerable.go#L10

And the select is right here: https://github.com/emirpasic/gods/blob/master/lists/doublylinkedlist/enumerable.go#L33

`

emirpasic avatar Apr 01 '20 07:04 emirpasic

So it's implemented, but I could not enforce it here (some technical issue that I could not resolve without type checking, which I didn't want to do):

https://github.com/emirpasic/gods/blob/master/containers/enumerable.go#L19

In other words you are right, not part of the EnumerableWithIndex, but you can still use Select for lists

emirpasic avatar Apr 01 '20 07:04 emirpasic