
Questionable PersistentList time complexity specification

Open mcpiroman opened this issue 2 years ago • 2 comments

As far as I can see, the current implementation of the collections, specifically PersistentList, is the one described in the proposal, which states that the complexity of PersistentList is (see the usage sketch after this list):

  • get(index), set(index, element) - O(log32N), where N is the size of the instance the operations are applied on.
  • add(index, element), removeAt(index), remove(element) - O(N).
    • add(element) and removeAt(size - 1) are O(1)
  • addAll(elements) - O(N).
  • addAll(index, elements), removeAll(elements), removeAll(predicate) - O(N M), optimizable to O(N+M), where M is the number of elements to be inserted/removed.
  • Iterating elements - O(N).
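
For orientation, here is a minimal usage sketch of these operations with the kotlinx.collections.immutable API; every call returns a new list, and the complexities in the comments are the ones claimed above:

```kotlin
import kotlinx.collections.immutable.persistentListOf

fun main() {
    // Every operation returns a new PersistentList; the receiver stays unchanged.
    val a = persistentListOf(1, 2, 3)     // [1, 2, 3]
    val b = a.add(4)                      // append, claimed O(1)        -> [1, 2, 3, 4]
    val c = b.set(0, 10)                  // claimed O(log32 N)          -> [10, 2, 3, 4]
    val d = c.removeAt(c.size - 1)        // remove last, claimed O(1)   -> [10, 2, 3]
    val e = d.addAll(listOf(7, 8, 9))     // claimed O(N)                -> [10, 2, 3, 7, 8, 9]
    println(a)                            // [1, 2, 3], the original is untouched
    println(e)                            // [10, 2, 3, 7, 8, 9]
}
```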

To my understanding, an implementation that simply delegates to an array copied on each operation (sketched after this list) would have the following:

  • 🟩 get(index), set(index, element) - O(1)
  • 🟨 add(index, element), removeAt(index), remove(element) - O(N).
    • 🟥 add(element) and removeAt(size - 1) - O(N)
  • 🟥 addAll(elements) - O(N + M)
  • 🟨/🟩 addAll(index, elements), removeAll(elements), removeAll(predicate) - O(N+M)
  • 🟨 Iterating elements - O(N).
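
For comparison, a minimal sketch of the naive copy-on-every-change approach described above; the class and method names here are hypothetical, not anything from the library:

```kotlin
// Hypothetical immutable list that delegates to an array copied on every change.
class NaiveCopyList<T> private constructor(private val elements: Array<Any?>) {
    companion object {
        fun <T> of(vararg items: T): NaiveCopyList<T> =
            NaiveCopyList(Array<Any?>(items.size) { items[it] })
    }

    val size: Int get() = elements.size

    @Suppress("UNCHECKED_CAST")
    fun get(index: Int): T = elements[index] as T        // plain array read, O(1)

    fun set(index: Int, element: T): NaiveCopyList<T> {
        val copy = elements.copyOf()                      // full copy to keep the original unchanged
        copy[index] = element
        return NaiveCopyList(copy)
    }

    fun add(element: T): NaiveCopyList<T> {
        val copy = elements.copyOf(elements.size + 1)     // append also copies all existing elements
        copy[elements.size] = element
        return NaiveCopyList(copy)
    }

    fun removeAt(index: Int): NaiveCopyList<T> {
        val copy = arrayOfNulls<Any>(elements.size - 1)
        elements.copyInto(copy, 0, 0, index)              // elements before the removed index
        elements.copyInto(copy, index, index + 1, size)   // elements after the removed index
        return NaiveCopyList(copy)
    }
}
```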

That is, the current trie-based approach is only better at appending elements, which is certainly a very common operation, but otherwise it is either worse than or unnecessarily complicated compared to naive copying.

Do I understand this correctly? I assume the actual performance, including other aspects like GC pressure and data locality, outweighs the raw complexity metrics, am I right?

mcpiroman avatar Mar 24 '22 21:03 mcpiroman

OK, I have read more about the bit-mapped trie structure and now I understand that it is, generally speaking, optimal. (Funnily enough, it appears that I had invented this structure myself before I read about it.)

I suggest (better) documenting the current implementation's performance characteristics, e.g. that O(log32 N) is in practice very small, but that insertion and removal in the middle are actually worse than O(N) - so much so that this is why .NET initially rejected moving from an AVL tree to a trie implementation.
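
To put a rough number on "very small": the depth of a 32-way trie grows as log32 of the size, so even very large lists stay only a handful of levels deep. A quick sketch:

```kotlin
import kotlin.math.ceil
import kotlin.math.ln

fun main() {
    // Approximate depth of a 32-way trie holding n elements: ceil(log32 n).
    for (n in listOf(1_000, 1_000_000, 1_000_000_000)) {
        val depth = ceil(ln(n.toDouble()) / ln(32.0)).toInt()
        println("n = $n -> depth ~= $depth")   // 2, 4 and 6 levels respectively
    }
}
```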

For the latter I have some ideas for how to improve the implementation, but I have a question: why are all buffers always preallocated with size 32? My only guess is that it helps the runtime reuse the memory region when it can prove that some buffer is no longer referenced. Would it be a problem if I sometimes used 33-element buffers?

mcpiroman avatar Mar 28 '22 12:03 mcpiroman

set(index, element) is O(N) if you copy the array on modification, so it is way worse than the trie

vlsi avatar Jan 14 '23 04:01 vlsi