kotlinx.collections.immutable
kotlinx.collections.immutable copied to clipboard
Questionable PersistentList time complexity specification
As far as I see the current implementation of collections, specifically PersistentList
, is that of described in the proposal, which said that the complexity of PersistentList
is:
-
get(index)
,set(index, element)
- O(log32N), where N is the size of the instance the operations are applied on. -
add(index, element)
,removeAt(index)
,remove(element)
- O(N).-
add(element)
andremoveAt(size - 1)
are O(1)
-
-
addAll(elements)
- O(N). -
addAll(index, elements)
,removeAll(elements)
,removeAll(predicate)
- O(N M), optimizable to O(N+M), where M is the number of elements to be inserted/removed. - Iterating elements - O(N).
To my understanding, an implementation by simple delegation to an array that is copied on each operation would have the following:
- 🟩
get(index)
,set(index, element)
- O(1) - 🟨
add(index, element)
,removeAt(index)
,remove(element)
- O(N).- 🟥
add(element)
andremoveAt(size - 1)
- O(N)
- 🟥
- 🟥
addAll(elements)
- O(N + M) - 🟨/🟩
addAll(index, elements)
,removeAll(elements)
,removeAll(predicate)
- O(N+M) - 🟨 Iterating elements - O(N).
This is, the current tire-based approach is only better at appending elements, which is very common operation for sure, but otherwise is either worse or unnecessary complicated compared to naive coping.
Do I understand it correctly? I assume the actual performance, including other aspects, like GC-pressure or data locality, does overweight raw complexity metrics, am I right?
OK, I have read more about bit-mapped trie structure and now I understand that it is, generally speaking, optimal. (Funny enough it appears that I have invented this struct myself, before I read about it).
I suggest (better) documenting the current implementations' performance characteristic, e.g. that O(log32N)
is in practice very little, but middle insert and remove are actually worse than O(N)
- so much that it's the reason why .NET initially rejected moving from AVL tree to tire implementation.
For the letter I have some ideas how to improve the implementation, however I have a question: Why are all buffers always preallocated with 32 size? My only guess is that it helps runtime reuse the memory region in case it can prove that some buffer is not referenced anymore. Would it be a problem if I sometimes used 33 element buffers?
set(index, element)
is O(N) in case you copy the array on modifications, so it is way worse than the trie