fastutil
fastutil copied to clipboard
Spliterators for Tree data structures
The current pull request to add spliterator methods to collection types omits optimized Spliterator implementations for the tree data structures, due to complexity of "splitting" a tree data structure (not a true split operation, since it still is one tree. This is just splitting a view of the tree)
This bug tracks adding Spliterator implementations for these data structures.
What I am thinking currently is to make the spliterator go over the elements like the iterator, but keep a reference to the "right most seen but not traversed" node. When constructed, the current nodes is set to firstEntry
but the "right most seen but not traversed" (here on referred to as rightmost
) is set to root.right()
(though some fallback logic will be needed if that is null
). As the traversal happens, if a node even "further right" then rightmost
is encountered (which usually happens when traversing that node), then that reference is updated to point to that new node.
Then when trySplit() is called, the newly returned spliterator (the prefix) has the current spliterators current node, the right most pointer set (some undermined algorithm), and a fence telling it not to traverse past the split point. Then the current iterator, the suffix, "jumps" to the rightmost
entry, and that reference is reinitialized to rightmost.right()
entry (though again with some sort of fallback if that is null
). (This is to conform to Java's spec that the returned spliterator is a prefix, and the current spliterator becomes a suffix)
Assuming a mostly balanced tree, this should result in a mostly even split.
Does this rough algorithm sound correct to you? Aside from the big glaring unknown about how to handle the prefix's rightmost
determination.
But, the trees are threaded. The iterator uses threads. I think you are referring to a recursive visiting iterator. Am I wrong?
The idea is that a spliterator needs to be able to split efficiently but still maintain sorted order.
After a trySplit
, the current spliterator becomes the suffix. The returned iterator becomes the prefix (the first element returned by the returned spliterator must be what would have been returned by the current iterator had it not split).
This means I need some way to find a node somewhere further along efficiently (more towards the middle of the current and the end is better, but not required) as well as some way to designate a maximum element to return (which iterator does not have to do since it always goes over everything). I couldn't figure out the iteration algorithm to understand how to accomplish all this. Thus for now it is just inheriting the iterator based spliterator from SortedSet interface. Not bad for single threaded traversal but of limited use for parallelism.
I didn't want to just rip off what OpenJDK does for TreeMap because, well, ripping off code that directly is a bad idea. ;)
Mmmmhhhh... it seems a pretty trivial recurive subdivision. Am I missing something?
It probably is, I just couldn't figure it out. :embarrassed:
I'll have a look. But this is something to be done in the long run.