DataStructures.jl icon indicating copy to clipboard operation
DataStructures.jl copied to clipboard

Splay tree amortized time issues

Open matthewsot opened this issue 8 months ago • 3 comments
trafficstars

Hello,

Thanks for your work on this package !

The documentation at https://juliacollections.github.io/DataStructures.jl/stable/splay_tree/#Splay-Tree-1 indicates the splay tree implementation should guarantee

Operations such as search, insert and delete can be done in O(log n) amortized time, where n is the number of nodes in the SplayTree

But I think this can be violated by the current implementation because it doesn't splay on unsuccessful searches, or on redundant insertions.

Here's a short test script to show the behavior; I have a particularly slow computer so you might need to increase the "10000" to see useful timing info:

using DataStructures

function test(N)
    if true
        t = SplayTree{Int}()
    else
        t = AVLTree{Int}()
    end

    for i in 1:N
        push!(t, i)
    end

    for i in 1:N
        if true
            push!(t, 1)
        else
            push!(t, -i)
        end
    end
end

for i in 1:10
    @time test(i * 10000)
end

In its default configuration, this:

  1. Pushes 1, 2, 3, ..., N, which (correctly) creates a splay tree that's just one long list (root is N, left child N-1, left child N-2, etc.).
  2. Tries to repeatedly re-push 1, which is already in the data structure. This causes it to traverse the entire length-N chain, but since the implementation doesn't splay on redundant pushes no reorganization happens and so this expensive operation happens every single time we search.

The result (at least on my machine) is very poor scaling behavior.

Notably, it is significantly faster if an AVL tree is used or if we push nonredundant elements, so I think the explanation above of the cause of the slowdown is correct. And I think it also affects, e.g., sequences of unsuccessful haskeys (replace push!(t, 1) with haskey(t, 0)).

I think the basic fix is to have search_node always splay the very last node it encounters, as suggested in the splay tree paper. Alternatively, if this is expected behavior, it would be nice IMO to document that. If this package is still maintained + there's interest in either of those two changes I'm happy to send a PR.

Also sorry if I've misunderstood something about the implementation or use of the package; I'm new to Julia.

matthewsot avatar Feb 22 '25 01:02 matthewsot

Just for your information: search, delete, and insert in SortedSet, SortedDict, and SortedMultiDict are all O(log n) operations. However, these data structures do not have the splay-tree property of faster access time to items that are accessed more frequently, so this doesn't directly fix your issue.

StephenVavasis avatar Feb 22 '25 03:02 StephenVavasis

Great catch, btw!

I think the basic fix is to have search_node always splay the very last node it encounters, as suggested in the splay tree paper

Yeah, go ahead and implement this change. It's been a while since I last looked at this implementation in this much depth. Will be happy to review it. Make sure to change the function signature to search_node!(...) since you'll be modifying the structure of the tree.

On a related note, if you have some time on your hand, you can also add a benchmark script for different kind of trees in the benchmark/ folder.

eulerkochy avatar Feb 22 '25 05:02 eulerkochy

Thanks for the fast reply; I'll send something tomorrow!

matthewsot avatar Feb 22 '25 06:02 matthewsot