DataStructures.jl `PriorityQueue`s iteration slower than necessary

Compare these

julia> using DataStructures, BenchmarkTools

julia> pq = PriorityQueue(1 => 0.2, 2=>0.3)
PriorityQueue{Int64, Float64, Base.Order.ForwardOrdering} with 2 entries:
  1 => 0.2
  2 => 0.3

julia> @btime first(pq)
  119.160 ns (8 allocations: 736 bytes)
1 => 0.2

julia> @btime pq.xs[1]
  29.283 ns (2 allocations: 64 bytes)
1 => 0.2

Mar 03 '22 15:03 BeastyBlacksmith

It makes a copy of pq at entry to iterate, because otherwise iteration would destroy the object. But that only affects the timing of the first item.

If all you want is the first item, use peek instead.

Mar 04 '22 11:03 timholy

Can this be closed?

Mar 04 '22 13:03 timholy

first is actually fixed on master and peek deprecated, I still think it should be possible to make that iteration without copying, but I also lack concrete suggestions. So we can close this for now.

Mar 04 '22 13:03 BeastyBlacksmith

It seems rather difficult to iterate over a heap without destroying it, and the result would likely be much less efficient than the destructive algorithm.

Mar 04 '22 13:03 timholy

Is there an easy way to get the unordered iteration? The easiest I could come up with is

julia> f(flag) = ( a= iterate(c_pq, flag);
        while true
        a === nothing && break
        a = iterate(c_pq, a[2])
        end)
f (generic function with 1 method)

julia> @btime f(false)
  9.013 μs (2000 allocations: 62.50 KiB)

julia> @btime f(true)
  390.045 μs (1205 allocations: 118.47 KiB)

Mar 04 '22 14:03 BeastyBlacksmith

I'm probably missing something here, but even this is sorting after the fact is faster than the current implementation

julia> pq = PriorityQueue(string(Char(i+96)) => rand() for i in 1:10)

julia> @btime sort!([DataStructures.percolate_down!(pq, i) for i in 1:length(pq)], by=x->x.second)
  980.364 ns (16 allocations: 720 bytes)
10-element Vector{Pair{Any, Any}}:
 "j" => 0.24851360627935248
 "f" => 0.25429092151530586
 "e" => 0.2889706212235992
 "a" => 0.3324694434948622
     ⋮
 "d" => 0.4278330560665746
 "i" => 0.5729941482452947
 "c" => 0.719639457382443

julia> @btime [x  for x in pq]
  1.274 μs (9 allocations: 1.05 KiB)
10-element Vector{Pair{Any, Any}}:
 "j" => 0.24851360627935248
 "f" => 0.25429092151530586
 "e" => 0.2889706212235992
 "a" => 0.3324694434948622
     ⋮
 "d" => 0.4278330560665746
 "i" => 0.5729941482452947
 "c" => 0.719639457382443

Mar 04 '22 14:03 BeastyBlacksmith

Is there an easy way to get the unordered iteration?

I don't know of one, but I haven't looked.

even this is sorting after the fact is faster than the current implementation

I'll reopen, in case you've discovered a faster nondestructive sort.

Mar 04 '22 15:03 timholy

SortedSet, SortedDict, or SortedMultiDict might be a solution (depending on your intended use).

Mar 04 '22 16:03 StephenVavasis

DataStructures.jl DataStructures.jl copied to clipboard

`PriorityQueue`s iteration slower than necessary

DataStructures.jl
DataStructures.jl copied to clipboard