DataStructures.jl icon indicating copy to clipboard operation
DataStructures.jl copied to clipboard

`PriorityQueue`s iteration slower than necessary

Open BeastyBlacksmith opened this issue 2 years ago • 8 comments

Compare these

julia> using DataStructures, BenchmarkTools

julia> pq = PriorityQueue(1 => 0.2, 2=>0.3)
PriorityQueue{Int64, Float64, Base.Order.ForwardOrdering} with 2 entries:
  1 => 0.2
  2 => 0.3

julia> @btime first(pq)
  119.160 ns (8 allocations: 736 bytes)
1 => 0.2

julia> @btime pq.xs[1]
  29.283 ns (2 allocations: 64 bytes)
1 => 0.2

BeastyBlacksmith avatar Mar 03 '22 15:03 BeastyBlacksmith

It makes a copy of pq at entry to iterate, because otherwise iteration would destroy the object. But that only affects the timing of the first item.

If all you want is the first item, use peek instead.

timholy avatar Mar 04 '22 11:03 timholy

Can this be closed?

timholy avatar Mar 04 '22 13:03 timholy

first is actually fixed on master and peek deprecated, I still think it should be possible to make that iteration without copying, but I also lack concrete suggestions. So we can close this for now.

BeastyBlacksmith avatar Mar 04 '22 13:03 BeastyBlacksmith

It seems rather difficult to iterate over a heap without destroying it, and the result would likely be much less efficient than the destructive algorithm.

timholy avatar Mar 04 '22 13:03 timholy

Is there an easy way to get the unordered iteration? The easiest I could come up with is

julia> f(flag) = ( a= iterate(c_pq, flag);
        while true
        a === nothing && break
        a = iterate(c_pq, a[2])
        end)
f (generic function with 1 method)

julia> @btime f(false)
  9.013 μs (2000 allocations: 62.50 KiB)

julia> @btime f(true)
  390.045 μs (1205 allocations: 118.47 KiB)

BeastyBlacksmith avatar Mar 04 '22 14:03 BeastyBlacksmith

I'm probably missing something here, but even this is sorting after the fact is faster than the current implementation

julia> pq = PriorityQueue(string(Char(i+96)) => rand() for i in 1:10)

julia> @btime sort!([DataStructures.percolate_down!(pq, i) for i in 1:length(pq)], by=x->x.second)
  980.364 ns (16 allocations: 720 bytes)
10-element Vector{Pair{Any, Any}}:
 "j" => 0.24851360627935248
 "f" => 0.25429092151530586
 "e" => 0.2889706212235992
 "a" => 0.3324694434948622
     ⋮
 "d" => 0.4278330560665746
 "i" => 0.5729941482452947
 "c" => 0.719639457382443

julia> @btime [x  for x in pq]
  1.274 μs (9 allocations: 1.05 KiB)
10-element Vector{Pair{Any, Any}}:
 "j" => 0.24851360627935248
 "f" => 0.25429092151530586
 "e" => 0.2889706212235992
 "a" => 0.3324694434948622
     ⋮
 "d" => 0.4278330560665746
 "i" => 0.5729941482452947
 "c" => 0.719639457382443

BeastyBlacksmith avatar Mar 04 '22 14:03 BeastyBlacksmith

Is there an easy way to get the unordered iteration?

I don't know of one, but I haven't looked.

even this is sorting after the fact is faster than the current implementation

I'll reopen, in case you've discovered a faster nondestructive sort.

timholy avatar Mar 04 '22 15:03 timholy

SortedSet, SortedDict, or SortedMultiDict might be a solution (depending on your intended use).

StephenVavasis avatar Mar 04 '22 16:03 StephenVavasis