DataStructures.jl
DataStructures.jl copied to clipboard
`PriorityQueue`s iteration slower than necessary
Compare these
julia> using DataStructures, BenchmarkTools
julia> pq = PriorityQueue(1 => 0.2, 2=>0.3)
PriorityQueue{Int64, Float64, Base.Order.ForwardOrdering} with 2 entries:
1 => 0.2
2 => 0.3
julia> @btime first(pq)
119.160 ns (8 allocations: 736 bytes)
1 => 0.2
julia> @btime pq.xs[1]
29.283 ns (2 allocations: 64 bytes)
1 => 0.2
It makes a copy of pq
at entry to iterate
, because otherwise iteration would destroy the object. But that only affects the timing of the first item.
If all you want is the first item, use peek
instead.
Can this be closed?
first
is actually fixed on master and peek
deprecated, I still think it should be possible to make that iteration without copying, but I also lack concrete suggestions. So we can close this for now.
It seems rather difficult to iterate over a heap without destroying it, and the result would likely be much less efficient than the destructive algorithm.
Is there an easy way to get the unordered iteration? The easiest I could come up with is
julia> f(flag) = ( a= iterate(c_pq, flag);
while true
a === nothing && break
a = iterate(c_pq, a[2])
end)
f (generic function with 1 method)
julia> @btime f(false)
9.013 μs (2000 allocations: 62.50 KiB)
julia> @btime f(true)
390.045 μs (1205 allocations: 118.47 KiB)
I'm probably missing something here, but even this is sorting after the fact is faster than the current implementation
julia> pq = PriorityQueue(string(Char(i+96)) => rand() for i in 1:10)
julia> @btime sort!([DataStructures.percolate_down!(pq, i) for i in 1:length(pq)], by=x->x.second)
980.364 ns (16 allocations: 720 bytes)
10-element Vector{Pair{Any, Any}}:
"j" => 0.24851360627935248
"f" => 0.25429092151530586
"e" => 0.2889706212235992
"a" => 0.3324694434948622
⋮
"d" => 0.4278330560665746
"i" => 0.5729941482452947
"c" => 0.719639457382443
julia> @btime [x for x in pq]
1.274 μs (9 allocations: 1.05 KiB)
10-element Vector{Pair{Any, Any}}:
"j" => 0.24851360627935248
"f" => 0.25429092151530586
"e" => 0.2889706212235992
"a" => 0.3324694434948622
⋮
"d" => 0.4278330560665746
"i" => 0.5729941482452947
"c" => 0.719639457382443
Is there an easy way to get the unordered iteration?
I don't know of one, but I haven't looked.
even this is sorting after the fact is faster than the current implementation
I'll reopen, in case you've discovered a faster nondestructive sort.
SortedSet, SortedDict, or SortedMultiDict might be a solution (depending on your intended use).