swift-collections
swift-collections copied to clipboard
Priority queue
Priority queues are commonly requested data structures, and are universally useful. A high-quality implementation would be a natural candidate for inclusion in this package. (Priority queues are used in sorting algorithms, various graph algorithms, networking protocol implementations, etc. etc. etc.)
I expect a straightforward array-backed binary heap might make most sense for the initial implementation, but I wouldn't be surprised if a fancier variant proved better overall.
struct Heap<Element: Comparable>
This would likely be mostly an implementation exercise rather than an API design challenge (although API design is never trivial). We'd probably need to experiment with several different data structure implementations, and perhaps even come up with our own variants that's better suited to Swift's unique combination of language features.
This would be super useful for my current use case! Sorry if this is a beginner question here, but I was wondering, what advantage would this offer over using an OrderedSet and sorting it by the priority in a model (as in, I have a model with a priority property and I sort the OrderedSet by that property)? (Also, would the pqueue auto sort upon insertion?)
@julianschiavo It takes less computing while you read the biggest few elements. It is more effective when you only use some of the biggest elements but not all collection. It's a queue
but not a random access collection
.
what advantage would this offer over using an OrderedSet and sorting it by the priority in a model
A priority queue is not a set
, it may allows duplicated elements (depends on implementation).
Also, would the pqueue auto sort upon insertion?
A heap is not stored as sorted sequence, it only guarantees the top elements are bigger or smaller than bottom elements.
Heap is a complete binary tree and could be stored in a continuous array. Index n
's parent is n/2
and it's children are n*2 + 1
and n*2 + 2
. The heap:
10
9 3
8 7 _ _
It can be stored as array:
10 9 3 8 7 _ _
It is partial sorted bu not full sorted upon insertion or deletion. So it take less time to insert or delete an element.
-
Insertion: Add a new element at the tail of the array, and
pop
the element to a proper position. -
Deletion: You can only remove the top element. Every removed element should replaced by the biggest direct child.
Both operation takes no more than logN times swap operation.
Got it, thank you so much for your in-depth explanation, really helps with understanding the concept! That's exactly what I'm looking for if I have that right 😄
So to make sure I understand, the idea behind "it only guarantees the top elements are bigger [...]" is that as long as I dequeue from the top (in a PQ of highest priority at the top), each element I dequeue will have the top priority in all the elements? (I understand from your explanation this continues to be the case if I inserted new elements—I guess that's kind of what I meant by sorting on insertion, though I know now it's not technically doing that)
Yes.
See the heap in my reply. If you take 10
, 9
will popped to the top, and 8
will be popped up. For 8
has no child and it's not the last element, 7
will be put into the slot. So the resorted heap is:
9
8 3
7 _ _ _
If you dequeue all elements one by one, they would be ordered by priority.
👍 Thank you!
I've tried to implement this, and it got me thinking. Firstly, Comparable
implies strict order, and that's probably not expected in priority queues. Consider usage:
enum Priority {
case high
case low
}
struct Work {
var priority: Priority
var work: () -> ()
}
And if some would want to use Work
in priority queue, it will require them to implement Comparable
for Work
, but it's impossible:
let w1 = Work(priority: .low) { print("Hello") }
let w2 = Work(priority: .low) { print("World") }
extension Work: Comparable {
static func < (lhs: Work, rhs: Work) -> Bool {
lhs.priority < rhs.priority
}
static func == (lhs: Work, rhs: Work) -> Bool {
return false // that means w1 < w1, contradiction
return true // nonsense, obviously w1 is not equal to w2
return lhs.priority == rhs.priority // same as above
}
}
So, there is no correct implementation of Comparable
for Work
. That can be solved with another protocol for Heap
, something like Prioritizable
:
protocol Prioritizable {
assotiatedtype Priority: Comparable
var priority: Priority { get }
}
And heap will require Heap<T: Prioritizable>
. Now I wonder, is swift-collections a good place to have Prioritizable
protocol. Another option is to have priority getter as field:
struct Heap<T, U:Comparable> {
var priority: (T) -> U
...
}
But that can have performance implications, as compiler can't inline that getter.
Secondly, given that, heap must be "stable", in sense that order of elements with equal priorities must be preserved (FIFO):
let w1 = Work(priority: .low) { print("Hello") }
let w2 = Work(priority: .low) { print("World") }
var heap = Heap<Work>()
heap.push(w1)
heap.push(w2)
heap.pop().work() // outputs "Hello"
heap.pop().work() // outputs "World"
And that's an implementation challenge as well.
ps: for any who is interested, my implementation attempt is here
Firstly, Comparable implies strict order, and that's probably not expected in priority queues.
@zienag I think it's better to just give the Heap
struct an extra field like order
and it is not possible to store both high
and low
in one simple priority queue.
@lorentey what do you think? Does performance implication from stored Priority
getter can be considered negligible?
Another option is to have priority getter as field:
struct Heap<T, U:Comparable> { var priority: (T) -> U ... }
But that can have performance implications, as compiler can't inline that getter.
This approach would have another problem that there couldn't be a conditional Heap<T, U>: Equatable where T: Equatable
conformance because the priority
closure can't be compared for equality. But that issue could be avoided by using key paths instead:
struct Heap<Element, Priority: Comparable> {
var priority: KeyPath<Element, Priority>
...
}
I don't know if this use of key paths would optimise better than closures though.
Equatable
conformance presents another challenge as well. Depending on enqueue
(push
) order, underlying storage may not be equal, when actually those two heaps must be the same.
let h1 = Heap([0, 1, 2, 3])
let h2 = Heap([3, 2, 1, 0])
print(h1._underlyingStorage == h2._underlyingStorage) // might be (and probably would be) false
assert(h1 == h2) // should be true
I've implemented it with copying and comparing poping elements one-by-one:
public static func ==(lhs: Self, rhs: Self) -> Bool {
var (lcopy, rcopy) = (lhs, rhs)
while true {
let (left, right) = (lcopy.pop(), rcopy.pop())
guard left == right else { return false }
guard left != nil && right != nil else { break }
}
return true
}
This is rather inefficient (memory O(n + m)), but correct. I couldn't find any more efficient methods via quick googling.
In this implementation there is no difference between keypath or order function in Equatable
conformance.
In this implementation there is no difference between keypath or order function in Equatable conformance.
More importantly, I think you just proved that Equatable
conformance is hardly even useful. 🤦♂️ (If it were, one might argue whether Heap([0], by: <)
and Heap([0], by: >)
should compare equal, but I digress.)
Another option is to separate priority from Element:
struct Heap<Element, Priority: Comparable> {
func push(_ element: Element, with priority: Priority) {
In that case additional protocol is not needed, Equatable
conformance becomes easier to implement, and compiler has big opportunities for inlining and specialisation. The drawback is that API is harder to use (creating Heap from collection for example).
I think I'd prefer to start with an implementation that simply relied on the Comparable
conformance of its element type. From what I've seen, heaps are very often populated with dedicated types where we can implement ==
/<
in whatever way we like. In the worst case, we need to create a thin wrapper type just to customize the ordering relation -- but note that this is never difficult, it's just slightly annoying. (And the language could potentially help with reducing the boilerplate...)
Closure based approaches suffer from the fact that closures cannot be compared, which makes operations that work on two heap instances (such as merge) impossible to safely implement. They are also generally less efficient than relying on the type system -- with Comparable
, we can make the most use of language/optimizer features such as inlining, specializations, etc etc.
@timvermeulen reminded me today that min-max heaps are a thing, and they get rid of the inconvenience of having to create a wrapper type just to flip the sort order.
With all that said, I like @zienag's idea to model the priority as a separate value! It would be interesting to implement that too, to see how it compares in practice. (One potential drawback I can see has to do with memory use -- the priorities would need to be stored in their own storage buffer.)
Talking about min-max heaps, I would like to point out this article too as some food for thought - it may be that we can have both the functionality and the performance at the same time.
https://probablydance.com/2020/08/31/on-modern-hardware-the-min-max-heap-beats-a-binary-heap/
@hassila Thanks for sharing, that's a great article! If we're able to observe the same performance benefits, I agree we should skip straight to the min-max heap implementation the article describes!
Given that min-max heaps let us avoid having to encode the sort order into the type, does it make sense to go with @pyrtsa's suggestion of using key paths to get the priority?
public struct MinMaxHeap<Element, Priority: Comparable> {
private var storage: [Element]
public let keyPath: KeyPath<Element, Priority>
public init(keyPath: KeyPath<Element, Priority>) {
…
}
I've found this helpful in cases where the type in the queue has a priority, e.g.:
public struct Request {
public let uuid: UUID
public let request: HTTPRequest
public let priority: Float
…
}
public class Client {
private var requestQueue = PriorityQueue(keyPath: \Request.priority)
…
Convenience initializers could be added for types that implement Comparable
, e.g.:
extension MinMaxHeap where Element: Comparable, Priority == Element {
public init() {
self.init(keyPath: \Element.self)
}
}
Given that min-max heaps let us avoid having to encode the sort order into the type, does it make sense to go with @pyrtsa's suggestion of using key paths to get the priority?
I don't think so -- at least not in the first implementation. Like SortedSet
, I think our primary heap implementation ought to be based on Comparable
.
Once we have that, we can then use it to experiment with creating a keypath-based heap variant as a wrapper around the core heap.
I'm using Heap
in one of my projects (I know it's pre-release!), and I'm also facing the issue described above. When would it be appropriate to experiment with creating a keypath-based heap variant? Are contributions welcome at this time?
I decided to publish a Swift package with PriorityHeap
, which builds on Heap
via a Prioritizable
protocol as described earlier in this issue. Let me know if there's interest in adding this to swift-collections
!
By the way, what would it take for Heap
to be released as stable? It seems like it's been more than a couple of years since it was intended for release. Is there a blocker?
I'm not sure if this is the right place for this feedback, but I've noticed a couple of minor (nitty) modifications we might make to the Heap
API:
-
max
andmin
could be computed properties to reflect their O(1) nature -
removeMin
andremoveMax
should be marked@discardableResult
likeArray.removeFirst()
- add method
reserveCapacity
I'd open a PR if these changes are welcome :)
I also implemented a LimitedCapacityPriorityHeap
, which I think would be pretty useful in swift-collections
.
It does make me wonder though, would we also want a LimitedCapacityHeap
? It seems like a lot to have a matrix of options. I honestly think the PriorityHeap
API is more useful for most use cases than the Heap
API, but I guess it does seem simpler to just require Element: Comparable
, and there definitely are some algorithms that don't require additional metadata on each element.
This package is severely resource constrained. Experimenting in a separate code base is the right move for now, at least until we manage to unblock the release pipeline.
- Replacing
min()
/max()
with properties is in (glacial) progress: https://github.com/apple/swift-collections/pull/328 - I agree on
removeMin()
/removeMax()
-- I'll fire off a PR. - Ditto on
reserveCapacity
-- I intentionally delayed adding that, but there is no point in delaying it further.
Fixed-capacity variants of all flat data structures is definitely desirable. However, I'd like to tie their introduction with noncopyable containers, which is the context such variants are most relevant.
(For now, fixed-capacity heaps (arrays, deques, ordered sets, etc) can be emulated by embedding the copy-on-write containers in a simple wrapper that traps in advance if any operation will need to resize the container beyond its reserved capacity.)
Heap
has shipped in release 1.1.0! I'm closing this issue; we can continue investigating higher-level wrapper types / variants on followup issues.