swift-collections icon indicating copy to clipboard operation
swift-collections copied to clipboard

Priority queue

Open lorentey opened this issue 3 years ago • 17 comments

Priority queues are commonly requested data structures, and are universally useful. A high-quality implementation would be a natural candidate for inclusion in this package. (Priority queues are used in sorting algorithms, various graph algorithms, networking protocol implementations, etc. etc. etc.)

I expect a straightforward array-backed binary heap might make most sense for the initial implementation, but I wouldn't be surprised if a fancier variant proved better overall.

struct Heap<Element: Comparable>

This would likely be mostly an implementation exercise rather than an API design challenge (although API design is never trivial). We'd probably need to experiment with several different data structure implementations, and perhaps even come up with our own variants that's better suited to Swift's unique combination of language features.

lorentey avatar Apr 05 '21 19:04 lorentey

This would be super useful for my current use case! Sorry if this is a beginner question here, but I was wondering, what advantage would this offer over using an OrderedSet and sorting it by the priority in a model (as in, I have a model with a priority property and I sort the OrderedSet by that property)? (Also, would the pqueue auto sort upon insertion?)

julianschiavo avatar Apr 06 '21 05:04 julianschiavo

@julianschiavo It takes less computing while you read the biggest few elements. It is more effective when you only use some of the biggest elements but not all collection. It's a queue but not a random access collection.

what advantage would this offer over using an OrderedSet and sorting it by the priority in a model

A priority queue is not a set, it may allows duplicated elements (depends on implementation).

Also, would the pqueue auto sort upon insertion?

A heap is not stored as sorted sequence, it only guarantees the top elements are bigger or smaller than bottom elements.

Heap is a complete binary tree and could be stored in a continuous array. Index n's parent is n/2 and it's children are n*2 + 1 and n*2 + 2. The heap:

        10
   9           3
8     7     _     _

It can be stored as array:

10 9 3 8 7 _ _

It is partial sorted bu not full sorted upon insertion or deletion. So it take less time to insert or delete an element.

  • Insertion: Add a new element at the tail of the array, and pop the element to a proper position.

  • Deletion: You can only remove the top element. Every removed element should replaced by the biggest direct child.

Both operation takes no more than logN times swap operation.

634750802 avatar Apr 06 '21 06:04 634750802

Got it, thank you so much for your in-depth explanation, really helps with understanding the concept! That's exactly what I'm looking for if I have that right 😄

So to make sure I understand, the idea behind "it only guarantees the top elements are bigger [...]" is that as long as I dequeue from the top (in a PQ of highest priority at the top), each element I dequeue will have the top priority in all the elements? (I understand from your explanation this continues to be the case if I inserted new elements—I guess that's kind of what I meant by sorting on insertion, though I know now it's not technically doing that)

julianschiavo avatar Apr 06 '21 06:04 julianschiavo

Yes.

See the heap in my reply. If you take 10, 9 will popped to the top, and 8 will be popped up. For 8 has no child and it's not the last element, 7 will be put into the slot. So the resorted heap is:

         9
   8           3
7     _     _     _

If you dequeue all elements one by one, they would be ordered by priority.

634750802 avatar Apr 06 '21 06:04 634750802

👍 Thank you!

julianschiavo avatar Apr 06 '21 06:04 julianschiavo

I've tried to implement this, and it got me thinking. Firstly, Comparable implies strict order, and that's probably not expected in priority queues. Consider usage:

enum Priority {
  case high
  case low
}
struct Work {
  var priority: Priority
  var work: () -> ()
}

And if some would want to use Work in priority queue, it will require them to implement Comparable for Work, but it's impossible:

let w1 = Work(priority: .low) { print("Hello") }
let w2 = Work(priority: .low) { print("World") }

extension Work: Comparable {
  static func < (lhs: Work, rhs: Work) -> Bool {
    lhs.priority < rhs.priority
  }

  static func == (lhs: Work, rhs: Work) -> Bool {
    return false // that means w1 < w1, contradiction
    return true // nonsense, obviously w1 is not equal to w2
    return lhs.priority ==  rhs.priority // same as above
  }
}

So, there is no correct implementation of Comparable for Work. That can be solved with another protocol for Heap, something like Prioritizable:

protocol Prioritizable {
  assotiatedtype Priority: Comparable
  var priority: Priority { get }
}

And heap will require Heap<T: Prioritizable>. Now I wonder, is swift-collections a good place to have Prioritizable protocol. Another option is to have priority getter as field:

struct Heap<T, U:Comparable> {
  var priority: (T) -> U
  ...
}

But that can have performance implications, as compiler can't inline that getter.

Secondly, given that, heap must be "stable", in sense that order of elements with equal priorities must be preserved (FIFO):

let w1 = Work(priority: .low) { print("Hello") }
let w2 = Work(priority: .low) { print("World") }

var heap = Heap<Work>()
heap.push(w1)
heap.push(w2)

heap.pop().work() // outputs "Hello"
heap.pop().work() // outputs "World"

And that's an implementation challenge as well.

ps: for any who is interested, my implementation attempt is here

zienag avatar Apr 08 '21 01:04 zienag

Firstly, Comparable implies strict order, and that's probably not expected in priority queues.

@zienag I think it's better to just give the Heap struct an extra field like order and it is not possible to store both high and low in one simple priority queue.

634750802 avatar Apr 09 '21 02:04 634750802

@lorentey what do you think? Does performance implication from stored Priority getter can be considered negligible?

zienag avatar Apr 09 '21 09:04 zienag

Another option is to have priority getter as field:

struct Heap<T, U:Comparable> {
  var priority: (T) -> U
  ...
}

But that can have performance implications, as compiler can't inline that getter.

This approach would have another problem that there couldn't be a conditional Heap<T, U>: Equatable where T: Equatable conformance because the priority closure can't be compared for equality. But that issue could be avoided by using key paths instead:

struct Heap<Element, Priority: Comparable> {
    var priority: KeyPath<Element, Priority>
    ...
}

I don't know if this use of key paths would optimise better than closures though.

pyrtsa avatar Apr 09 '21 12:04 pyrtsa

Equatable conformance presents another challenge as well. Depending on enqueue (push) order, underlying storage may not be equal, when actually those two heaps must be the same.

let h1 = Heap([0, 1, 2, 3])
let h2 = Heap([3, 2, 1, 0])
print(h1._underlyingStorage == h2._underlyingStorage) // might be (and probably would be) false
assert(h1 == h2) // should be true

I've implemented it with copying and comparing poping elements one-by-one:

public static func ==(lhs: Self, rhs: Self) -> Bool {
  var (lcopy, rcopy) = (lhs, rhs)
  while true {
    let (left, right) = (lcopy.pop(), rcopy.pop())
    guard left == right else { return false }
    guard left != nil && right != nil else { break }
  }
  return true
}

This is rather inefficient (memory O(n + m)), but correct. I couldn't find any more efficient methods via quick googling. In this implementation there is no difference between keypath or order function in Equatable conformance.

zienag avatar Apr 09 '21 15:04 zienag

In this implementation there is no difference between keypath or order function in Equatable conformance.

More importantly, I think you just proved that Equatable conformance is hardly even useful. 🤦‍♂️ (If it were, one might argue whether Heap([0], by: <) and Heap([0], by: >) should compare equal, but I digress.)

pyrtsa avatar Apr 09 '21 19:04 pyrtsa

Another option is to separate priority from Element:

struct Heap<Element, Priority: Comparable> {
  func push(_ element: Element, with priority: Priority) {

In that case additional protocol is not needed, Equatable conformance becomes easier to implement, and compiler has big opportunities for inlining and specialisation. The drawback is that API is harder to use (creating Heap from collection for example).

zienag avatar Apr 12 '21 10:04 zienag

I think I'd prefer to start with an implementation that simply relied on the Comparable conformance of its element type. From what I've seen, heaps are very often populated with dedicated types where we can implement ==/< in whatever way we like. In the worst case, we need to create a thin wrapper type just to customize the ordering relation -- but note that this is never difficult, it's just slightly annoying. (And the language could potentially help with reducing the boilerplate...)

Closure based approaches suffer from the fact that closures cannot be compared, which makes operations that work on two heap instances (such as merge) impossible to safely implement. They are also generally less efficient than relying on the type system -- with Comparable, we can make the most use of language/optimizer features such as inlining, specializations, etc etc.

@timvermeulen reminded me today that min-max heaps are a thing, and they get rid of the inconvenience of having to create a wrapper type just to flip the sort order.

With all that said, I like @zienag's idea to model the priority as a separate value! It would be interesting to implement that too, to see how it compares in practice. (One potential drawback I can see has to do with memory use -- the priorities would need to be stored in their own storage buffer.)

lorentey avatar Apr 14 '21 02:04 lorentey

Talking about min-max heaps, I would like to point out this article too as some food for thought - it may be that we can have both the functionality and the performance at the same time.

https://probablydance.com/2020/08/31/on-modern-hardware-the-min-max-heap-beats-a-binary-heap/

hassila avatar Apr 15 '21 12:04 hassila

@hassila Thanks for sharing, that's a great article! If we're able to observe the same performance benefits, I agree we should skip straight to the min-max heap implementation the article describes!

kylemacomber avatar Apr 15 '21 13:04 kylemacomber

Given that min-max heaps let us avoid having to encode the sort order into the type, does it make sense to go with @pyrtsa's suggestion of using key paths to get the priority?

public struct MinMaxHeap<Element, Priority: Comparable> {
    private var storage: [Element]
    public let keyPath: KeyPath<Element, Priority>

    public init(keyPath: KeyPath<Element, Priority>) {
        …
    }

I've found this helpful in cases where the type in the queue has a priority, e.g.:

public struct Request {
    public let uuid: UUID
    public let request: HTTPRequest
    public let priority: Float
    …
}

public class Client {
    private var requestQueue = PriorityQueue(keyPath: \Request.priority)
    …

Convenience initializers could be added for types that implement Comparable, e.g.:

extension MinMaxHeap where Element: Comparable, Priority == Element {
    public init() {
        self.init(keyPath: \Element.self)
    }
}

AquaGeek avatar May 11 '21 00:05 AquaGeek

Given that min-max heaps let us avoid having to encode the sort order into the type, does it make sense to go with @pyrtsa's suggestion of using key paths to get the priority?

I don't think so -- at least not in the first implementation. Like SortedSet, I think our primary heap implementation ought to be based on Comparable.

Once we have that, we can then use it to experiment with creating a keypath-based heap variant as a wrapper around the core heap.

lorentey avatar May 28 '21 03:05 lorentey

I'm using Heap in one of my projects (I know it's pre-release!), and I'm also facing the issue described above. When would it be appropriate to experiment with creating a keypath-based heap variant? Are contributions welcome at this time?

JadenGeller avatar Nov 05 '23 22:11 JadenGeller

I decided to publish a Swift package with PriorityHeap, which builds on Heap via a Prioritizable protocol as described earlier in this issue. Let me know if there's interest in adding this to swift-collections!

JadenGeller avatar Nov 06 '23 03:11 JadenGeller

By the way, what would it take for Heap to be released as stable? It seems like it's been more than a couple of years since it was intended for release. Is there a blocker?

JadenGeller avatar Nov 06 '23 03:11 JadenGeller

I'm not sure if this is the right place for this feedback, but I've noticed a couple of minor (nitty) modifications we might make to the Heap API:

  • max and min could be computed properties to reflect their O(1) nature
  • removeMin and removeMax should be marked @discardableResult like Array.removeFirst()
  • add method reserveCapacity

I'd open a PR if these changes are welcome :)

JadenGeller avatar Nov 07 '23 01:11 JadenGeller

I also implemented a LimitedCapacityPriorityHeap, which I think would be pretty useful in swift-collections.

It does make me wonder though, would we also want a LimitedCapacityHeap? It seems like a lot to have a matrix of options. I honestly think the PriorityHeap API is more useful for most use cases than the Heap API, but I guess it does seem simpler to just require Element: Comparable, and there definitely are some algorithms that don't require additional metadata on each element.

JadenGeller avatar Nov 07 '23 05:11 JadenGeller

This package is severely resource constrained. Experimenting in a separate code base is the right move for now, at least until we manage to unblock the release pipeline.

  • Replacing min()/max() with properties is in (glacial) progress: https://github.com/apple/swift-collections/pull/328
  • I agree on removeMin()/removeMax() -- I'll fire off a PR.
  • Ditto on reserveCapacity -- I intentionally delayed adding that, but there is no point in delaying it further.

Fixed-capacity variants of all flat data structures is definitely desirable. However, I'd like to tie their introduction with noncopyable containers, which is the context such variants are most relevant.

(For now, fixed-capacity heaps (arrays, deques, ordered sets, etc) can be emulated by embedding the copy-on-write containers in a simple wrapper that traps in advance if any operation will need to resize the container beyond its reserved capacity.)

lorentey avatar Jan 30 '24 01:01 lorentey

Heap has shipped in release 1.1.0! I'm closing this issue; we can continue investigating higher-level wrapper types / variants on followup issues.

lorentey avatar Feb 08 '24 01:02 lorentey