cupq icon indicating copy to clipboard operation
cupq copied to clipboard

Heap insertion implementation is incorrect

Open JamiesWhiteShirt opened this issue 5 years ago • 1 comments

First of all, I want to point out this library has been valuable material for my master thesis, so many thanks for making it available under a permissive license. I am implementing a parallel heap with generic key types (+ comparators) and generic value types based on this one for branch-and-bound algorithms. While working on this, I noticed an issue with the implementation of the insertion procedure, so I thought I should contribute a little back.

https://github.com/crosetto/cupq/blob/e4dec3bad4f126d1009837022d6249eea0ad4aa2/src/device/heap.h#L172-L218

It concerns an insertion where the last node in the heap is full (sPosition == WarpSize) so that the heap has to be grown with a new node. If the inserted key is greater than the smallest item in the last node (n.first > sTops[mSize - 1]), the implementation grows the heap by initializing a new node with the inserted item as the smallest (only) item of the node (as seen on lines 189-192).

https://github.com/crosetto/cupq/blob/e4dec3bad4f126d1009837022d6249eea0ad4aa2/src/device/heap.h#L189-L192

In many cases, the heap property is preserved, but not all. The heap property may be violated if the previous last node and the newly initialized node are not siblings, in other words if they do not have the same parent.

Let a be the (zero-indexed) index of the last node prior to the insertion, and assume the heap property is preserved prior to the insertion. We know that sTops[parent(a)] <= sTops[a] (the heap property) is true both prior to the insertion and following the insertion because node a and its parent are both left unmodified by the insertion procedure. Then let b = a + 1 be the index of the newly initialized node following the insertion. Following the insertion, we know that sTops[a] < sTops[b], which transitively implies that sTops[parent(a)] < sTops[b].

Only if parent(a) == parent(b) do we know that the heap property is preserved between node b and its parent. If node a is the last child of its parent, node b gets a different parent than node a. Specifically, node b gets a different parent whenever a % arity == 0.

This creates an issue where node b may be inserted without regard for the heap property between it and its parent. As an example, consider this binary heap containing only keys, with two items per node:

         1,2
   8,9         3,4

If we were to insert 5, it would be compared with 3. Because the last node is full and 5 > 3, the implementation initializes a new node as such, violating the heap property:

         1,2
   8,9         3,4
5,_

I was curious to see whether this issue can actually cause the priority queue work incorrectly, and I found an example to prove it. Consider the following heap state created by inserting 6, 7, 7, 8, 8 into the above heap:

         1,2
   8,9         3,4
5,6   7,7   8,8

If we were to pop five items, we would expect to pop 1, 2, 3, 4, 5 in that order. However, after popping the first four, the heap looks like this:

         7,7
   8,9         8,8
5,6

This would pop 7 instead of 5 off the heap.

To solve the issue, the item in the newly inserted node would have to be propagated up if the nodes are not siblings.

In my implementation (which will be published in a month or so), I have resorted to always propagate the item up whenever a new node has to be inserted, and only comparing the inserted item with the smallest item in the last node if the node has space for another item.

I hope this description of the issue is helpful if you intend to resolve it.

JamiesWhiteShirt avatar May 16 '20 11:05 JamiesWhiteShirt

Thanks for the detailed bug report! I'll look into it as soon as I have some spare time.

crosetto avatar May 18 '20 23:05 crosetto