shrinking-challenge binheap challenge

The Wrong Binary Heap challenge should minimally specify:

what is a heap (its structure)
what its elements are supposed to be (general integers, or from some particular range only)
what exactly is the error in the "wrong implementation of a function that converts the binary heap to a sorted list"

Also, ideally, it should also specify how the heap is to be constructed:

randomly by some "generate and test" method or
by using the API (e.g. a sequence of insert calls) of the bin heap data structure.

Aug 26 '20 15:08 kostis

@kostis I won't have time to do it till my return from vacation (mid September). If you find the time, all the better...

Aug 27 '20 13:08 jlink

I'm just implementing this for CsCheck and I'm a little confused as to what smallest means in recursive examples.

The description says (0, None, (0, (0, None, None), (1, None, None))) is the smallest but I find (1, None, (0, None, None)) for 0+ and (0, None, (-1, None, None)) for all ints. I've explicitly added a 'heap count' measure to the size in my generator.

Mar 22 '21 22:03 AnthonyLloyd

@AnthonyLloyd I agree that your examples are smaller for any definition of size that comes to my mind. Maybe there’s a subtle difference in your heap implementation to the original one in Haskell that accounts for the different behaviour. Maybe my translation to Java was wrong.

Probably the challenge requires a better definition/description to make sure we’re all considering the same problem. I think that was @kostis‘ point when opening this issue.

Mar 23 '21 05:03 jlink

I ported your Java. I think it's correct. Not really confident porting the Haskell.

class Heap { public int Head; public Heap Left; public Heap Right; }

[Fact]
public void No7_BinHeap()
{
    static uint Count(Heap h) => h is null ? 0 : 1 + Count(h.Left) + Count(h.Right);

    static Heap MergeHeaps(Heap h1, Heap h2) =>
        h1 is null ? h2
      : h2 is null ? h1
      : h1.Head <= h2.Head ? new Heap { Head = h1.Head, Left = MergeHeaps(h1.Right, h2), Right = h1.Left }
      : new Heap { Head = h2.Head, Left = MergeHeaps(h2.Right, h1), Right = h2.Left };

    static List<int> ToList(Heap heap)
    {
        var r = new List<int>();
        var s = new Stack<Heap>();
        s.Push(heap);
        while (s.Count != 0)
        {
            var h = s.Pop();
            if (h == null) continue;
            r.Add(h.Head);
            s.Push(h.Left);
            s.Push(h.Right);
        }
        return r;
    }

    static List<int> WrongToSortedList(Heap heap)
    {
        var r = new List<int>();
        if (heap is not null)
        {
            r.Add(heap.Head);
            r.AddRange(ToList(MergeHeaps(heap.Left, heap.Right)));
        }
        return r;
    }

    static List<int> Sorted(List<int> l)
    {
        l = new(l);
        l.Sort();
        return l;
    }

    static string Print(Heap h) => h is null ? "None" : $"({h.Head}, {Print(h.Left)}, {Print(h.Right)})";

    Gen.Recursive(g =>
        Gen.Frequency(
            (3, Gen.Const((Heap)null)),
            (1, Gen.Select(Gen.Int, g, g, (h, l, r) => new Heap { Head = h, Left = l, Right = r }))
        ),
        (Heap h, ref Size size) => { size = new Size(Count(h), size); return h; }
    )
    .Sample(h =>
    {
        var l1 = ToList(h);
        var l2 = WrongToSortedList(h);
        return Check.Equal(l2, Sorted(l2)) && Check.Equal(Sorted(l1), l2);
    }, print: Print);
}

Mar 23 '21 07:03 AnthonyLloyd

The description says (0, None, (0, (0, None, None), (1, None, None))) is the smallest but I find (1, None, (0, None, None)) for 0+ and (0, None, (-1, None, None)) for all ints. I've explicitly added a 'heap count' measure to the size in my generator.

@DRMacIver Can you say if the samples found by @AnthonyLloyd are correctly failing or if we are missing something?

Mar 23 '21 10:03 jlink

@AnthonyLloyd I reran the challenge mimicking your generator but do not find a smaller failing sample than the one in the description: https://github.com/jlink/shrinking-challenge/blob/main/pbt-libraries/jqwik/reports/binheap.md

Mar 23 '21 11:03 jlink

@jlink Do you agree that my example should fail to be sorted by wrongToSortedList since the initial head would be added to the list first?

Mar 23 '21 11:03 AnthonyLloyd

@AnthonyLloyd The following example test:

@Example
void simplestFailingSample() {
	// (1, None, (0, None, None))
	Heap h = new Heap(1, null, new Heap(0, null, null));

	List<Integer> l1 = toList(h);
	List<Integer> l2 = wrongToSortedList(h);

	assertThat(l2).isEqualTo(sorted(l2));
}

is failing. So indeed, I agree CsCheck found a smaller failing sample!

Mar 23 '21 13:03 jlink

Guys, interesting discussion, but I would like to insist on my request(s) at the top of this thread, which I guess is a more general comment related to many of the challenges currently used in this repository. Can we please have better specifications of the challenges? (and what is allowed and not allowed for different PBT tools to do in their solutions?)

For example, according to e.g. Wikipedia, even if one disregards all myriads of heap variants that exist, even the basic variant does not come in just one but in two flavors: a max heap and a min heap. Which of the two do we want in this challenge?

If I read the above correctly, the simplest failing example of @AnthonyLloyd, (1, None, (0, None, None)), is a max_heap, while the one in the description of the challenge, (0, None, (0, (0, None, None), (1, None, None))), is a min_heap.

Actually, if I understand the definition of a heap correctly, the latter is not even a heap because it's not almost complete !

To make things even worse, none of these two "minimal" counter-examples are trees with fringe nodes packed to the left (as heaps typically are).

Can we please first all agree on what this challenge is about, starting from the definition of the heap we want to have, the range of its elements, and how the heap should be constructed?

Mar 24 '21 12:03 kostis

@kostis I am with you that starting from a written specification would be better. I do think, however, that the current situation is better than nothing because we have a spec in the form of the original code. This spec may not agree with any formal definition of binary heap but it can serve as a benchmark anyway.

So if you want to do the first step of writing a spec I’m more than willing to review it and give feedback.

Mar 24 '21 14:03 jlink

OK, I've spent some time yesterday and tried to understand the Haskell program that we should use as "the spec". Also, I've written a PropEr module that implements "the current challenge".

I must say I am quite disappointed, because:

the problem, as stated, IMO is not challenging at all [2], and
there are various issues and design decisions that are dubious in "the spec" and in the claim made here that

Interestingly Hypothesis (and I think SmartCheck and QuickCheck too) seems to never find the smallest example here, which is the four valued heap (0, None, (0, (0, None, None), (1, None, None)))

As we know by now, this is not the smallest example and, even worse IMO, it's not even a (proper) binary heap data structure -- at least the way I understand what a binary heap is / should be. Perhaps @DRMacIver may want to comment or shed some light on how this challenge should be interpreted.

Anyway, if we are to keep such a challenge, it needs to become more involved (e.g., the erroneous sort function cannot be so broken, but instead only be erroneous when the size of the binary heap is above some appropriate threshold, say 10). Also, we need to provide explicit instructions on how the binary heap is supposed to be constructed. Note that one cannot expect to generate proper binary heaps (of non-trivial size) with just a random generator. [1]

Thoughts on the above?

[1] In PropEr, I ended up generating proper binary heaps with the following generator:

binheap() -> 
  ?LET(L, list(integer()), heapify(L)).

i.e., create a random list of integers and then call heapify on them to turn them into a proper binary heap using an algorithm similar to what is described e.g. here or here.

For efficient balancing, in my implementation, I used an extra field (a counter: the number of items below each node), so the nodes in the binary heap that heapify constructs have the form {Item, Count, LeftHeap, RightHeap}.

[2] For what is worth, I include below the first three runs that I get when I run my version of "the challenge" with PropEr. As you can see, PropEr finds a counterexample within the first 10 runs and shrinks it in only one or two attempts.

Eshell V11.1.8  (abort with ^G)
1> c(binheap).
{ok,binheap}
2> proper:quickcheck(binheap:prop_sorted()).
....!
Failed: After 5 test(s).
{-1,2,{-6,1,none,none},none}

Shrinking .(1 time(s))
{0,2,{-1,1,none,none},none}
false
3> proper:quickcheck(binheap:prop_sorted()).
.......!
Failed: After 8 test(s).
{5,2,{3,1,none,none},none}

Shrinking ..(2 time(s))
{1,2,{0,1,none,none},none}
false
4> proper:quickcheck(binheap:prop_sorted()).
......!
Failed: After 7 test(s).
{15,2,{0,1,none,none},none}

Shrinking .(1 time(s))
{1,2,{0,1,none,none},none}
false

Mar 25 '21 07:03 kostis

@kostis Many thanks for diving into the challenge! I don't think every challenge has to be disected in this way but having a few well-understood examples is important for all of us who are convinced of the usefulness of PBT and want to promote it in talks, presentations, articles etc.

Interestingly Hypothesis (and I think SmartCheck and QuickCheck too) seems to never find the smallest example here, which is the four valued heap (0, None, (0, (0, None, None), (1, None, None)))

As we know by now, this is not the smallest example and, even worse IMO, it's not even a (proper) binary heap data structure -- at least the way I understand what a binary heap is / should be.

I assume you refer to missing completeness as stated in https://en.wikipedia.org/wiki/Binary_heap. If we demand completeness the underlying implementation will be considerably more complicated, which will make comparability of solutions in different languages more difficult. I'm not sure it helps with the shrinking aspect of the challenge. But it may well do so, since shrinking will also have to produce only valid binary heaps.

For efficient balancing, in my implementation, I used an extra field

I would forgo efficency unless it has implications for shrinking.

Mar 26 '21 07:03 jlink

I don't think it will be possible to make the challenges realistic (e.g. in the real world larger sized cases tend to be more likely to fail for example. This is not always true for the challenges). Your best bet are interesting simple problem that stretch the shrinkers. From working on other benchmarks like the benchmarks game there is a real risk of over specifying and never being able to keep the challenge fair. In that case in terms of language features and in this PBT/shrinker features.

Basically I think more clarity but more complexity will come back on you.

Mar 26 '21 08:03 AnthonyLloyd