scalacheck icon indicating copy to clipboard operation
scalacheck copied to clipboard

trees in scalacheck

Open mosesn opened this issue 8 years ago • 13 comments

Trees are a pain to generate with scalacheck because scalacheck makes them too big. This means that folks who want to experiment with large trees have to build an ad-hoc depth-limiting function into their tree generators. Scalacheck should provide tooling so that we don't have to constantly reimplement this.

A possible solution (and perhaps to illustrate my complaint better):

sealed trait Tree
case object Leaf extends Tree
case class Branch(left: Tree, right: Tree) extends Tree

def tree2[T](maxDepth: Int, g: Gen[(T, T) => T], leaves: Gen[T]): Gen[T] = {
  def go(depth: Int): Gen[T] = {
    if (maxDepth) == 0 leaves
    else {
      Gen.oneOf(
        for {
          fn <- g
          next = go(maxDepth - 1)
          left <- next
          right <- next
        } yield fn(left, right),
        leaves
      )
    }
  }
  recurse(maxDepth)
}

tree2(50, Gen.const((left, right) => Branch(left, right)), Gen.const(Leaf))

mosesn avatar Dec 01 '16 18:12 mosesn

FYI, scalacheck-shapeless allows to (almost) explicitly limit the size of the generated trees, since its version 1.1.4, like

import org.scalacheck._, Shapeless._

sealed trait Tree
case object Leaf extends Tree
case class Branch(left: Tree, right: Tree) extends Tree

implicit val treeRecursive = derive.Recursive[Tree](Gen.const(Leaf)) // argument is the generator used (at the leaves) if the generated tree gets too big

The size from the Gen.Parameters used during the generation then makes the tree generation stop after some depth.

~~Currently, a small glitch in the case class generation makes this not put a hard limit on the depth of the generated trees, but it's about to be fixed.~~ (https://github.com/alexarchambault/scalacheck-shapeless/pull/52 fixes that)

alexarchambault avatar Dec 01 '16 22:12 alexarchambault

Very cool! Would love to be able to do it without adding a dependency on shapeless though.

mosesn avatar Dec 01 '16 22:12 mosesn

@mosesn Just do like scalacheck-shapeless does: use Gen.sized / Gen.resize, making sure the Tree arbitrary makes the size decrease.

alexarchambault avatar Dec 03 '16 01:12 alexarchambault

@alexarchambault how do I make the size decrease?

mosesn avatar Dec 03 '16 03:12 mosesn

in scalaz (Rose Tree)

  • https://github.com/scalaz/scalaz/blob/v7.3.0-M7/scalacheck-binding/src/main/scala/scalaz/scalacheck/ScalazArbitrary.scala#L311-L336
  • https://github.com/scalaz/scalaz/blob/v7.3.0-M7/tests/jvm/src/test/scala/scalaz/TreeTestJVM.scala#L13-L16

xuwei-k avatar Dec 03 '16 03:12 xuwei-k

@alexarchambault I don't think using Gen.resize like you do in scalacheck-shapeless doesn't seem like it would make ad-hoc trees simpler.

Clearly this is a common use case, since there are at least two in finagle, there's one in scalaz, and one in scalacheck-shapeless. Maybe it's time for one of the generic implementations to graduate to scalacheck? What do you think, @rickynils @non?

mosesn avatar Dec 03 '16 16:12 mosesn

actually, @tixxit even took a stab at generating DAGs in a generic way.

oscar-stripe avatar Dec 05 '16 19:12 oscar-stripe

@oscar-stripe is that open source? might it get into scalacheck?

mosesn avatar Dec 05 '16 21:12 mosesn

@rickynils have you had a chance to take a look at this? what do you think?

mosesn avatar Dec 20 '16 21:12 mosesn

@mosesn It is not open source (currently just a prototype in a gist somewhere). I'll take a stab at cleaning it up and "releasing" it this weekend though!

tixxit avatar Dec 22 '16 19:12 tixxit

@tixxit oh, cool! have you had a chance to open source it yet?

mosesn avatar Dec 29 '16 15:12 mosesn

On the topic of generating test data for recursive data structures, this answer clarified how to use Gen.sized and Gen.resized:

http://stackoverflow.com/a/42855840/409976.

kevinmeredith avatar Mar 19 '17 13:03 kevinmeredith

I ran into this and couldn't get scalacheck-shapeless to generate instances (likely due to the fact that the recursion was indirect, the ASTs were wide, and there was some mutual recursion).

Thinking about the problem, I realized that a big part of the issue is that the applicative combinators (mostly thinking of resultOf) pass through the same size to the generators for the subtrees. This tends not to matter too much if your types aren't too big, but it becomes a showstopper in the presence of recursion. Borrowing from the ideas of the SO question above, I came up with these additional combinators:

/** Split the current generator size into the specified number of sub-groups.
   *
   * The sum of the sizes should equal 1 less than the initial generator size. The length of the
   * list returned is equal to the requested number of groups.
   *
   * @param n how many sub-groups to split into?
   * @return size of sub-groups
   */
private[this] def partitionReducedSize(n: Int): Gen[Seq[Int]] =
   for {
     size <- Gen.size
     decrementedSize = size - 1
     if decrementedSize >= 0
     groupSize = decrementedSize / n
     remainder = decrementedSize % n
     groups = List.tabulate(n)(i => if (i < remainder) 1 + groupSize else groupSize)
     shuffledGroups <- Gen.pick(n, groups)
   } yield shuffledGroups.toList

def reducedResultOf[T1: Arbitrary, R](f: T1 => R): Gen[R] =
  for {
    Seq(s1) <- partitionSize(1)
    t1 <- Gen.resize(s1, arbitrary[T1])
  } yield f(t1)

def reducedResultOf[T1: Arbitrary, T2: Arbitrary, R](f: (T1, T2) => R): Gen[R] =
  for {
    Seq(s1, s2) <- partitionSize(2)
    t1 <- Gen.resize(s1, arbitrary[T1])
    t2 <- Gen.resize(s2, arbitrary[T2])
  } yield f(t1, t2)

def reducedResultOf[T1: Arbitrary, T2: Arbitrary, T3: Arbitrary, R](f: (T1, T2, T3) => R): Gen[R] =
  for {
    Seq(s1, s2, s3) <- partitionSize(3)
    t1 <- Gen.resize(s1, arbitrary[T1])
    t2 <- Gen.resize(s2, arbitrary[T2])
    t3 <- Gen.resize(s3, arbitrary[T3])
  } yield f(t1, t2, t3)

// and so on

With this, I was able to just use Gen.lzy + Gen.oneOf for sealed class/sealed trait and reducedResultOf for case class. The "size" of the generator bounds the total number of nodes in the generated AST. The only other gotcha I worked around is to add a biased variant of Gen.oneOf that short-circuits to a base case Gen if the size is too small (since otherwise the tree generator risks failing due to too many skipped test cases).

I do wish that utilities like these came with ScalaCheck (OTOH, I see there isn't much for this in Haskell's QuickCheck either...). If there is interest I can open a PR to upstream these.

harpocrates avatar Nov 01 '21 16:11 harpocrates