fast-check Statistics shrinked label examples and minimum percentage of labels

Open EmilTholin opened this issue 5 years ago • 23 comments

Thank you so much for your hard work on fast-check! I've only recently started to experiment with property based testing, and fast-check is so well-written and well-documented and an absolute joy to work with!

I saw John Hughes' talk "Building on developers' intuitions" and got very excited about the new shrinked label examples and minimum percentage of labels features in QuickCheck. Do you think it would be possible to add these features to this great library?

Apr 08 '19 19:04 EmilTholin

Thanks a lot for those great references, I will have a deeper look into the approach in order to see how I can bring it into fast-check.

It is worth knowing that fast-check already has limited support for labelled inputs: fc.statistics can show the user what percentage of the generated values falls under each label:

const fc = require('fast-check');

const kvPairArb = fc.tuple(fc.integer(-10, 10), fc.integer());
const kvPairEqual = (a, b) => a[0] === b[0];

fc.statistics(
    // Property you would have passed into fc.assert
    fc.property(
        fc.set(kvPairArb, kvPairEqual),
        kvPairArb,
        () => true // code under test
    ),
    // Labeling function
    // can either return a single string label or an array of multiple string labels
    ([vs, [k, v]]) => {
        const label = vs.length === 0 ? 'empty'
            : Math.min(...vs.map(([kv,vv]) => kv)) >= k ? 'at start'
            : Math.max(...vs.map(([kv,vv]) => kv)) <= k ? 'at end'
            : 'at middle';
        return label +', ' + (vs.find(([kv,vv]) => kv === k) ? 'update' : 'create');
    },
    { numRuns: 10000 });

Running code on Runkit: https://runkit.com/dubzzz/5cabc890d332880012e2dcfa

With this code I got:

at middle, create..38.83%
at end, create.....15.73%
at start, create...15.19%
at middle, update..12.99%
empty, create.......9.99%
at start, update....3.98%
at end, update......3.29%
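
For readers new to labeling functions, the tallying behind a report like the one above can be sketched in plain JavaScript. This is only an illustration under stated assumptions, not fast-check's implementation: tallyLabels is a hypothetical helper, and it works on a fixed sample rather than on generated values.

```javascript
// Plain-JavaScript sketch of label tallying; NOT fast-check's implementation.
// `tallyLabels` is a hypothetical helper name introduced for this example.
function tallyLabels(values, classify) {
  const counts = new Map();
  for (const value of values) {
    // A classifier may return a single label or an array of labels.
    const labels = [].concat(classify(value));
    for (const label of labels) {
      counts.set(label, (counts.get(label) || 0) + 1);
    }
  }
  // Percentages are relative to the number of values, so the rows can sum
  // to more than 100% when one value carries several labels.
  return [...counts.entries()]
    .map(([label, count]) => ({ label, percent: (100 * count) / values.length }))
    .sort((a, b) => b.percent - a.percent);
}

// Usage with a fixed sample instead of generated values.
const sample = [-3, -1, 0, 2, 4, 6, 8, 10];
const report = tallyLabels(sample, (n) => [
  n < 0 ? 'negative' : 'non-negative',
  n % 2 === 0 ? 'even' : 'odd',
]);
for (const { label, percent } of report) {
  console.log(`${label.padEnd(14, '.')}${percent.toFixed(2)}%`);
}
```

Returning an array from the classifier is what lets a single value count toward several labels at once.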

I'll come back to you asap ;) Thanks a lot for the references

Apr 08 '19 22:04 dubzzz

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

Aug 28 '20 21:08 stale[bot]

Hmm... this seems to fail when run inside Deno

statistics test
TypeError: this.arb.withBias is not a function
    at ConverterToNext.generate (https://cdn.skypack.dev/-/[email protected]/dist=es2019,mode=imports/optimized/fast-check.js:539:48)
    at Property.generate (https://cdn.skypack.dev/-/[email protected]/dist=es2019,mode=imports/optimized/fast-check.js:962:28)
    at https://cdn.skypack.dev/-/[email protected]/dist=es2019,mode=imports/optimized/fast-check.js:1738:26
    at https://cdn.skypack.dev/-/[email protected]/dist=es2019,mode=imports/optimized/fast-check.js:1978:58
    at mapHelper (https://cdn.skypack.dev/-/[email protected]/dist=es2019,mode=imports/optimized/fast-check.js:35:11)
    at mapHelper.next (<anonymous>)
    at Object.statistics (https://cdn.skypack.dev/-/[email protected]/dist=es2019,mode=imports/optimized/fast-check.js:1989:14)
    at file:///.../tests/example.test.ts:197:6
    at asyncOpSanitizer (deno:runtime/js/40_testing.js:21:15)
    at resourceSanitizer (deno:runtime/js/40_testing.js:58:13)

// @ts-nocheck
// https://github.com/dubzzz/fast-check/issues/2781

Deno.test("statistics test", () => {
  const kvPairArb = fc.tuple(fc.integer(-10, 10), fc.integer());
  const kvPairEqual = (a, b) => a[0] === b[0];

  fc.statistics(
    // Property you would have passed into fc.assert
    fc.property(
      fc.set(kvPairArb, kvPairEqual),
      kvPairArb,
      () => true // code under test
    ),
    // Labeling function
    // can either return a single string label or an array of multiple string labels
    ([vs, [k, v]]) => {
      const label = vs.length === 0 ? 'empty'
        : Math.min(...vs.map(([kv,vv]) => kv)) >= k ? 'at start'
        : Math.max(...vs.map(([kv,vv]) => kv)) <= k ? 'at end'
        : 'at middle';
      return label +', ' + (vs.find(([kv,vv]) => kv === k) ? 'update' : 'create');
    },
    { numRuns: 1000 });
});

May 12 '22 15:05 moodmosaic

I see a workaround in https://github.com/dubzzz/fast-check/issues/2736, unsure if it applies here though.

May 12 '22 15:05 moodmosaic

Workaround should work fine for this case in theory

May 12 '22 18:05 dubzzz

If you're on fast-check v3, make sure to replace set with uniqueArray.

Jun 15 '22 20:06 moodmosaic

@dubzzz

fc.statistics(
  // Property you would have passed into fc.assert
   fc.property(

This means it can't be combined with fc.assert?

Jun 15 '22 20:06 moodmosaic

It can be used with assert. It's just that statistics will never run the predicate defined by the property

Jun 15 '22 20:06 dubzzz

So what's the best way to include statistics in a property test?

Jun 15 '22 21:06 moodmosaic

statistics, at least today, is mostly there to help users check, case by case, how well the arbitrary or property will perform. By "will perform", I mean: how many cases it will cover, how many x vs y...

Jun 15 '22 21:06 dubzzz

Which means it has to run in addition to the property tests that run normally in a test suite. 🤔

Jun 15 '22 21:06 moodmosaic

So far, there is no support for requirements such as "my arbitrary should produce 50% of its values above x". It may come at some point, but it is not planned yet.

Jun 15 '22 21:06 dubzzz

So as far as I can tell, I'd have to run the property (via fc.assert) and then re-run the property (via fc.statistics). But this won't reflect the actual test distribution (the one used initially in fc.assert) right?

Jun 17 '22 08:06 moodmosaic

Right! fc.statistics is just a debugging tool to tell you what the shape of the values produced by your property or arbitrary would be if it were used within fc.assert. It does not run any checks.

Jun 17 '22 09:06 dubzzz

@dubzzz and @moodmosaic I have a fork of fast-check that implements this label and covering feature and I'd like to get a PR open. I'm having some difficulty getting the tests to run locally and want to get some feedback from you @dubzzz on the DSL for doing that labeling and how to treat statistics if this new feature becomes part of the mainline.

Is it OK if I open a PR that's largely a draft so I can get some Github Actions feedback on the tests/doc generation, and your feedback on some of the implementation details that I think you'll have strong feelings about?

What I have working in the fork is:

fc.assert(
  /** cover makes a prop with the cover/label requirements embedded */
  fc.property(fc.integer(), (x) => true).cover(
    /** fail if less than 2% of generated values are larger than 10000 */
    fc.label(2, 'big nums', x => x > 10000),
    fc.label(75, 'smollest of smol nums', x => x < -100000),
  )
);
/**
 * gives you something like this (I'm still working on the RunDetails)
 * even though the prop always returns true, the run fails
 * 
 * Error: Test run failed to cover one or more of the following input classifiers:
 *              big nums: 36% [.......             ] ✓  2%
 * smollest of smol nums: 62% [............        ] ✗ 75%
 */
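
To make the proposed semantics concrete without depending on the fork, here is a minimal plain-JavaScript sketch of evaluating label requirements against a list of values. The names label, evaluateCoverage and formatRow are hypothetical and only mirror the shape shown above; none of them is a released fast-check API, and the input is a fixed list rather than generated integers.

```javascript
// Hypothetical sketch, independent of fast-check: these helpers only mirror
// the shape of the proposal above and are not part of any released API.
const label = (minPercent, name, predicate) => ({ minPercent, name, predicate });

function evaluateCoverage(values, labels) {
  const rows = labels.map(({ minPercent, name, predicate }) => {
    const hits = values.filter(predicate).length;
    const percent = values.length === 0 ? 0 : (100 * hits) / values.length;
    return { name, percent, minPercent, ok: percent >= minPercent };
  });
  // The run is covered only if every label meets its minimum percentage.
  return { ok: rows.every((row) => row.ok), rows };
}

// Render one row with a 20-cell bar, in the spirit of the report above.
function formatRow({ name, percent, minPercent, ok }) {
  const bar = '.'.repeat(Math.round(percent / 5)).padEnd(20, ' ');
  return `${name}: ${Math.round(percent)}% [${bar}] ${ok ? '✓' : '✗'} ${minPercent}%`;
}

// Deterministic stand-in for generated integers: 0, 1000, 2000, ..., 99000.
const values = Array.from({ length: 100 }, (_, i) => i * 1000);
const result = evaluateCoverage(values, [
  label(2, 'big nums', (x) => x > 10000),
  label(75, 'negative nums', (x) => x < 0),
]);
result.rows.forEach((row) => console.log(formatRow(row)));
```

Because each label carries its own predicate and threshold, the same label factory can be reused across properties with different percentages.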

Oct 10 '22 22:10 xtianjohns

Yes, definitely. Let's open a PR.

Cases we will have to check before merging anything:

how does it behave when in shrink mode?

Side note: I'm not sure that cover will make it in that form, but it's definitely worth seeing. There are more and more needs for features extending how property or assert behave. I have not fully made up my mind about it, but some kind of plugin system might be a solution for that in the future. So far, for property, we can already decorate implementations with others.

Oct 11 '22 06:10 dubzzz

For tests, here is the recipe to run them locally:

# from root of the repo
yarn
yarn build:all
cd packages/fast-check

# from packages/fast-check 
yarn test
yarn e2e

Oct 11 '22 06:10 dubzzz

@xtianjohns The test issue you encountered might be fixed once #3301 gets merged

Oct 11 '22 20:10 dubzzz

Awesome thanks @dubzzz I'll give that a go and see where I end up.

With regards to cover being a member of property, I am a new contributor and don't totally understand the ergonomics landscape you lay out. What I can say is that I know cover doesn't belong as a member within arbitraries. I enjoy labels being bundled with properties rather than fed into different test runner primitives (like making fc.statistics actually run the test and do the covering) because I'm imagining people not having distinct uses for the entry points. Basically - I can't think of a time where you would have a property and want it fed into both assert and statistics. You'll want one, or the other. Conceptually, the property containing within it a definition of labeling requirements is consistent with prior art (linked at the top of the issue).

I totally understand that there are other considerations though. I'm open to making it work. Admittedly, it does feel jank (what I'm doing) just making a new CoveredProperty type and slapping a method on it for generating statistics. Specific examples, with notes:

/** the thing we should not do */
fc.assert(fc.property(arb.cover(/* ...plz no ... */), something));
// 1. doesn't allow you to label based on multiple combined arbitraries
// 2. does not lend itself to reuse, the logic in 'something' is the thing that depends on a particular distribution of values
// 3. no other prop test libraries do this

/** the thing we could do */
fc.statistics( property, classifier );
// 1. keeps assert clean, you just have to clobber existing statistics or make a new thing i.e. 'fc.coverStats'
// 2. keeps property unchanged, which I guess is a win because #reasons
// 3. the classifier works like before where it takes inputs and generates all possible labels
// 4. because of the above, I can't reuse my labels or transfer them easily to other props

/** the thing I did, hope sticks */
const coveredProp = prop.cover(labelA, labelB); // my implementation
// alternatives for the same line:
// const coveredProp = fc.cover(prop, labelA, labelB); // something I considered
// const coveredProp = fc.cover(prop, [labelA, labelB]); // arrays are cool
fc.assert( coveredProp );
// 1. making covered props keeps the API as consistent with current assert, no need to use a different fc.checkStuff
// 2. most like other prop test libraries in other runtimes
// 3. labels define their own predicates for whether a combination of values is the label, rather than value -> label | Array[label],
// this means you don't end up with one massive function that defines all labels
// 4. labels compose, so you can reuse them like:
const bigNums = (pct) => label(pct, 'bigNums', _.gt(1000));
propA.cover(bigNums(50) /** 50% need to be big */);
propB.cover(bigNums(75) /** 75% need to be big */);

With regards to shrinking: the concept of labeling requirements (for me, at least) excludes all other failure reasons as a precondition. Essentially, if the property holds, if it doesn't time out or hit max skips, then finally we check this labeling coverage feature. It's just... one more thing that can fail. When we fail for other reasons? Those reasons end up in the report, and I never write a labeling analysis into the RunExecution. So it's a no-op, same behavior as before.

  const runner = new RunnerIterator(sourceValues, shrink, verbose, interruptedAsFailure);
  const values = []; // look ma, values!
  for (const v of runner) {
    const out = property.run(v) as PreconditionFailure | PropertyFailure | null;
    runner.handleResult(out);
    values.push(v); // mutation is gross, but this is a draft
  }
  // setClassifications is me, added on runnerIterator and
  // classify() is me, I put it on prop but /shrug
  runner.setClassifications(property.classify(values)); // we didn't explode yet
  return runner.runExecution;

And... what we do at the end of the run execution

toRunDetails(/** ... */) {
  // ...

  // Either 'too many skips' or 'interrupted' with flag interruptedAsFailure enabled
  // The two cases are exclusive (the two cannot be true at the same time)
  let failed = this.numSkips > maxSkips || (this.interrupted && this.interruptedAsFailure);
  if (!failed) {
    // here I check the coverage labeling, if any
  }
  // ...
}

AFAIK, this means that this has no involvement with shrinking. We take all source values provided to the property during a test run. At the end of the test run, if nothing exploded (when we generate details) we consult that labeling analysis and then fail the test.
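
The ordering described here (ordinary failures first, coverage only when everything else passed) can be sketched in isolation. This is a simplified illustration under stated assumptions, not the fork's code: runWithCoverage and its result shape are hypothetical, and shrinking, skips and interruptions are left out entirely.

```javascript
// Hypothetical sketch of the ordering described above; `runWithCoverage`
// and the result shape are assumptions, not fast-check internals.
function runWithCoverage(values, predicate, requirements) {
  const seen = [];
  for (const value of values) {
    seen.push(value);
    if (!predicate(value)) {
      // An ordinary failure wins: coverage is never consulted.
      return { failed: true, reason: 'property', counterexample: value };
    }
  }
  // Only reached when every run passed.
  const unmet = requirements.filter(({ minPercent, test }) => {
    const hits = seen.filter(test).length;
    const percent = seen.length === 0 ? 0 : (100 * hits) / seen.length;
    return percent < minPercent;
  });
  return unmet.length > 0
    ? { failed: true, reason: 'coverage', unmet: unmet.map((r) => r.name) }
    : { failed: false };
}

const inputs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
const needsBig = [{ name: 'big', minPercent: 50, test: (x) => x > 8 }];
console.log(runWithCoverage(inputs, (x) => x > 0, needsBig).reason); // coverage
console.log(runWithCoverage(inputs, (x) => x < 5, needsBig).reason); // property
```

In the second call the predicate fails on 5, so the coverage requirement is never evaluated, which is why shrinking would be unaffected in this model.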

It also means that as of right now I don't know how to express this feature in the case where the property holds. In QuickCheck, you'll get label reports for the happy path. So... I imagine if that's a feature we wanted for fast-check, we'd just listen for verbosity on the runner (qParams or whatever) and log the labels on success, like statistics does today.

Oct 11 '22 23:10 xtianjohns

Also @dubzzz thanks for that set of commands, I've got tests running now so I'm cracking on fixing those up and adding new cases around this functionality.

Oct 12 '22 01:10 xtianjohns

I haven't seen the way you introduced it in the codebase yet, but a solution could have been to follow the pattern put in place for timeouts.

In other words:

an additional option on assert (as timeout)
an additional wrapper of property to wrap an existing property into one adding this check (as TimeoutProperty)

It would have the benefit of being code that is only executed when asked for (opt-in, so no performance impact), and of being easier to test and maintain (self-sufficient). But the current API of property is probably not enough, as you'll probably need a test start and a test end or something like that 🤔 Or maybe not, actually, and in that case the wrapper trick would make things pretty easy, as your wrapper will mostly have to control and track every generated value and throw if the probabilities fail. It would also simplify the shrink vs generated triage.
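
As a rough illustration of the wrapper idea, here is a plain-JavaScript sketch. It assumes a minimal property-like object exposing only run(value), and it adds a finish() hook to stand in for the "test end" mentioned above. withCoverage, finish and the requirement shape are hypothetical names, not fast-check's property interface.

```javascript
// Hypothetical sketch: `withCoverage`, `finish` and the requirement shape
// are illustrative names, not fast-check's property interface.
function withCoverage(property, requirements) {
  const hits = new Map(requirements.map((r) => [r.name, 0]));
  let runs = 0;
  return {
    // Delegate to the wrapped property, recording which labels the value hits.
    run(value) {
      runs += 1;
      for (const { name, test } of requirements) {
        if (test(value)) hits.set(name, hits.get(name) + 1);
      }
      return property.run(value);
    },
    // Stand-in for a "test end" hook: throw if a label is under its threshold.
    finish() {
      for (const { name, minPercent } of requirements) {
        const percent = runs === 0 ? 0 : (100 * hits.get(name)) / runs;
        if (percent < minPercent) {
          throw new Error(
            `label "${name}" covered ${percent.toFixed(1)}%, expected at least ${minPercent}%`
          );
        }
      }
    },
  };
}

const base = { run: (x) => x >= 0 }; // stand-in for a real property
const isEven = (x) => x % 2 === 0;
const wrapped = withCoverage(base, [{ name: 'even', minPercent: 40, test: isEven }]);
[0, 1, 2, 3, 4].forEach((x) => wrapped.run(x));
wrapped.finish(); // 3 of 5 values are even (60%), so nothing is thrown
```

The wrapped property stays untouched, so the tracking cost is only paid when the wrapper is applied.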

Oct 12 '22 06:10 dubzzz

Right! fc.statistics is just a debugging tool to tell you what would be the shape of the values produced by your property or arbitrary if used within fc.assert. It does not run any checks.

Perhaps all we need is for fc.statistics to be an fc.assert decorator, so that it both reports the statistics (e.g. labels) and runs the checks.

Nov 23 '22 14:11 moodmosaic

@xtianjohns, interesting piece of work! Perhaps, if we want to add those combinators, we should probably consider adding the others as well. For example, in hedgehog we have:

cover records the number of times a predicate is satisfied and displays the result as a percentage. If the percentage doesn’t meet your threshold then the test fails.

classify works the same as cover but is purely informational and doesn’t have a threshold below which it will fail the test.

label is like classify but doesn’t have a predicate, so it simply tracks the percentage of tests run which hit a certain line of code.

collect is like label but can use sprintf "%A" on its argument to create the label name.

What @EmilTholin mentions is basically cover on steroids; it also runs more tests as needed to ensure the result is statistically significant. In QuickCheck it's called checkCoverage.

And since all the above links point to hedgehog, the equivalent in hedgehog looks kind of like this:

checkCoverage :: Property -> Property
checkCoverage =
  verifiedTermination . withConfidence (10^9)
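
The "runs more tests as needed" behaviour can be sketched in JavaScript as a sequential check: keep drawing values until a confidence interval for the observed fraction sits clearly on one side of the required threshold. The sketch below uses a Wilson score interval as one common choice. It is an assumption-laden illustration, not the code of QuickCheck, hedgehog or fast-check, and every name in it (wilson, checkCoverageSequentially, the options) is hypothetical.

```javascript
// Hypothetical sketch of a sequential coverage check; not QuickCheck's,
// hedgehog's, or fast-check's code. All names here are illustrative.

// Wilson score interval for `hits` successes out of `n` trials.
function wilson(hits, n, z) {
  const p = hits / n;
  const z2 = z * z;
  const denom = 1 + z2 / n;
  const center = (p + z2 / (2 * n)) / denom;
  const half = (z * Math.sqrt((p * (1 - p)) / n + z2 / (4 * n * n))) / denom;
  return { low: center - half, high: center + half };
}

// Keep drawing values until the interval clears the threshold on one side.
function checkCoverageSequentially(nextValue, test, minFraction, options = {}) {
  // z = 3.29 corresponds to roughly 99.9% two-sided confidence.
  const { z = 3.29, batch = 100, maxRuns = 100000 } = options;
  let hits = 0;
  let n = 0;
  while (n < maxRuns) {
    for (let i = 0; i < batch; i += 1) {
      if (test(nextValue())) hits += 1;
      n += 1;
    }
    const { low, high } = wilson(hits, n, z);
    if (low >= minFraction) return { verdict: 'covered', runs: n };
    if (high < minFraction) return { verdict: 'insufficient', runs: n };
  }
  return { verdict: 'undecided', runs: n };
}

// Deterministic stand-in generator: 0, 1, 2, ... so exactly half are even.
const counter = () => { let i = 0; return () => i++; };
const evenOnly = (x) => x % 2 === 0;
console.log(checkCoverageSequentially(counter(), evenOnly, 0.2).verdict);
console.log(checkCoverageSequentially(counter(), evenOnly, 0.8).verdict);
```

A threshold close to the true fraction needs many more runs before the interval separates from it, which is why the sketch caps the total with maxRuns and can return 'undecided'.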

Nov 23 '22 15:11 moodmosaic
