elm-test icon indicating copy to clipboard operation
elm-test copied to clipboard

Consider removing support for `andThen`

Open jwoudenberg opened this issue 8 years ago • 45 comments

Although I really like the andThen conceptually, in practice I run into trouble when using it more often than not (#105, ~#132~, #160). I think it's very easy to come at it with the expectation that it works the same way as andThen for generators, but find that using it actually requires a fair bit of knowledge about the shrinking behaviour. This seems a bit treacherous for users of the library, most of whom might hardly be aware of shrinking or consider it an implementation detail.

Most cases where I used andThen could be refactored not to rely on it. For those instances where andThen is really needed, I'd argue using custom might be the better choice because it forces the user to think about the shrinking behaviour they want. It's very hard to guess the correct let alone efficient shrinking behaviour for the general case. Consider the fuzzer in the (artificial) example below doing way too much work:

{-| Expected: Equivalent to `bool`. Actual: Int shrinking is driving the shrinking of a bool.
-}
dumbBool : Fuzzer Bool
dumbBool =
  int |> andThen (always bool)

Lastly, a few possible feature and performance improvements of fuzzers in general currently seem to be quasi-blocked on the complexity of the fuzzer codebase (#109, #148). andThen seems to be by far the most complex part of the Fuzzer codebase at the moment. Removing it might allow us to simplify a bit, empowering us to tackle some of those ideas.

jwoudenberg avatar May 24 '17 08:05 jwoudenberg

This is a compelling argument to me.

@mgold thoughts?

rtfeldman avatar May 24 '17 08:05 rtfeldman

Yeah, that sounds about right.

I'm not sure how much simplification this would actually enable without regressing the shrinking behavior of map and mapN.

mgold avatar May 25 '17 02:05 mgold

Excellent! I'm working on an experiment in #162 (another potential avenue for simplification if it doesn't impact performance). I'll report back when I have some code to show.

jwoudenberg avatar May 25 '17 12:05 jwoudenberg

Maybe this has been seen and considered already, but hedgehog is a haskell library that tried to address common difficulties with quickcheck generators: http://teh.id.au/posts/2017/04/23/property-testing-with-hedgehog/

Also, interesting comments from the QuickCheck author: https://www.reddit.com/r/haskell/comments/646k3d/ann_hedgehog_property_testing/dg1485c/

avh4 avatar Jun 01 '17 23:06 avh4

It seems like Hedgehog's main selling point is that it does what we already do 😄

In Hedgehog, a reasonably good shrink function is constructed for free alongside every generator. Shrink functions are coupled with every generator, and a lazy tree of all possible shrinks is constructed alongside each generated value. Shrinking is then little more than traversing the tree. This is Hedgehog's theoretical point of difference from QuickCheck, though it also differs in a few practical ways.

rtfeldman avatar Jun 02 '17 01:06 rtfeldman

This part of Hedgehog is interesting:

While Range helps us construct generators in terms of bounds and steps, we may also wish to write generators in terms of negative invariants. It may be simpler to generate terms, apply some predicates to them, and discard the invalid ones. This gives us the flexibility to write much more complicated generators, though they may take longer to run, or diverge.

Generators that discard heavily are certain to be less productive than other constructive generators. They can generate an arbitrary number of invalid values for each valid one, at the expense of your CPU cycles and precious seconds you probably needed for your life. So, this is a tool to be used gently and infrequently.

Seems like they're essentially talking about Fuzz.filter here.

rtfeldman avatar Jun 02 '17 01:06 rtfeldman

Yes, that sounds correct.

Is the major point here that hedgehog is not monadic / does not support andThen and therefore we have grounds to deprecate it?

I wonder what we can learn from its strategies for generating and shrinking, e.g. advise us on #168.

mgold avatar Jun 02 '17 05:06 mgold

Would not a slightly simpler solution be to keep andThen and simply make it not shrink? Seems like that would bring all the proposed benefits, but still keep the convenient API?

gampleman avatar Jun 02 '17 09:06 gampleman

Shrinking is important. You can't get rid of shrinking in general, and it's hard to know what shrinks are good, in general. For example. Fuzz.frequency should shrink by shrinking the chosen fuzzer, not picking a different one. But maybe someone else wants something else.

mgold avatar Jun 02 '17 15:06 mgold

Yeah, I agree. I turned off shrinking once to address a slow-to-shrink failure back in the elm-check days and the output was so horrendous I ended up rewriting the test.

If we're shipping something that doesn't shrink, we're probably shipping a footgun that we'll later recommend against using. 😅

rtfeldman avatar Jun 02 '17 16:06 rtfeldman

The one place we use Fuzz.andThen in our tests is in a construct like this:

let
    makeListOfBigRecords : Int -> Fuzzer (List BigRecord)
    makeListOfBigRecords items =
        0 ->
            Fuzz.constant []

        1 ->
            Fuzz.map (\thing1 -> [ thing1 ]) (bigRecordFuzzer 0)

        2 ->
            Fuzz.map (\thing1 thing2 -> [ thing1, thing2 ]) (bigRecordFuzzer 0) (bigRecordFuzzer 1)

        -- map3 and map4 respectively.

        _ ->
            Debug.crash "Too many things"
in
    Fuzz.intRange 0 4
        |> Fuzz.andThen makeListOfBigRecords
        |> Fuzz.map Array.fromList

The tests processing those big records get really, really slow when the array is long. This only seemed to happen when Fuzz.list generated really long lists. Our app logic only cares if the length is ≧ 3 (but can technically handle longer lists), so I solved the issue by feeding a length argument into a helper function.

In order to quit using Fuzz.andThen while keeping our tests relatively fast, we need a way to specify a maximum size on lists. The argument to bigRecordFuzzer can be handled with Fuzz.map and List.indexedMap.

cassiemeharry avatar Jun 02 '17 17:06 cassiemeharry

That's a good usecase @nickmeharry, thanks for sharing. There's a discussion in #168 about list Fuzzers perhaps generating fewer elements. Also, it's possible to create a shortListFuzzer without relying on andThen, although it's not as flexible in the maximum size of lists it generates as yours.

shortList : Fuzz.Fuzzer a -> Fuzz.Fuzzer (List a)
shortList itemFuzzer =
    Fuzz.frequency
        [ (1, Fuzz.constant [])
        , (1, Fuzz.map (\a -> [ a ]) itemFuzzer)
        , (1, Fuzz.map2 (\a b -> [ a, b ]) itemFuzzer itemFuzzer)
        ]

@BrianHicks showed me another use case for andThen: creating a lazy fuzzer. See here. It might make sense adding a function like that directly to elm-test.

jwoudenberg avatar Jun 02 '17 18:06 jwoudenberg

How would you all feel if I sent around a short survey asking for applications people have for andThen, to take inventory of what it would take to make removing it acceptable for people?

jwoudenberg avatar Jun 15 '17 03:06 jwoudenberg

Sounds good to me!

On Wed, Jun 14, 2017, 8:44 PM Jasper Woudenberg [email protected] wrote:

How would you all feel if I sent around a short survey asking for applications people have for andThen, to take inventory of what it would take to make removing it acceptable for people?

— You are receiving this because you commented.

Reply to this email directly, view it on GitHub https://github.com/elm-community/elm-test/issues/161#issuecomment-308620414, or mute the thread https://github.com/notifications/unsubscribe-auth/ABCxwIbotXnKpn6vBeOpyLPhhugZrIfIks5sEKi0gaJpZM4NkyHf .

rtfeldman avatar Jun 15 '17 03:06 rtfeldman

My usecase for andThen is for the "update" tests, when I want the Msg fuzzers not to be standalone but to depend on previous model (see branch fuzzers-dependent-on-model).

  1. I fuzz the initial Model
  2. andThen apply it to Model -> Fuzzer Msg function to get a Msg fuzzer
  3. map2 them together to get a new Model
  4. repeat 2. and 3. until some stopping condition is met.

The problem with this is the shrinking (when the fuzz tests pass, well they just pass. When they fail, the shrinking hogs the CPU and never ends).

What would be a proposed new API to do something similar? custom? (ATM I don't see how that would work)

Janiczek avatar Jun 16 '17 07:06 Janiczek

@Janiczek nice use case! Yes, the idea is that custom provides the basic building block to implement tools for any specific use case. It takes a generator and a shrinker, of which the generator should be simple to implement because you can compose it in the same way you're currently composing your Fuzzer (generators support andThen). Building a custom shrinker is harder and it sounds like that's exactly what you need, considering the default shrinker isn't working for you.

jwoudenberg avatar Jun 16 '17 23:06 jwoudenberg

I made a survey! https://docs.google.com/forms/d/e/1FAIpQLSebuE0i4KXChLN4nstmH720UeQ3jgF-PRlCg-_eNj0Nv7i_Yg/viewform?usp=sf_link

I'll distribute it among Elm channels and report back the results.

jwoudenberg avatar Jun 16 '17 23:06 jwoudenberg

@jwoudenberg for future reference, how long do you plan to let the survey run?

rtfeldman avatar Jun 24 '17 03:06 rtfeldman

@rtfeldman I plan on closing it this weekend. I got four responses so far that didn't really unearth anything new. I'll write up a summary and a proposal for next steps later this weekend.

jwoudenberg avatar Jun 24 '17 18:06 jwoudenberg

Sounds good!

On Sat, Jun 24, 2017, 11:27 AM Jasper Woudenberg [email protected] wrote:

@rtfeldman https://github.com/rtfeldman I plan on closing it this weekend. I got four responses so far that didn't really unearth anything new. I'll write up a summary and a proposal for next steps later this weekend.

— You are receiving this because you were mentioned.

Reply to this email directly, view it on GitHub https://github.com/elm-community/elm-test/issues/161#issuecomment-310856474, or mute the thread https://github.com/notifications/unsubscribe-auth/ABCxwENUgiNeUpegK_ptlaFGZI97vLyRks5sHVTzgaJpZM4NkyHf .

rtfeldman avatar Jun 24 '17 21:06 rtfeldman

Here's a summary of all the use cases that came up in the survey, which got 4 responses:

  1. Recursively fuzzing messages based on the state of the model (three people mentioned this).
  2. Implementation of lazy : (() -> Fuzzer a) -> Fuzzer a to fuzz recursive data structures.
  3. Creating a fuzzer for short lists.
  4. Generate a random ID and use it to fuzz one of the elements passed to Fuzz.frequency.

Some thoughts on how we could address this:

Recursively fuzzing messages based on the state of the model

Evan had a nice idea for an API that could address this in a neat way. I tried it out here. @Janiczek and @BrianHicks, would something like that be a good substitute for fuzzing Msg's for your use cases?

Implementation of lazy to fuzz recursive data structures.

It should be pretty easy to implement lazy directly in elm-test. Unfortunately lazy has the same disadvantage as andThen when it comes to #160: Figuring out whether a fuzzer is valid or not before running it. Does anyone have any nice ideas for working around that?

Creating a fuzzer for short lists

We have a workaround using mapN for this. A nicer solution is being considered here. Would that work for you @nickmeharry?

Generate a random ID and use it to fuzz one of the elements passed to Fuzz.frequency

This sounds like something that could be achieved with mapN too. I've reached out to this person to get some details.

jwoudenberg avatar Jun 25 '17 12:06 jwoudenberg

The randomWalk idea does not allow for the following case: using the model to create and Html msg, and then using the Html msg to produce a list of possible msg that can be produced by the current UI, and then choosing one of those to perform next. Though maybe that sort of test is not appropriate for fuzz testing, since there may not be a sensible, general way to shrink that.

avh4 avatar Jun 25 '17 17:06 avh4

@jwoudenberg Wow, the randomWalk idea seems really nice! From a read-through, I wonder about hardcoding Msgs into these tests (ie. Input "tom" in the docs). Does that limit what bugs the fuzz test can find? (I'll probably try to port my examples to use this lib and see what happens :) )

Janiczek avatar Jun 25 '17 21:06 Janiczek

@avh4 You mean grab the messages that can be generated from the Html fully automatically? I didn't know that was possible, cool!

I like the ease of use of that scenario, it would take very little work to get a test up and running. I also have doubts about what the quality of those tests would be in the general case. Consider a rendered page with two buttons, once of which logs the user out and the other of which proceeds the user to a page with lots of complexity. 50 % of fuzz runs would end up testing the trivial scenario of logging out and the problem compounds with every extra step the test makes.

So for longer series of Msg's, I think some guidance is needed to ensure test time is spent well, but I like the idea as a single shot test: I'm on this page, I click any button, then X should hold for my model.

@Janiczek Credit goes to Evan!

You raise an interesting point with regards to the hardcoded messages. I don't really have a clear opinion on in yet. First things that come to mind: We could look incorporating randomness into the API for the messages, although that would probably complicate the API. On the other hand, perhaps it's nice to gently nudge away from that much detail in what's a pretty high-level test. For testing specific complicated transitions, it's always possible fuzz a single function. Or, if you want to, to first set up a model at a specific state (optionally using randomWalk to get there), then fuzz a single message in as much detail was you want and append that to the end of the message list returned by randomWalk using map2. Anyway, food for thought :).

jwoudenberg avatar Jun 25 '17 21:06 jwoudenberg

@jwoudenberg Yes, the frequency based method should work just as well for me. In hindsight, my approach was basically emulating frequency, but not as well.

cassiemeharry avatar Jun 26 '17 17:06 cassiemeharry

@jwoudenberg care to finish the bullet in your comment? :)

Janiczek avatar Jun 26 '17 21:06 Janiczek

@Janiczek Haha, that entire list was a previous draft of the paragraph above. I deleted the bunch. Sorry for the mess :).

jwoudenberg avatar Jun 26 '17 21:06 jwoudenberg

I've thought a bit more about the lazy case and don't really see a way we can have our cake and eat it too. It seems to me we have to choose:

  1. We implement lazy at the cost that we cannot statistically analyse fuzzers. This means it cannot in general be determined whether a fuzzer is invalid (because you never know what that lazy is going to do, currently it's possible to cause a Debug.crash this way). Analysing a fuzzer to find problems like double-nested lists and doing something about them will also become much harder.
  2. We do not implement lazy at the cost of not being able to easily express recursive data structures. To do something recursive-like a user will need to pass down a decreasing counter to ensure the recursion stops at some point. @BrianHicks I'd love to hear if this would be workable for your usecase.

I'm leaning towards option two.

jwoudenberg avatar Jun 30 '17 20:06 jwoudenberg

Limiting recursion actually sounds preferable to me. lazy is just a way to get around the defined-in-terms-of-itself limitation. I'm doing odd things with frequency to make sure my fuzzers don't get too deep; they run very slow and crash if I don't. A first-class solution to recursive data structures would be helpful!

BrianHicks avatar Jun 30 '17 20:06 BrianHicks

Seconding Brian. It's too easy to write a fuzzer that makes an infinite tree. You have to do things like

if depth > 2 then
    ( 0, Fuzz.constant (Point ( 1, 2, 0 )) )
else
    ( 1 / depth, Fuzz.map GeometryCollection (Fuzz.list (helper (depth + 1))) )

(source)

mgold avatar Jul 01 '17 01:07 mgold