elm-minithesis

opinion: char/string fuzzers could be improved

Open · BrianHicks opened this issue 5 years ago · 2 comments

Right now there is:

char : Fuzzer Char
charRange : Int -> Int -> Fuzzer Char
anyChar : Fuzzer Char

string : Fuzzer String
stringOfLength : Int -> Fuzzer String
stringWith : { minLength : Maybe Int, maxLength : Maybe Int, customAverageLength : Maybe Float, charFuzzer : Fuzzer Char } -> Fuzzer String

But I've never actually found the string fuzzers very useful! Strings are so far out of my domain that it's not worth the combinatorial overhead of generating them. So I wonder about this API instead, which would make char match how int works now:

char : Char -> Char -> Fuzzer Char -- not Ints! It's easy to accept literal values and it makes tests easier to read
humanReadableChar : Fuzzer Char -- existing plus accented characters and maybe a few unicode/CJK characters
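A minimal sketch of the two proposed char fuzzers. Since elm-minithesis's Fuzzer internals aren't shown here, it uses elm/random's `Random.Generator` as a stand-in (an assumption; nothing below shrinks). The ranges and weights in `humanReadableChar` are hypothetical illustrations of "ASCII plus accented plus a few CJK characters", not a settled design.

```elm
module CharSketch exposing (char, humanReadableChar)

import Random exposing (Generator)


{-| Characters between two literal bounds, inclusive.
The bounds are normalised, so `char 'z' 'a'` behaves like `char 'a' 'z'`.
-}
char : Char -> Char -> Generator Char
char from to =
    let
        a =
            Char.toCode from

        b =
            Char.toCode to
    in
    Random.int (min a b) (max a b)
        |> Random.map Char.fromCode


{-| Hypothetical "human readable" mix (weights are assumptions):
mostly printable ASCII, some of the Latin-1 Supplement range
U+00C0..U+00FF (mostly accented letters), and a few CJK ideographs.
-}
humanReadableChar : Generator Char
humanReadableChar =
    Random.weighted
        ( 80, char ' ' '~' )
        [ ( 15, char '\u{00C0}' '\u{00FF}' )
        , ( 5, char '\u{4E00}' '\u{9FA5}' )
        ]
        |> Random.andThen identity
```

Taking `Char` bounds rather than code points is what lets a test read `char 'a' 'z'` instead of `charRange 97 122`.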

As for string itself… maybe we could drop it! I have always gotten better test results by generating some data structure I care about and then serializing it, or by picking from a small set of known data. If we want realistic edge-case strings for fuzzing, we could consider generating them from Unicode data (e.g. Unicode publishes test tables).
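The two alternatives to a free-form string fuzzer can be sketched as follows, again with elm/random's `Random.Generator` standing in for `Fuzzer`. The `Version` type and the particular edge-case strings are hypothetical examples chosen for illustration.

```elm
module StructuredStrings exposing (Version, knownTricky, version, versionString)

import Random exposing (Generator)


{-| A hypothetical domain type: a semantic version number. -}
type alias Version =
    { major : Int, minor : Int, patch : Int }


version : Generator Version
version =
    Random.map3 Version
        (Random.int 0 20)
        (Random.int 0 99)
        (Random.int 0 999)


{-| Generate the structure you care about, then serialise it.
Every string produced is well formed by construction.
-}
versionString : Generator String
versionString =
    Random.map
        (\v ->
            String.fromInt v.major
                ++ "."
                ++ String.fromInt v.minor
                ++ "."
                ++ String.fromInt v.patch
        )
        version


{-| Pick from a small, hand-chosen set of known edge cases. -}
knownTricky : Generator String
knownTricky =
    Random.uniform "" [ " ", "\t\n", "ümlaut", "日本語", "a\u{200B}b" ]
```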

But I know that dropping string is probably pretty controversial! So if you don't like that, how about:

string : Fuzzer Char -> Fuzzer String
stringOfLength : Fuzzer Int -> Fuzzer Char -> Fuzzer String

then

string humanReadableChar

stringOfLength (int 1 20) (char 'a' 'z')
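A sketch of the proposed `string` / `stringOfLength` pair, assuming elm/random's `Random.Generator` in place of `Fuzzer`. The default length range of 0..10 for `string` is an arbitrary assumption, and negative lengths are clamped to 0.

```elm
module StringSketch exposing (string, stringOfLength)

import Random exposing (Generator)


{-| Draw a length first, then that many characters. -}
stringOfLength : Generator Int -> Generator Char -> Generator String
stringOfLength lengthGen charGen =
    lengthGen
        |> Random.andThen (\n -> Random.list (max 0 n) charGen)
        |> Random.map String.fromList


{-| Hypothetical default: lengths 0..10. -}
string : Generator Char -> Generator String
string =
    stringOfLength (Random.int 0 10)
```

Because the length is itself a generator, a fixed length is just `stringOfLength (Random.constant 8) someChar`, and any distribution of lengths plugs into the same slot.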

Wanting strings to average around a given length feels unusual enough that I think something like this could solve it better:

fromGenerator : Random.Generator a -> Fuzzer a

-- usage

stringOfLength (fromGenerator (Random.map round (Random.Float.normal 100 10))) humanReadableChar
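A length generator like the one passed to `fromGenerator` has to produce an `Int`, not a `Float`. Here is a self-contained sketch that avoids the random-extra dependency by using the Box-Muller transform on two uniform floats, then rounding and clamping at 0 since a length can't be negative. `normalLength` is a hypothetical name.

```elm
module NormalLength exposing (normalLength)

import Random exposing (Generator)


{-| Integer lengths roughly normally distributed around `mean`
with standard deviation `stdDev`, clamped to be non-negative.
-}
normalLength : Float -> Float -> Generator Int
normalLength mean stdDev =
    Random.map2
        (\u1 u2 ->
            let
                -- keep u1 away from 0 so the logarithm stays finite
                safeU1 =
                    max 1.0e-12 u1

                -- Box-Muller: a standard normal sample from two uniforms
                z =
                    sqrt (-2 * logBase e safeU1) * cos (2 * pi * u2)
            in
            max 0 (round (mean + stdDev * z))
        )
        (Random.float 0 1)
        (Random.float 0 1)
```

With the proposed API this would be used as `stringOfLength (fromGenerator (normalLength 100 10)) humanReadableChar`.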

BrianHicks · Jul 30 '20 14:07

I like how

string : Fuzzer Char -> Fuzzer String
stringOfLength : Fuzzer Int -> Fuzzer Char -> Fuzzer String

gets rid of the stringWith "go full bonkers on options record" approach. You could argue we could simplify further to stringOfLength : Int -> ..., since you could just write

int 0 10
|> andThen (\length -> stringOfLength length myCharFuzzer)

But this has consequences for shrinking: length |> andThen ... has different shrinking characteristics than the "flip a coin -> generate item -> flip a coin -> generate item -> ..." approach, which tends to shrink better.
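The two shapes of choice sequence can be contrasted in a sketch. elm/random's `Random.Generator` is used as a stand-in (an assumption): it records no choices and never shrinks, so this only shows the order in which choices are made. With length-first, removing a middle element needs two coordinated changes (lower the length and delete that item's choices); with coin flips, each element owns a (flip, item) chunk that a shrinker could delete in one step. Both function names are hypothetical.

```elm
module ListShapes exposing (coinFlipList, lengthFirstList)

import Random exposing (Generator)


{-| Length first: one draw for n, then n items. -}
lengthFirstList : Generator Int -> Generator a -> Generator (List a)
lengthFirstList lengthGen itemGen =
    lengthGen
        |> Random.andThen (\n -> Random.list (max 0 n) itemGen)


{-| Coin flip before each element: "continue?", then an item,
repeated until a flip says stop. `pContinue` must be below 1,
otherwise the list never ends.
-}
coinFlipList : Float -> Generator a -> Generator (List a)
coinFlipList pContinue itemGen =
    Random.float 0 1
        |> Random.andThen
            (\roll ->
                if roll < pContinue then
                    Random.map2 (::)
                        itemGen
                        (Random.lazy (\_ -> coinFlipList pContinue itemGen))

                else
                    Random.constant []
            )
```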

I think we'll need to keep the stringWith { ... } at least internally, to use the listWith { ... } helper, and somehow get from our surface API down to that level.

EDIT: I guess that because shrinking performance depends on how the fuzzers are implemented, we want to be a bit more batteries-included than elm-test is, to minimize the chance that users write a fuzzer that shrinks badly.


Unrelated to the above, and more about the character fuzzer:

For inspiration: Hypothesis lets users choose Unicode categories from which characters should be drawn; this gets "compiled" down to a union of integer intervals.
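Once categories have been compiled to intervals, drawing a character can be as simple as the sketch below, which picks uniformly over the union by sampling an offset into the total size and walking the intervals. This is one plausible approach, not necessarily what Hypothesis does; it uses elm/random's `Random.Generator` as a stand-in for `Fuzzer`, assumes each interval has lo <= hi, and falls back to 'a' for an empty union (an arbitrary choice). `fromIntervals` is a hypothetical name.

```elm
module IntervalChars exposing (fromIntervals)

import Random exposing (Generator)


{-| Draw a character uniformly from a union of inclusive
code-point intervals.
-}
fromIntervals : List ( Int, Int ) -> Generator Char
fromIntervals intervals =
    let
        size ( lo, hi ) =
            hi - lo + 1

        total =
            List.sum (List.map size intervals)

        -- walk the intervals until the offset falls inside one
        pick offset remaining =
            case remaining of
                [] ->
                    'a'

                (( lo, _ ) as interval) :: rest ->
                    if offset < size interval then
                        Char.fromCode (lo + offset)

                    else
                        pick (offset - size interval) rest
    in
    if total <= 0 then
        Random.constant 'a'

    else
        Random.int 0 (total - 1)
            |> Random.map (\offset -> pick offset intervals)
```

For example, `fromIntervals [ ( 48, 57 ), ( 97, 102 ) ]` draws lowercase hex digits.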

Some links for future reference:

  • docs for characters: https://hypothesis.readthedocs.io/en/latest/data.html?highlight=floats#hypothesis.strategies.characters
  • docs for strings: https://hypothesis.readthedocs.io/en/latest/data.html?highlight=floats#hypothesis.strategies.text
  • source: https://github.com/HypothesisWorks/hypothesis/blob/master/hypothesis-python/src/hypothesis/strategies/_internal/strings.py
  • some tests:
    • https://github.com/HypothesisWorks/hypothesis/blob/master/hypothesis-python/tests/cover/test_simple_strings.py
    • https://github.com/HypothesisWorks/hypothesis/blob/master/hypothesis-python/tests/cover/test_simple_characters.py

Janiczek · Jul 31 '20 16:07

Yeah, getting rid of stringOfLength also has to be weighed against what the most common use case actually is. If people use fixed-length strings a lot, it's probably better to expose a dedicated fuzzer than to send them to the combinators, regardless of internals.

It also seems fine to simplify lists. I didn't think of this way to put it yesterday, but it feels like some of these constructors are pre-baked combinators. Examining how we'd do those tasks with only andThen and friends might result in some nice simplifications!

BrianHicks · Jul 31 '20 16:07