elm-test icon indicating copy to clipboard operation
elm-test copied to clipboard

Make Fuzz.string generate unicode strings

Open drathier opened this issue 8 years ago โ€ข 6 comments

This implements #201. It also replaces the implementation for #198 #200 (generate whitespace-only strings).

The main difference is that I changed the way we generate strings to make use of equivalence classes:

  1. Choose how many classes of characters to use:
    • single repeated character from a single character class
    • single character class
    • two classes
    • all classes
  2. Pick which character class(es) to use, according to the provided frequency
  3. Generate strings of random length according to the previous selections.

The character classes are:

  • ASCII: same as before, characters in the range [32, 126] (inclusive)
  • Whitespace: same as before, space, \t, \n
  • Emoji: ๐ŸŒˆ, โค, ๐Ÿ”ฅ
  • Combining diacritical marks: ~^ยจ, or rather ฬƒ, ฬ‚, ฬˆ, (without the commas)

I chose these characters since I think they give people who write websites in English for English users a simple reason to support all of unicode, which helps a lot of people. Bilinguals, people whose names use non-ascii characters, and anyone who wants to use Emoji benefits from this. The characters are all familiar to English speakers, and they cover the most important classes in unicode. Emoji require two utf-16 code units, as does most (all?) Asian characters. Combining diacritical mark support covers all of extended Latin, i.e. all of Europe.

The only large character class remaining is inline right-to-left text, but I've left that one out because I have no idea what properties you'd test in such a script that aren't relevant to the rest of unicode. Arabic letters change shape depending on where in a sentance they're used, the byte order doesn't match the order the characters are presented on screen, and it becomes possible to generate invalid strings both in the generation and shrinking phases.


This change builds on the assumption that "abc" and "xyz" are probably going to trigger the same errors, because they share a lot of properties; a-z, lowercase, sorted, unique characters, same length, substring of the English alphabet.

I haven't written a custom shrinker for this subset of unicode, maybe I should do that?

I've previously been unsure if we want to release this right away, or wait for core to release a patch. It's not looking like we're getting a patch until the next major version is released, and 0.19 isn't in alpha yet.

  • Some test suites will break if they don't have the patch from core which fixes String.toList, and those users will have to use a non-core version of that function until core releases a patch, if they have this bug. The String.toList >> String.fromList identity seems to hold both with and without this bug, so it's not going to affect a lot of users.
  • Or we release it right away, because it's a bug in our users' codebases, and the purpose of testing is to find bugs.

drathier avatar Aug 06 '17 13:08 drathier

This should be safe to merge into 0.19 now. Elm doesn't break emojis in 0.19, but the language still doesn't handle reversal of unicode strings properly, it simply reverses the array of unicode code points.

let t = "๐Ÿ˜ป๐Ÿ™€๐Ÿ‘Œx\u{0303}y" in text (t ++ " <=> " ++ String.reverse t)

prints

๐Ÿ˜ป๐Ÿ™€๐Ÿ‘Œxฬƒy <=> แปนx๐Ÿ‘Œ๐Ÿ™€๐Ÿ˜ป

drathier avatar Oct 31 '18 09:10 drathier

I'll hit the approve button, but we should be careful about the release and have a plan to rollback if we break a lot of tests.

mgold avatar Oct 31 '18 15:10 mgold

I'm getting very confused regarding elm-community/elm-test vs elm-explorations/test. Is this repo legacy 0.18, and going to be left as-is from now on, with new development being done in elm-explorations/test? If so, this PR should be merged into elm-explorations/test, not this repo.

drathier avatar Oct 31 '18 19:10 drathier

Oh shoot, I got turned around and thought this was the active repo. It is not.

mgold avatar Nov 01 '18 05:11 mgold

Yeah, I think we should archive this repo.

Any objections?

rtfeldman avatar Nov 01 '18 12:11 rtfeldman

Sure, as soon as we've migrated the PR's

drathier avatar Nov 01 '18 12:11 drathier