elm-test
elm-test copied to clipboard
Make Fuzz.string generate unicode strings
This implements #201. It also replaces the implementation for #198 #200 (generate whitespace-only strings).
The main difference is that I changed the way we generate strings to make use of equivalence classes:
- Choose how many classes of characters to use:
- single repeated character from a single character class
- single character class
- two classes
- all classes
- Pick which character class(es) to use, according to the provided frequency
- Generate strings of random length according to the previous selections.
The character classes are:
- ASCII: same as before, characters in the range
[32, 126](inclusive) - Whitespace: same as before,
space, \t, \n - Emoji:
๐, โค, ๐ฅ - Combining diacritical marks:
~^ยจ, or ratherฬ, ฬ, ฬ,(without the commas)
I chose these characters since I think they give people who write websites in English for English users a simple reason to support all of unicode, which helps a lot of people. Bilinguals, people whose names use non-ascii characters, and anyone who wants to use Emoji benefits from this. The characters are all familiar to English speakers, and they cover the most important classes in unicode. Emoji require two utf-16 code units, as does most (all?) Asian characters. Combining diacritical mark support covers all of extended Latin, i.e. all of Europe.
The only large character class remaining is inline right-to-left text, but I've left that one out because I have no idea what properties you'd test in such a script that aren't relevant to the rest of unicode. Arabic letters change shape depending on where in a sentance they're used, the byte order doesn't match the order the characters are presented on screen, and it becomes possible to generate invalid strings both in the generation and shrinking phases.
This change builds on the assumption that "abc" and "xyz" are probably going to trigger the same errors, because they share a lot of properties; a-z, lowercase, sorted, unique characters, same length, substring of the English alphabet.
I haven't written a custom shrinker for this subset of unicode, maybe I should do that?
I've previously been unsure if we want to release this right away, or wait for core to release a patch. It's not looking like we're getting a patch until the next major version is released, and 0.19 isn't in alpha yet.
- Some test suites will break if they don't have the patch from core which fixes
String.toList, and those users will have to use a non-core version of that function until core releases a patch, if they have this bug. TheString.toList >> String.fromListidentity seems to hold both with and without this bug, so it's not going to affect a lot of users. - Or we release it right away, because it's a bug in our users' codebases, and the purpose of testing is to find bugs.
This should be safe to merge into 0.19 now. Elm doesn't break emojis in 0.19, but the language still doesn't handle reversal of unicode strings properly, it simply reverses the array of unicode code points.
let t = "๐ป๐๐x\u{0303}y" in text (t ++ " <=> " ++ String.reverse t)
prints
๐ป๐๐xฬy <=> แปนx๐๐๐ป
I'll hit the approve button, but we should be careful about the release and have a plan to rollback if we break a lot of tests.
I'm getting very confused regarding elm-community/elm-test vs elm-explorations/test. Is this repo legacy 0.18, and going to be left as-is from now on, with new development being done in elm-explorations/test? If so, this PR should be merged into elm-explorations/test, not this repo.
Oh shoot, I got turned around and thought this was the active repo. It is not.
Yeah, I think we should archive this repo.
Any objections?
Sure, as soon as we've migrated the PR's