Add support for generating unicode strings

Open emil-e opened this issue 9 years ago • 4 comments

emil-e avatar Mar 10 '16 12:03 emil-e

I might implement this myself as I ran into the need for it just recently. I'm guessing generators for different encodings would be desired (UTF8, UTF16 and UTF32?) and possibly the option of specifying desired ranges.

P-Andersson avatar Mar 11 '16 16:03 P-Andersson

UTF8 is obviously the most important, since that is likely what goes into std::string. Specifying ranges would also be nice, but for the arbitrary generator the range should be based on size.

If you want to take a stab at this, you can look at Text.h for inspiration, since the naive approach of using gen::container would likely waste random bits and therefore perform badly. A PR with tests and docs will be met with open arms :)

emil-e avatar Mar 11 '16 18:03 emil-e

I've been thinking about how best to go about shrinking UTF8 characters. My idea is to remove one byte from the character at each iteration and, when only one byte is left, fall back to the regular character shrinking...

P-Andersson avatar Mar 13 '16 13:03 P-Andersson

If you generate code points that you map to the respective encoding, you can just shrink those and the bytes in the UTF8 characters will become fewer automatically. UTF32 will probably serve as a good base given that it's essentially unencoded code points and the other ones can simply be mappings of that one.
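To make the mapping idea concrete, here is a minimal sketch in plain C++ of turning code points into UTF8. It is not rapidcheck code: the names encodeUtf8 and toUtf8 are made up for illustration, and the input is assumed to be a valid Unicode scalar value (at most 0x10FFFF, not a surrogate).

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: encode one Unicode scalar value as UTF8 (1 to 4 bytes).
// Assumes cp is at most 0x10FFFF and not a surrogate (0xD800 to 0xDFFF).
std::string encodeUtf8(char32_t cp) {
  assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
  std::string out;
  if (cp < 0x80) {
    // 1 byte: 0xxxxxxx
    out += static_cast<char>(cp);
  } else if (cp < 0x800) {
    // 2 bytes: 110xxxxx 10xxxxxx
    out += static_cast<char>(0xC0 | (cp >> 6));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  } else if (cp < 0x10000) {
    // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
    out += static_cast<char>(0xE0 | (cp >> 12));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  } else {
    // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    out += static_cast<char>(0xF0 | (cp >> 18));
    out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  }
  return out;
}

// Hypothetical helper: map a UTF32 string (plain code points) to UTF8.
std::string toUtf8(const std::u32string &codePoints) {
  std::string out;
  for (char32_t cp : codePoints) {
    out += encodeUtf8(cp);
  }
  return out;
}
```

Because the byte count depends only on the size of the code point, shrinking a code point from, say, U+20AC (3 bytes) down to 'a' (1 byte) shortens the encoded string without any byte-level shrinking logic.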

Just like with the regular string generator, you might want to shrink characters down to some simple ones first, like abc.
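A rough sketch of that "simple characters first" idea at the code point level, in plain C++. The function name shrinkCodePoint and the exact candidate order are assumptions for illustration and do not mirror how rapidcheck's own character shrinking is written.

```cpp
#include <vector>

// Hypothetical helper: candidate shrinks for one code point, simplest first.
// Candidates are taken from 'a', 'b', 'c' and stop at the value itself, so
// 'a' has no shrinks, 'b' shrinks to 'a', and shrinking always terminates.
std::vector<char32_t> shrinkCodePoint(char32_t cp) {
  std::vector<char32_t> candidates;
  for (char32_t simple : {U'a', U'b', U'c'}) {
    if (simple == cp) {
      break;  // never propose the value itself or anything after it
    }
    candidates.push_back(simple);
  }
  return candidates;
}
```

A fuller version would probably follow these candidates with numerically smaller code points, so that a failure which really needs a non-ASCII character can still be minimized.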

emil-e avatar Mar 13 '16 16:03 emil-e