rapidcheck
Added unicode generators
No documentation yet, though the implementation is tested and supports UTF-8. I'm looking for feedback on a few issues:
- Currently, the fundamental unit here is the Unicode codepoint, an integer that is identical to its UTF-32 encoding. I'm not sure whether it is a good idea to keep these two concepts separate from each other.
- The algorithm used to generate the codepoint heavily favors the low end of the possible range of values (the ASCII range), though higher values still come up a significant fraction of the time. The reasoning is that characters that are likely to be treated specially (newlines, spaces, tabs...) occur there. This also gives a better distribution of byte sizes for UTF-8 than a uniform random distribution of codepoint values would.
- Currently, it only generates valid Unicode values (including values that have not been assigned a symbol yet). I'm not sure whether invalid ones are interesting to generate.
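
To make the points above concrete for reviewers, here is a minimal standalone sketch of the general idea. It is not the code in this PR and does not use the rapidcheck API. The names `isValidCodepoint`, `biasedCodepoint` and `encodeUtf8` are hypothetical, and the biasing scheme (pick a bit width uniformly, then a uniform value of that width) is only one assumed way to favor the ASCII range.

```cpp
#include <cassert>
#include <cstdint>
#include <random>
#include <string>

// Hypothetical helper: true if cp is a valid Unicode scalar value, i.e.
// at most U+10FFFF and not a UTF-16 surrogate (U+D800..U+DFFF).
bool isValidCodepoint(std::uint32_t cp) {
  return cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF);
}

// Assumed biasing scheme, not necessarily the one used in the PR:
// choose a bit width uniformly from 1..21, then a uniform value below
// 2^width. Small widths are as likely as large ones, so low codepoints
// (including ASCII) come up far more often than under a uniform draw.
// Invalid values are rejected and redrawn.
template <typename Rng>
std::uint32_t biasedCodepoint(Rng &rng) {
  std::uniform_int_distribution<int> widthDist(1, 21);
  for (;;) {
    const int width = widthDist(rng);
    const std::uint32_t maxValue = (std::uint32_t(1) << width) - 1;
    std::uniform_int_distribution<std::uint32_t> valueDist(0, maxValue);
    const std::uint32_t cp = valueDist(rng);
    if (isValidCodepoint(cp)) {
      return cp;
    }
  }
}

// Encode a single valid codepoint as 1 to 4 UTF-8 bytes.
std::string encodeUtf8(std::uint32_t cp) {
  assert(isValidCodepoint(cp));
  std::string out;
  if (cp < 0x80) {
    out += static_cast<char>(cp);
  } else if (cp < 0x800) {
    out += static_cast<char>(0xC0 | (cp >> 6));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  } else if (cp < 0x10000) {
    out += static_cast<char>(0xE0 | (cp >> 12));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  } else {
    out += static_cast<char>(0xF0 | (cp >> 18));
    out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  }
  return out;
}
```

Keeping the codepoint as a plain integer and encoding it in a separate step, as in this sketch, is one way to keep the codepoint and its encoding as separate concepts. Whether that separation is worth it is the open question from the first bullet.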
Nice! I will review this as soon as I have the time. I have been a little busy lately :)
Will continue review later.