[C++] Consolidate random string generators for use in benchmarks and unittests
This was discussed here:
https://github.com/apache/arrow/pull/3721
For testing and benchmarking dictionary encoding, it's useful to control the number of repeated values, and it would also be good to optionally include null values. The ability to provide a custom alphabet would be handy for generating strings with Unicode characters.
Also note that a simple PRNG should be used as the group has observed performance trouble with Mersenne Twister.
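To make the requirements above concrete, here is a minimal sketch of what such a generator could look like. It is not an existing Arrow API: `RandomStringOptions`, `XorShift64Star` and `GenerateRandomStrings` are hypothetical names, the output is a plain `std::vector` of optionals rather than an Arrow array, and it assumes C++17 for `std::optional`. The alphabet is a list of strings so that a single symbol can be a multi-byte UTF-8 sequence, and xorshift64* stands in for "a simple PRNG".

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical option struct; none of these names come from Arrow.
struct RandomStringOptions {
  std::size_t length = 100;       // number of slots to generate
  std::size_t num_unique = 10;    // size of the pool that values are drawn from
  std::size_t min_chars = 1;      // minimum symbols per string
  std::size_t max_chars = 8;      // maximum symbols per string
  double null_probability = 0.0;  // chance that a slot is null, in [0, 1]
  // One entry per symbol, so an entry may be a multi-byte UTF-8 sequence.
  std::vector<std::string> alphabet = {"a", "b", "c"};
  std::uint64_t seed = 42;
};

// xorshift64*: a small, fast PRNG used here in place of Mersenne Twister.
class XorShift64Star {
 public:
  explicit XorShift64Star(std::uint64_t seed)
      : state_(seed != 0 ? seed : 0x9E3779B97F4A7C15ULL) {}  // state must be nonzero

  std::uint64_t Next() {
    state_ ^= state_ >> 12;
    state_ ^= state_ << 25;
    state_ ^= state_ >> 27;
    return state_ * 0x2545F4914F6CDD1DULL;
  }

  // Integer in [0, bound). The slight modulo bias is accepted for test data.
  std::uint64_t Below(std::uint64_t bound) { return Next() % bound; }

  // Double in [0, 1) built from the top 53 bits.
  double NextDouble() { return (Next() >> 11) * (1.0 / 9007199254740992.0); }

 private:
  std::uint64_t state_;
};

std::vector<std::optional<std::string>> GenerateRandomStrings(
    const RandomStringOptions& opts) {
  assert(!opts.alphabet.empty());
  assert(opts.num_unique > 0);
  assert(opts.min_chars <= opts.max_chars);
  XorShift64Star rng(opts.seed);

  // Build the pool first. Entries can collide by chance, so num_unique is an
  // upper bound on the number of distinct values, not an exact count.
  std::vector<std::string> pool(opts.num_unique);
  for (auto& s : pool) {
    const std::size_t n =
        opts.min_chars + rng.Below(opts.max_chars - opts.min_chars + 1);
    for (std::size_t i = 0; i < n; ++i) {
      s += opts.alphabet[rng.Below(opts.alphabet.size())];
    }
  }

  // Fill each slot with either a null or a value drawn from the pool.
  std::vector<std::optional<std::string>> out;
  out.reserve(opts.length);
  for (std::size_t i = 0; i < opts.length; ++i) {
    if (rng.NextDouble() < opts.null_probability) {
      out.push_back(std::nullopt);
    } else {
      out.push_back(pool[rng.Below(pool.size())]);
    }
  }
  return out;
}
```

Drawing every value from a fixed pool is what gives control over repetition: a small `num_unique` relative to `length` yields heavily repeated values, which is the case dictionary encoding benchmarks care about.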
Reporter: Hatem Helal / @hatemhelal
Note: This issue was originally created as ARROW-4661. Please see the migration documentation for further details.
Francois Saint-Jacques / @fsaintjacques:
Feel free to extend RandomArrayGenerator in testing/random.h. I'd love to see the value distribution given as an option struct in the constructor instead of having min/max arguments.
Hatem Helal / @hatemhelal: I made a first pass at this to unblock testing for ARROW-3769:
https://github.com/apache/arrow/pull/3721
I'd like to use this issue to work on the option struct as suggested by @fsaintjacques.
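As a starting point for that discussion, the option-struct idea might look roughly like the sketch below. This is not the actual RandomArrayGenerator interface: `Int64DistributionOptions`, `SketchInt64Generator` and `Int64Column` are hypothetical names, the result is a pair of plain vectors rather than an Arrow array, and `std::minstd_rand` is only one possible choice of simple engine.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical option struct replacing separate min/max arguments.
struct Int64DistributionOptions {
  std::int64_t min = 0;           // inclusive lower bound
  std::int64_t max = 100;         // inclusive upper bound
  double null_probability = 0.0;  // chance that a slot is null, in [0, 1]
};

// Values plus a validity mask; a stand-in for a real array type.
struct Int64Column {
  std::vector<std::int64_t> values;
  std::vector<bool> is_valid;
};

// Hypothetical generator that takes its distribution in the constructor.
class SketchInt64Generator {
 public:
  SketchInt64Generator(std::uint32_t seed, const Int64DistributionOptions& opts)
      : engine_(seed), opts_(opts) {
    assert(opts_.min <= opts_.max);
    assert(opts_.null_probability >= 0.0 && opts_.null_probability <= 1.0);
  }

  Int64Column Generate(std::size_t length) {
    std::uniform_int_distribution<std::int64_t> value_dist(opts_.min, opts_.max);
    std::bernoulli_distribution null_dist(opts_.null_probability);

    Int64Column out;
    out.values.reserve(length);
    out.is_valid.reserve(length);
    for (std::size_t i = 0; i < length; ++i) {
      // A value is generated for every slot, including null ones, so the
      // values buffer always stays within [min, max].
      out.values.push_back(value_dist(engine_));
      out.is_valid.push_back(!null_dist(engine_));
    }
    return out;
  }

 private:
  std::minstd_rand engine_;  // linear congruential engine, cheap to seed and step
  Int64DistributionOptions opts_;
};
```

Grouping the parameters in a struct means new knobs (for example a different distribution shape) can be added later without changing every call site.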