[C++] Consolidate random string generators for use in benchmarks and unittests

Open asfimport opened this issue 6 years ago • 3 comments

This was discussed here:

https://github.com/apache/arrow/pull/3721

For testing/benchmarking dictionary encoding, it's useful to control the number of repeated values, and it would also be good to optionally include null values. The ability to provide a custom alphabet would be handy for generating strings with Unicode characters.

Also note that a simple PRNG should be used as the group has observed performance trouble with Mersenne Twister.

Reporter: Hatem Helal / @hatemhelal

Note: This issue was originally created as ARROW-4661. Please see the migration documentation for further details.

asfimport avatar Feb 22 '19 12:02 asfimport

Francois Saint-Jacques / @fsaintjacques: Feel free to extend RandomArrayGenerator in testing/random.h. I'd love to see the value distribution given as an option struct in the constructor instead of having min/max arguments.
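As a rough illustration of the option-struct idea, the sketch below moves the value range and null rate into a struct that is handed to the constructor. It does not reproduce the real `RandomArrayGenerator` interface: `Int32GeneratorOptions`, `Int32Column`, and `SketchInt32Generator` are hypothetical names, the output is a pair of plain vectors rather than an Arrow array, and `std::minstd_rand` is only an assumed stand-in for a simple PRNG.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical option struct replacing separate min/max arguments.
struct Int32GeneratorOptions {
  std::int32_t min = 0;
  std::int32_t max = 100;         // inclusive upper bound
  double null_probability = 0.0;  // chance that a slot is null
  std::uint32_t seed = 0;
};

// Plain stand-in for an array: values plus a validity flag per slot.
struct Int32Column {
  std::vector<std::int32_t> values;
  std::vector<bool> is_valid;
};

// Hypothetical generator; the distribution is fixed at construction time.
class SketchInt32Generator {
 public:
  explicit SketchInt32Generator(const Int32GeneratorOptions& opts)
      : engine_(opts.seed),
        values_(opts.min, opts.max),
        nulls_(opts.null_probability) {}

  Int32Column Generate(std::size_t n) {
    Int32Column col;
    col.values.reserve(n);
    col.is_valid.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
      const bool is_null = nulls_(engine_);
      col.is_valid.push_back(!is_null);
      // Null slots carry a placeholder value of 0.
      col.values.push_back(is_null ? 0 : values_(engine_));
    }
    return col;
  }

 private:
  std::minstd_rand engine_;  // simple linear congruential engine
  std::uniform_int_distribution<std::int32_t> values_;
  std::bernoulli_distribution nulls_;
};
```

One possible benefit of this shape is that new knobs (a different distribution, a repeat count) become extra struct fields instead of extra positional arguments on every generator method.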

asfimport avatar Feb 22 '19 15:02 asfimport

Hatem Helal / @hatemhelal: I made a first pass at this to unblock testing for ARROW-3769:

https://github.com/apache/arrow/pull/3721

I'd like to use this issue to work on the option struct as suggested by @fsaintjacques.

asfimport avatar Mar 13 '19 14:03 asfimport

This issue has been marked as stale because it has had no activity in the past 365 days. Please remove the stale label or comment below, or this issue will be closed in 14 days. If this improvement is still desired but has no current owner, please add the 'Status: needs champion' label.

github-actions[bot] avatar Dec 13 '25 11:12 github-actions[bot]