Add support for clickbench data and benchmark with page index
Is your feature request related to a problem or challenge?
Currently, our ClickBench benchmark data doesn't include a page index. This ticket will add a data generator that writes the page index, and a separate benchmark that runs ClickBench against the page-indexed data.
Maybe we could also expose more custom options, such as page index, compression, and sort options, to generate the data set from the existing ClickBench data set.
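A rough sketch of what such options might look like, written in Python against `pyarrow` rather than the actual DataFusion benchmark tooling. `RewriteOptions` and `rewrite` are hypothetical names for illustration only; `write_page_index`, `compression`, and `row_group_size` are existing `pyarrow.parquet.write_table` parameters. It assumes the source file fits in memory, which may not hold for the full ClickBench file.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RewriteOptions:
    """Hypothetical knobs for regenerating a ClickBench Parquet file."""

    page_index: bool = True          # write ColumnIndex/OffsetIndex
    compression: str = "zstd"        # codec name passed through to pyarrow
    sort_by: list = field(default_factory=list)  # column names, ascending
    row_group_size: Optional[int] = None         # rows per row group

    def writer_kwargs(self) -> dict:
        """Translate the options into pyarrow.parquet.write_table kwargs."""
        kwargs = {
            "write_page_index": self.page_index,
            "compression": self.compression,
        }
        if self.row_group_size is not None:
            kwargs["row_group_size"] = self.row_group_size
        return kwargs


def rewrite(src: str, dst: str, opts: RewriteOptions) -> None:
    """Read src, optionally sort it, and write dst with the chosen options."""
    import pyarrow.parquet as pq  # imported lazily; requires pyarrow

    table = pq.read_table(src)  # assumption: the whole file fits in memory
    if opts.sort_by:
        table = table.sort_by([(name, "ascending") for name in opts.sort_by])
    pq.write_table(table, dst, **opts.writer_kwargs())
```

For example, `rewrite("hits.parquet", "hits_page_index.parquet", RewriteOptions(sort_by=["EventTime"]))` would produce a sorted copy with the page index written; the file names here are placeholders.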
Describe the solution you'd like
No response
Describe alternatives you've considered
No response
Additional context
No response
take
- https://github.com/apache/datafusion/issues/16200
Will depend on this ticket.
Just a thought: do we need an artificial dataset to really highlight the problem / solution? I think it's unlikely to be measurable with a dataset that has 25 columns and 500 row groups, especially if we're talking about avoiding parsing but not even avoiding IO. My guess is if you make a dataset with 10k columns and 1000s of row groups we'll see a difference.
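To put rough numbers on that guess: Parquet stores one ColumnIndex and one OffsetIndex per column chunk, so the amount of page index metadata to parse scales with columns × row groups. A back-of-the-envelope sketch follows; `column_chunks` is a hypothetical helper, and 1,000 row groups is assumed as the low end of "1000s".

```python
def column_chunks(num_columns: int, num_row_groups: int) -> int:
    """Number of column chunks, i.e. ColumnIndex/OffsetIndex pairs in the file."""
    return num_columns * num_row_groups


small = column_chunks(25, 500)        # the dataset shape mentioned above
large = column_chunks(10_000, 1_000)  # the proposed artificial dataset

print(small)           # 12500
print(large)           # 10000000
print(large // small)  # 800
```

Under these assumptions the artificial dataset has about 800 times as many index structures to parse, which is where a difference would be more likely to show up.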
Thank you @adriangb, that's a good point and I agree with you. I also created this ticket because we can use it to generate more customized data based on the current ClickBench data set, with more options.