[GLUTEN-7641][VL] Add Gluten benchmark scripts
https://github.com/apache/incubator-gluten/issues/7641
Thank you!
BTW there were a couple of related efforts in our code base (this list is not exhaustive):
https://github.com/apache/incubator-gluten/pull/432 https://github.com/apache/incubator-gluten/pull/5278
Should we review them and then remove the unnecessary / unmaintained ones? If they are still needed, I think we can create a new directory like examples to centralize them.
Why are there 3 TPCDS query sets? Can we consolidate them into one?
./tools/gluten-it/common/src/main/resources/tpcds-queries
./gluten-core/src/test/resources/tpcds-queries
./gluten-core/target/scala-2.12/test-classes/tpcds-queries
We could put it under tools/workload and name it benchmark_velox, since the script only supports Velox.
@FelixYBW ./gluten-core/target/scala-2.12/test-classes/tpcds-queries is generated by Maven at compile time. It's not in the code base.
./tools/gluten-it/common/src/main/resources/tpcds-queries is the one used by GHA and the notebook scripts.
./gluten-core/src/test/resources/tpcds-queries: not sure if this one is used by any Gluten UT. I will double-check. If not, we can remove it.
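For anyone who wants to check whether the two checked-in query sets have actually diverged before removing one, here is a minimal sketch using Python's standard `filecmp` module. The helper name `compare_query_dirs` is hypothetical and not part of Gluten, and the sketch only compares top-level files, assuming the query directories are flat. It does not tell you whether a Gluten UT still loads the directory; that still needs a search of the test sources.

```python
import filecmp
import os


def compare_query_dirs(dir_a, dir_b):
    """Compare the top-level files of two directories.

    Returns three sorted lists of file names:
    (only in dir_a, only in dir_b, present in both but with different content).
    Hypothetical helper for illustration; not part of the Gluten code base.
    """
    dir_a, dir_b = str(dir_a), str(dir_b)
    listing = filecmp.dircmp(dir_a, dir_b)
    # shallow=False forces a byte-by-byte comparison instead of relying on
    # file size and modification time.
    _, mismatch, errors = filecmp.cmpfiles(
        dir_a, dir_b, listing.common_files, shallow=False
    )
    return (
        sorted(listing.left_only),
        sorted(listing.right_only),
        sorted(mismatch + errors),
    )


if __name__ == "__main__":
    # Paths taken from the thread; run from the repository root.
    a = "tools/gluten-it/common/src/main/resources/tpcds-queries"
    b = "gluten-core/src/test/resources/tpcds-queries"
    if os.path.isdir(a) and os.path.isdir(b):
        only_a, only_b, differing = compare_query_dirs(a, b)
        print("only in gluten-it:", only_a)
        print("only in gluten-core:", only_b)
        print("different content:", differing)
    else:
        print("one or both directories not found; nothing to compare")
```

If all three lists come back empty, the two directories hold identical files and dropping one loses nothing.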
@FelixYBW Opened https://github.com/apache/incubator-gluten/pull/7666 for some removals.
backends-velox/src/test/resources/tpch-queries-velox should also be removed. I will open another PR to remove them.
initialize.ipynb: let's remove the BKM section.
Looks good. Let's test on cloud once we have a chance.