Benchmark automation
What is the problem the feature request solves?
I have been spending significant time manually running benchmarks, both during development and when preparing Comet releases, using a mix of open-source and custom scripts. I want other contributors to be able to run the same benchmarks, both locally and in the cloud.
Toward this goal, I recently documented how to run benchmarks in AWS.
My goals for this issue are:
- Ensure that all scripts, configs, and documentation are located in the datafusion-comet repo so that anyone can run the same benchmarks by just cloning the repo and running a script
- Provide recommended configs for different environments and scale factors
- Have some subset of the benchmarks run nightly against the main branch, with the results posted somewhere for later analysis and reporting (this could be as simple as pushing result files to a GitHub repo); see the sketch after this list
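To make the nightly-run goal concrete, here is a minimal sketch of what such a driver could look like. This is an illustration under assumptions, not an existing Comet artifact: the `run_sql.py` helper, the `queries/tpch` layout, and the `results` output directory are all hypothetical, and the Comet configs shown are the commonly documented ones but should be checked against the version under test.

```python
#!/usr/bin/env python3
"""Hypothetical nightly benchmark driver for Comet.

The helper script, directory layout, and output location are
illustrative assumptions, not existing Comet artifacts.
"""

import json
import subprocess
import time
from datetime import datetime, timezone
from pathlib import Path

# Commonly documented Comet settings; verify against the version under test.
COMET_CONFS = {
    "spark.plugins": "org.apache.spark.CometPlugin",
    "spark.comet.enabled": "true",
    "spark.comet.exec.enabled": "true",
    "spark.shuffle.manager": "org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager",
    "spark.comet.exec.shuffle.enabled": "true",
}


def run_query(sql_file: Path) -> float:
    """Run one TPC-H query via spark-submit and return wall time in seconds."""
    cmd = ["spark-submit"]
    for key, value in COMET_CONFS.items():
        cmd += ["--conf", f"{key}={value}"]
    # run_sql.py is a hypothetical helper that executes a single SQL file.
    cmd += ["run_sql.py", str(sql_file)]
    start = time.time()
    subprocess.run(cmd, check=True)
    return time.time() - start


def main() -> None:
    stamp = datetime.now(timezone.utc).strftime("%Y%m%d-%H%M%S")
    results = {"timestamp": stamp, "queries": {}}
    for sql_file in sorted(Path("queries/tpch").glob("q*.sql")):
        results["queries"][sql_file.stem] = run_query(sql_file)
    out_dir = Path("results")
    out_dir.mkdir(exist_ok=True)
    (out_dir / f"comet-{stamp}.json").write_text(json.dumps(results, indent=2))


if __name__ == "__main__":
    main()
```

A scheduled GitHub Actions job could invoke a script like this nightly and commit the resulting JSON file to a results repo, which would cover the "posted somewhere for later analysis" goal.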
Describe the potential solution
No response
Additional context
No response
I made some GitHub Actions workflows to benchmark native engines.
- Compare the Spark, Comet, and Gluten engines: https://github.com/wForget/benchmarks-spark-native/actions/workflows/master.yml
- Compare different commits of Comet: https://github.com/wForget/benchmarks-spark-native/actions/workflows/benchmark-comet-change.yml
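To complement workflows like those, comparing two runs (for example, two Comet commits) can be a small post-processing step. This sketch assumes the JSON layout written by the driver sketch above; the two result-file paths are supplied on the command line.

```python
import json
import sys


def compare(baseline_path: str, candidate_path: str) -> None:
    """Print the per-query speedup of candidate over baseline.

    Assumes the JSON layout written by the driver sketch above.
    """
    with open(baseline_path) as f:
        baseline = json.load(f)["queries"]
    with open(candidate_path) as f:
        candidate = json.load(f)["queries"]
    for query in sorted(baseline):
        if query in candidate and candidate[query] > 0:
            print(f"{query}: {baseline[query] / candidate[query]:.2f}x")


if __name__ == "__main__":
    compare(sys.argv[1], sys.argv[2])
```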