Benchmark automation
What is the problem the feature request solves?
I have been spending significant time manually running benchmarks, both during development and when preparing Comet releases, using a mix of open-source and custom scripts. I want other contributors to be able to run the same benchmarks, both locally and in the cloud.
Toward this goal, I recently documented how to run benchmarks in AWS.
My goals for this issue are:
- Ensure that all scripts, configs, and documentation are located in the datafusion-comet repo so that anyone can run the same benchmarks by just cloning the repo and running a script
- Provide recommended configs for different environments and scale factors
- Have some subset of the benchmarks run nightly against the main branch, with the results posted somewhere for later analysis and reporting (this could be as simple as pushing result files to a GitHub repo); see the sketch after this list
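To make the nightly-run goal concrete, here is a minimal sketch of what such a driver could look like. This is an illustration under assumptions, not an existing Comet artifact: the `run_sql.py` helper, the `queries/tpch` layout, and the `results` output directory are all hypothetical, and the Comet configs shown are the commonly documented ones but should be checked against the version under test.

```python
#!/usr/bin/env python3
"""Hypothetical nightly benchmark driver for Comet.

The helper script, directory layout, and output location are
illustrative assumptions, not existing Comet artifacts.
"""

import json
import subprocess
import time
from datetime import datetime, timezone
from pathlib import Path

# Commonly documented Comet settings; verify against the version under test.
COMET_CONFS = {
    "spark.plugins": "org.apache.spark.CometPlugin",
    "spark.comet.enabled": "true",
    "spark.comet.exec.enabled": "true",
    "spark.shuffle.manager": "org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager",
    "spark.comet.exec.shuffle.enabled": "true",
}


def run_query(sql_file: Path) -> float:
    """Run one TPC-H query via spark-submit and return wall time in seconds."""
    cmd = ["spark-submit"]
    for key, value in COMET_CONFS.items():
        cmd += ["--conf", f"{key}={value}"]
    # run_sql.py is a hypothetical helper that executes a single SQL file.
    cmd += ["run_sql.py", str(sql_file)]
    start = time.time()
    subprocess.run(cmd, check=True)
    return time.time() - start


def main() -> None:
    stamp = datetime.now(timezone.utc).strftime("%Y%m%d-%H%M%S")
    results = {"timestamp": stamp, "queries": {}}
    for sql_file in sorted(Path("queries/tpch").glob("q*.sql")):
        results["queries"][sql_file.stem] = run_query(sql_file)
    out_dir = Path("results")
    out_dir.mkdir(exist_ok=True)
    (out_dir / f"comet-{stamp}.json").write_text(json.dumps(results, indent=2))


if __name__ == "__main__":
    main()
```

A scheduled GitHub Actions job could invoke a script like this nightly and commit the resulting JSON file to a results repo, which would cover the "posted somewhere for later analysis" goal.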
Describe the potential solution
No response
Additional context
No response
I made some GitHub Actions workflows to benchmark native engines.
- Compare the Spark, Comet, and Gluten engines: https://github.com/wForget/benchmarks-spark-native/actions/workflows/master.yml
- Compare different commits of Comet: https://github.com/wForget/benchmarks-spark-native/actions/workflows/benchmark-comet-change.yml
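To complement workflows like those, comparing two runs (for example, two Comet commits) can be a small post-processing step. This sketch assumes the JSON layout written by the driver sketch above; the two result-file paths are supplied on the command line.

```python
import json
import sys


def compare(baseline_path: str, candidate_path: str) -> None:
    """Print the per-query speedup of candidate over baseline.

    Assumes the JSON layout written by the driver sketch above.
    """
    with open(baseline_path) as f:
        baseline = json.load(f)["queries"]
    with open(candidate_path) as f:
        candidate = json.load(f)["queries"]
    for query in sorted(baseline):
        if query in candidate and candidate[query] > 0:
            print(f"{query}: {baseline[query] / candidate[query]:.2f}x")


if __name__ == "__main__":
    compare(sys.argv[1], sys.argv[2])
```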