
Benchmark automation

Open • andygrove opened this issue 8 months ago • 1 comment

What is the problem the feature request solves?

I have been spending significant time manually running benchmarks, both during development and when preparing to release Comet. I have been using a mix of open-source scripts and custom scripts. I want other contributors to be able to run the same benchmarks, both locally and in the cloud.

I recently documented how to run benchmarks in AWS towards this goal.

My goals for this issue are:

  1. Ensure that all scripts, configs, and documentation are located in the datafusion-comet repo so that anyone can run the same benchmarks by just cloning the repo and running a script
  2. Provide recommended configs for different environments and scale factors
  3. Have some subset of benchmarks run nightly against the main branch, with the results posted somewhere for later analysis and reporting (this could be as simple as pushing to a GitHub repo)
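As a rough illustration of goal 3, here is a minimal sketch of what a nightly job could run: time a set of queries and write one JSON result file per commit, which could then be pushed to a results repo. Everything here is an assumption for illustration and not an existing Comet script: the function names (`time_query`, `run_benchmark`, `write_results`), the JSON layout, and the idea that each query is wrapped in a zero-argument Python callable.

```python
import json
import statistics
import time
from pathlib import Path


def time_query(run_query, iterations=3):
    """Run a zero-argument callable several times; return wall-clock seconds per run."""
    timings = []
    for _ in range(iterations):
        start = time.perf_counter()
        run_query()
        timings.append(time.perf_counter() - start)
    return timings


def run_benchmark(queries, commit, iterations=3):
    """Time each named query and return a JSON-serializable result record.

    `queries` maps a query name to a callable that executes it (hypothetical
    interface; a real harness would submit the query to Spark here).
    """
    results = {}
    for name, run_query in queries.items():
        timings = time_query(run_query, iterations)
        results[name] = {
            "timings_s": timings,
            "median_s": statistics.median(timings),
        }
    return {"commit": commit, "iterations": iterations, "queries": results}


def write_results(record, out_dir):
    """Write one result file per commit so the output directory accumulates history."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / f"{record['commit']}.json"
    out_path.write_text(json.dumps(record, indent=2))
    return out_path
```

Keeping one file per commit means later analysis only has to read a directory of JSON files, and publishing the results can be an ordinary `git add`, `git commit`, and `git push` step in the nightly job.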

Describe the potential solution

No response

Additional context

No response

andygrove, Apr 10 '25 16:04

I made some GitHub Actions workflows to benchmark native engines.

  • Compare the Spark, Comet, and Gluten engines: https://github.com/wForget/benchmarks-spark-native/actions/workflows/master.yml
  • Compare different commits of Comet: https://github.com/wForget/benchmarks-spark-native/actions/workflows/benchmark-comet-change.yml
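
For readers who want a sense of what a commit-to-commit comparison involves, here is a minimal sketch. It is not taken from the linked workflows; the function name `compare_medians`, the input shape (query name mapped to median seconds), and the 5% threshold are all assumptions made for illustration.

```python
def compare_medians(baseline, candidate, threshold=0.05):
    """Compare per-query median runtimes between two commits.

    `baseline` and `candidate` map query name -> median seconds (hypothetical
    input shape). Returns the relative change per query; a positive change
    means the candidate commit is slower. Only queries present in both
    inputs are compared.
    """
    report = {}
    for name in sorted(baseline.keys() & candidate.keys()):
        base, cand = baseline[name], candidate[name]
        change = (cand - base) / base
        if change > threshold:
            status = "regression"
        elif change < -threshold:
            status = "improvement"
        else:
            status = "unchanged"
        report[name] = {"change": change, "status": status}
    return report
```

A threshold is used because benchmark timings are noisy; the right value depends on the environment and would need to be calibrated from repeated runs on the same commit.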

wForget, Apr 28 '25 13:04