Add Local Scripts to Reproduce Full CI and Perform Auto-Fixes

Open 2010YOUY01 opened this issue 1 month ago • 3 comments

Is your feature request related to a problem or challenge?

Motivation

AI coding agents are now capable of handling many simple, mechanical tasks in DataFusion. When assigning such tasks, it would be ideal if these agents could verify locally that their changes pass the full CI suite before opening a PR. Currently, the only way to check CI results is to submit a PR and wait for all remote CI jobs to complete, which can take around an hour and slows down iteration.

To improve this workflow, we should provide a simple script that reproduces the entire CI pipeline locally:

./dev/ci.sh # Run full CI locally

Without such a script, AI agents must infer CI behavior from configuration files and may spend unnecessary time/tokens running CI jobs one by one.

Note that we already have ./dev/rust_lint.sh for lint-related checks, but it is not yet complete and does not include the various tests. A script that fully reproduces CI should wrap all existing CI test steps into scripts, and those scripts should be used both by the local CI runner script and by the GitHub workflow configuration .yml files.
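As a rough illustration of the idea above, a local runner could be little more than an ordered list of named steps. This is only a sketch under assumptions: the function names `run_step` and `ci_main`, the `DRY_RUN` variable, and the particular step list are hypothetical and do not reflect the project's actual CI jobs. Only ./dev/rust_lint.sh comes from this issue.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of ./dev/ci.sh. The step list is an assumption,
# not the project's real set of CI jobs.

# Print a step's command, then execute it unless DRY_RUN=1.
run_step() {
  local name="$1"
  shift
  echo "==> ${name}: $*"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    return 0
  fi
  "$@"
}

# Run every step in order and stop at the first failure.
ci_main() {
  run_step "lint" ./dev/rust_lint.sh || return 1
  run_step "test" cargo test --workspace || return 1
  run_step "doc" cargo doc --no-deps --workspace || return 1
  echo "All CI steps passed"
}

# To use this as a script, end the file with: ci_main "$@"
```

With `DRY_RUN=1` set, `ci_main` only prints the commands it would run, which gives an agent a cheap way to see what CI covers. Keeping each step as a single command also makes it straightforward for a workflow .yml file to call the same command.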

Auto-Fix Script

We should also provide a companion script that performs best-effort automatic fixes:

./auto-fix.sh

This script would handle routine cleanups such as:

  • running cargo fmt
  • generating docs
  • adding Apache headers to newly added files
  • applying safe Clippy auto-fixes
  • any other common mechanical steps
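A minimal sketch of how a few of these steps might fit together is shown below. The function names, the header file path `dev/license_header.txt`, and the choice to treat only untracked `.rs` files as "newly added" are all assumptions made for illustration. Doc generation is left out because this issue does not name the command for it.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of ./auto-fix.sh. Commands and paths are assumptions.

# Prepend the header in $1 to file $2 unless $2 already carries an ASF header.
add_header_if_missing() {
  local header_file="$1" target="$2" tmp
  if grep -q "Licensed to the Apache Software Foundation" "$target"; then
    return 0
  fi
  tmp="$(mktemp)"
  # Write back with cat (not mv) so the target keeps its file mode.
  cat "$header_file" "$target" > "$tmp" && cat "$tmp" > "$target"
  rm -f "$tmp"
}

# Best-effort fixes: a failing step is reported and the remaining steps still run.
auto_fix_main() {
  local header_file="${1:-dev/license_header.txt}" # hypothetical path
  cargo fmt --all || echo "cargo fmt failed" >&2
  cargo clippy --fix --allow-dirty --allow-staged --workspace ||
    echo "cargo clippy --fix failed" >&2
  # Untracked Rust files stand in for "newly added files" here.
  git ls-files --others --exclude-standard -- '*.rs' | while IFS= read -r f; do
    add_header_if_missing "$header_file" "$f"
  done
}

# To use this as a script, end the file with: auto_fix_main "$@"
```

The header step is idempotent, so running the script repeatedly (for example from a commit hook) does not stack headers on a file.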

This improves the developer experience and also lets AI coding agents iterate faster and spend fewer tokens.

Describe the solution you'd like

No response

Describe alternatives you've considered

No response

Additional context

No response

2010YOUY01 • Dec 09 '25 08:12

take

shifluxxc • Dec 10 '25 05:12

untake

shifluxxc • Dec 10 '25 05:12

./auto-fix.sh would definitely be super useful! I'd wire it up to be a git commit hook.

geoffreyclaude • Dec 10 '25 21:12

I’m very much in favor of this

adriangb • Dec 20 '25 13:12

Literally just bit me: https://github.com/apache/datafusion/actions/runs/20395703139/job/58610984384?pr=19426

adriangb • Dec 20 '25 14:12