Add Local Scripts to Reproduce Full CI and Perform Auto-Fixes
Is your feature request related to a problem or challenge?
Motivation
AI coding agents are now capable of handling many simple, mechanical tasks in DataFusion. When assigning such tasks, it would be ideal if these agents could verify locally that their changes pass the full CI suite before opening a PR. Currently, the only way to check CI results is to submit a PR and wait for all remote CI jobs to complete, which can take around an hour and slows down iteration.
To improve this workflow, we should provide a simple script that reproduces the entire CI pipeline locally:
./dev/ci.sh # Run full CI locally
Without such a script, AI agents must infer CI behavior from configuration files and may spend unnecessary time/tokens running CI jobs one by one.
Note that we already have ./dev/rust_lint.sh for lint-related checks, but it is not yet complete and does not include the various tests. A script that fully reproduces CI should wrap all existing CI test steps into scripts, which are then used by both the local CI runner script and the GitHub workflow configuration .yml files.
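As a rough illustration of the runner idea, here is a minimal sketch of what the core of ./dev/ci.sh could look like. The helper names (run_step, summarize, ci_main) are hypothetical, and `cargo test --workspace` is only a stand-in for the real test steps, which would have to be taken from the workflow files. The sketch keeps going after a failure so that one run reports every failing step.

```shell
#!/bin/sh
# Hypothetical sketch of dev/ci.sh: run each CI step, keep going on failure,
# and report all failed steps at the end.

failed=""

# run_step NAME COMMAND [ARGS...]
# Runs COMMAND and records NAME in $failed if it exits non-zero.
run_step() {
  step_name="$1"
  shift
  echo "==> ${step_name}"
  if "$@"; then
    echo "PASS: ${step_name}"
  else
    echo "FAIL: ${step_name}"
    failed="${failed} ${step_name}"
  fi
}

# Prints a summary and returns non-zero if any step failed.
summarize() {
  if [ -z "${failed}" ]; then
    echo "All steps passed"
    return 0
  fi
  echo "Failed steps:${failed}"
  return 1
}

# Assumed wiring: the existing lint script plus a placeholder test step.
# The real step list should mirror the GitHub workflow configuration.
ci_main() {
  run_step "lint" ./dev/rust_lint.sh
  run_step "tests" cargo test --workspace
  summarize
}

# ci_main   # uncomment once the step commands match the real workflow files
```

If each step lives in its own small script, the workflow .yml files can call the same scripts, so the local and remote pipelines cannot drift apart.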
Auto-Fix Script
We should also provide a companion script that performs best-effort automatic fixes:
./auto-fix.sh
This script would handle routine cleanups such as:
- running cargo fmt
- generating docs
- adding Apache headers to newly added files
- applying safe Clippy auto-fixes
- any other common mechanical steps
This improves the developer experience and also lets AI coding agents iterate faster and spend fewer tokens.
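A minimal sketch of how such a script might be structured, under some assumptions: the function names are hypothetical, only Rust files get a header, "newly added" is approximated as untracked or staged-as-added files, and doc generation is left out because the exact command is not specified here.

```shell
#!/bin/sh
# Hypothetical sketch of auto-fix.sh; not the project's actual script.

# Print the standard ASF license header as Rust line comments.
rust_license_header() {
  cat <<'EOF'
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements.  See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership.  The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License.  You may obtain a copy of the License at
//
//   http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied.  See the License for the
// specific language governing permissions and limitations
// under the License.
EOF
}

# Prepend the header to a file unless it already contains one.
add_header_if_missing() {
  target="$1"
  if grep -q "Licensed to the Apache Software Foundation" "${target}"; then
    return 0
  fi
  tmp="$(mktemp)"
  { rust_license_header; echo; cat "${target}"; } > "${tmp}"
  # Write back in place so the file keeps its permissions.
  cat "${tmp}" > "${target}"
  rm -f "${tmp}"
}

# Assumed wiring of the mechanical fixes listed above.
auto_fix_main() {
  cargo fmt --all
  cargo clippy --fix --allow-dirty --allow-staged
  {
    git ls-files --others --exclude-standard -- '*.rs'
    git diff --cached --name-only --diff-filter=A -- '*.rs'
  } | while IFS= read -r f; do
    add_header_if_missing "${f}"
  done
}

# auto_fix_main   # uncomment to run the fixes when executed as a script
```

The header step is idempotent, so running the script repeatedly (for example from a commit hook) should not stack duplicate headers.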
Describe the solution you'd like
No response
Describe alternatives you've considered
No response
Additional context
No response
take
untake
./auto-fix.sh would definitely be super useful! I'd wire it up to be a git commit hook.
I’m very much in favor of this
Literally just bit me: https://github.com/apache/datafusion/actions/runs/20395703139/job/58610984384?pr=19426