rector-src
Add the possibility to process files in batches
If you are running Rector in CI, or if you have several machines available to run it, and you have a large code base, you may want to split the processing of your files into several batches, each of which can run on a different machine. Since Rector processes each file individually and independently of all other files, we can easily split the list of files to process.
This PR implements two new command line options, batch-index and batch-total, that allow you to split the Rector run into several batches. These batches are expected to be run in parallel, not consecutively.
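To make the splitting concrete, here is a minimal Python sketch of how a file list could be partitioned from a batch index and a batch total. It is an illustration only, not the PR's PHP implementation: the function name `select_batch`, the zero-based index, and the round-robin assignment over a sorted list are all assumptions.

```python
from typing import List, Sequence


def select_batch(files: Sequence[str], batch_index: int, batch_total: int) -> List[str]:
    """Return the files assigned to one batch (assumed round-robin split)."""
    if batch_total < 1:
        raise ValueError("batch_total must be at least 1")
    if not 0 <= batch_index < batch_total:
        raise ValueError("batch_index must be in the range [0, batch_total)")
    # Sort first so every machine sees the same ordering and therefore
    # computes the same partition, whatever order the filesystem returns.
    ordered = sorted(files)
    # Take every batch_total-th file, starting at batch_index.
    return ordered[batch_index::batch_total]


files = ["src/A.php", "src/B.php", "src/C.php", "src/D.php", "src/E.php"]
batches = [select_batch(files, i, 3) for i in range(3)]
for index, batch in enumerate(batches):
    print(index, batch)
```

Because each file lands in exactly one batch, the batches together cover the whole list with no overlap, which is what lets them run independently.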
As a demonstration of how this could work, this PR temporarily adds a new rule to the rector configuration and runs it both as a single run and in parallel batches in CI. The results are similar to this:
As you can see, the single run takes 1m06s while the individual batches take between 21s and 41s, so you can save ~25s by running this in batches. That is probably not worth it for this case, but if you have a very large code base where Rector takes 4m to run, you could easily split it into 4 runs of 1:30 minutes or so, saving a good amount of time on the total run.
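Because the batches run in parallel, the wall-clock time of the batched run is set by the slowest batch. A small sketch of that arithmetic, using the figures quoted above (the helper name `parallel_saving` is made up for illustration, and only the fastest and slowest batch times are known):

```python
from typing import List


def parallel_saving(single_run_seconds: int, batch_seconds: List[int]) -> int:
    """Seconds saved when batches run in parallel instead of as one single run."""
    # Parallel batches are done when the slowest one is done.
    return single_run_seconds - max(batch_seconds)


single_run = 66        # 1m06s for the single run
batch_runs = [21, 41]  # fastest and slowest batch; the others fall in between
saving = parallel_saving(single_run, batch_runs)
print(saving)  # 25
```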
As you can see in the example run, the issues found in the single run match the sum of the issues found in the individual batch runs.
Is this command meant for dry-runs only?
I wonder whether the results will be consistent if the actions modify the sources and commit them. Would it need a rebase or something similar to fast-forward?
@staabm given that every batch would be working on different files, there should be no conflict between the commits. The only difference with a single run is that you would have several commits instead of a single one
@staabm to test running in batches without dry-run mode, I created a new branch and added a commit action. I had to use the EndBug/add-and-commit action because the stefanzweifel/git-auto-commit-action action used by Rector does not support pulling before pushing and so cannot work in parallel.
You can see the run here https://github.com/carlos-granados/rector-src/actions/runs/10930549614
And the added commits here: https://github.com/carlos-granados/rector-src/commits/test-batch-commits/
@TomasVotruba I just rebased this branch. Any interest in this code? Just wanted to know if I need to keep updating it. Cheers!
Thanks for your patience, I'm just getting to this now after a couple of bugs that had higher priority. I was thinking about a similar feature, but it adds more and more complexity to the already complex parallel file batching. "Inception matrix fractal" feeling :)
The real solution here is for GitHub and other CI providers to enable full-blown parallel CI runs, the same way we leverage our local machine to get the same result faster.
What I'd be open to instead is a meta tool that does this in a generic way above the CLI tool: Rector in this case, but also PHPStan, ECS etc. Splitting files into batches should be doable.
That said, I want to keep the process as simple as we have it now, so I'm closing this.
Thanks for understanding :pray:
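The meta-tool idea mentioned above, a generic wrapper that splits files into batches on top of any CLI tool, could look roughly like the Python sketch below. Everything in it is hypothetical: the function name `build_batch_commands`, the round-robin split, and the base command, which is only a placeholder for whichever tool is being wrapped.

```python
from typing import List, Sequence


def build_batch_commands(
    base_command: Sequence[str], files: Sequence[str], batch_total: int
) -> List[List[str]]:
    """Build one command line per batch by appending that batch's files."""
    if batch_total < 1:
        raise ValueError("batch_total must be at least 1")
    ordered = sorted(files)  # stable ordering so every worker agrees on the split
    commands = []
    for batch_index in range(batch_total):
        batch_files = ordered[batch_index::batch_total]
        if batch_files:  # do not emit a command for an empty batch
            commands.append([*base_command, *batch_files])
    return commands


# Placeholder base command; swap in whichever tool should be wrapped.
base = ["vendor/bin/rector", "process", "--dry-run"]
files = ["src/A.php", "src/B.php", "src/C.php", "src/D.php"]
commands = build_batch_commands(base, files, 2)
for command in commands:
    print(" ".join(command))
```

In a CI setup each worker would run only the command for its own batch; the sketch stops at building the command lines and does not execute anything.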
Maybe https://github.com/phpstan/phpstan-src/pull/2916 can be an inspiration. In this PR we distribute PHPUnit test jobs across different parallel GitHub Actions workers (via a job matrix).
See my description of what happens in https://github.com/phpstan/phpstan-src/pull/2916#discussion_r1492502527
I think rector could do similar things