
[rush] add support for sharding phases

Open aramissennyeydd opened this issue 1 year ago • 3 comments

Summary

This PR adds support for sharding to Rush phases. This allows tools that support sharding, like Jest, to be split into multiple shards that run independently. It does this by adding a new set of options to rush-project.json under a new sharding key. For example:

{
    "operationSettings": [
        {
            "operationName": "_phase:test",
            "outputFolderNames": ["coverage", "temp/coverage"],
            "sharding": {
                "count": 6,
                // Defaults to `--shard={shardIndex}/{shardCount}`
                "shardArgumentFormat": "--shard-format={shardIndex}-{shardCount}"
            }
        }
    ]
}
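
To illustrate how a shardArgumentFormat template might expand into per-shard arguments, here is a minimal sketch. The function names are hypothetical and this is not Rush's actual implementation; it only assumes plain placeholder substitution and 1-based shard indexes, which matches the `--shard=1/15` style arguments shown later in this thread.

```typescript
// Hypothetical helpers for illustration only; not part of the Rush API.
const DEFAULT_SHARD_ARGUMENT_FORMAT = "--shard={shardIndex}/{shardCount}";

// Substitute the {shardIndex} and {shardCount} placeholders in a format string.
function formatShardArgument(
  shardIndex: number,
  shardCount: number,
  format: string = DEFAULT_SHARD_ARGUMENT_FORMAT
): string {
  if (!Number.isInteger(shardCount) || shardCount < 1) {
    throw new Error(`shardCount must be a positive integer, got ${shardCount}`);
  }
  if (!Number.isInteger(shardIndex) || shardIndex < 1 || shardIndex > shardCount) {
    throw new Error(`shardIndex must be between 1 and ${shardCount}, got ${shardIndex}`);
  }
  return format
    .replace(/\{shardIndex\}/g, String(shardIndex))
    .replace(/\{shardCount\}/g, String(shardCount));
}

// Build the argument for every shard, assuming indexes run from 1 to shardCount.
function shardArguments(shardCount: number, format?: string): string[] {
  return Array.from({ length: shardCount }, (_, i) =>
    formatShardArgument(i + 1, shardCount, format)
  );
}
```

With the configuration above, `shardArguments(6, "--shard-format={shardIndex}-{shardCount}")` would yield `--shard-format=1-6` through `--shard-format=6-6`, one argument per shard.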

Details

This is the initial chunk of work to support sharding in the operation graph. It includes both the sharding nodes as well as a collator node that can run a script after all of the shard nodes are complete.
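
As a rough picture of the resulting graph shape, the sketch below expands one sharded phase into a no-op pre-shard node, N shard nodes, and a collate node that depends on every shard. The type, the function, the `:pre-shard` suffix, and the `#index` suffix on shard ids are assumptions made for illustration; only the idea of shard nodes plus a collate node named after the phase comes from this PR.

```typescript
// Hypothetical model of the spliced-in nodes; not Rush's real Operation class.
type OperationKind = "pre-shard" | "shard" | "collate";

interface ShardGraphNode {
  id: string;
  kind: OperationKind;
  dependsOn: string[];
}

function expandShardedPhase(phaseName: string, shardCount: number): ShardGraphNode[] {
  if (!Number.isInteger(shardCount) || shardCount < 1) {
    throw new Error(`shardCount must be a positive integer, got ${shardCount}`);
  }

  // Assumed id for the node that does no work and only anchors the shards.
  const preShardId = `${phaseName}:pre-shard`;
  const preShard: ShardGraphNode = { id: preShardId, kind: "pre-shard", dependsOn: [] };

  // One node per shard; the "#index" suffix is only here to keep ids unique.
  const shards: ShardGraphNode[] = [];
  for (let index = 1; index <= shardCount; index++) {
    shards.push({
      id: `${phaseName}:shard#${index}`,
      kind: "shard",
      dependsOn: [preShardId]
    });
  }

  // The collate node keeps the phase name and waits for every shard to finish.
  const collate: ShardGraphNode = {
    id: phaseName,
    kind: "collate",
    dependsOn: shards.map((shard) => shard.id)
  };

  return [preShard, ...shards, collate];
}
```

Because the collate node keeps the original phase name in this sketch, anything that already depends on the phase would wait for all shards without being rewired.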

I originally attempted this with heft plugins, however tying into rush parallelism + cobuilds is one of our end goals with this work and heft plugins at the moment don't allow that.

How it was tested

I've been testing locally with node apps/rush/lib/start-dev test --to heft-jest-shards-test -p 6 and varying parallelism flags. heft-jest-shards-test contains 6 test files, each of which runs for 10 seconds and then passes, so -p 6 should finish in ~10 seconds, -p 3 in ~20 seconds, and so on. I also added a sharded-repo to the existing cobuild suite to ensure this works with cobuilds.
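
The expected timings follow from simple arithmetic: with equally sized shards, the run takes one "wave" per group of shards that fits within the parallelism limit. The helper below is a hypothetical back-of-the-envelope estimate, assuming one test file per shard, identical durations, and no scheduling or startup overhead.

```typescript
// Hypothetical estimate: number of waves needed times the duration of one shard.
function expectedWallTimeSeconds(
  shardCount: number,
  secondsPerShard: number,
  parallelism: number
): number {
  if (parallelism < 1) {
    throw new Error(`parallelism must be at least 1, got ${parallelism}`);
  }
  const waves = Math.ceil(shardCount / parallelism);
  return waves * secondsPerShard;
}

// 6 shards of 10 seconds each, at the parallelism levels mentioned above.
console.log(expectedWallTimeSeconds(6, 10, 6)); // 10
console.log(expectedWallTimeSeconds(6, 10, 3)); // 20
```

A measured time well above these figures would suggest the shards are not actually running concurrently.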

TODO:

  • [x] retest cobuilds after the log filename is updated.

Impacted documentation

  • https://rushjs.io/pages/configs/rush-project_json/
  • Not sure where else this fits in? Should it get its own page under "Maintainer tutorials"?

aramissennyeydd avatar Apr 16 '24 17:04 aramissennyeydd

Also, I think CI is failing due to mismatched versions of rush - the sharding option isn't available in the version of install-run-rush.

aramissennyeydd avatar Apr 26 '24 19:04 aramissennyeydd

@dmichon-msft I trimmed out the heft changes - this should be good for another 👀

aramissennyeydd avatar May 01 '24 21:05 aramissennyeydd

@iclanton @dmichon-msft Are we ready to merge this?

octogonz avatar May 14 '24 02:05 octogonz

I still have open questions about the conversations I've left unresolved above.

Current state of things:

  1. Shards are spliced into the current operation graph, with a pre-shard node that does nothing, a set of N shard nodes that do the sharded work (_phase:${name}:shard), and a single collate node that runs work over the output of all shards (_phase:${name}).
  2. Both the shard and collate operations use the phase's overall missingScriptBehavior configuration.
  3. Operation weighting works with sharding as well. In the timeline below, b shards have weight=10, a shards have weight=4, and parallelism is set to 10. As expected, only 1 b shard is picked up at a time, and 3 a shards are picked up at once.
b (build) - shard 15/15 ###----------------------------------------------------------------------------- 2.1s
b (build) - shard 14/15 --####-------------------------------------------------------------------------- 2.1s
b (build) - shard 13/15 -----###------------------------------------------------------------------------ 2.1s
b (build) - shard 12/15 -------####--------------------------------------------------------------------- 2.1s
b (build) - shard 11/15 ----------####------------------------------------------------------------------ 2.1s
b (build) - shard 10/15 -------------###---------------------------------------------------------------- 2.1s
 b (build) - shard 9/15 ---------------####------------------------------------------------------------- 2.2s
 b (build) - shard 8/15 ------------------####---------------------------------------------------------- 2.2s
 b (build) - shard 7/15 ---------------------###-------------------------------------------------------- 2.2s
 b (build) - shard 6/15 -----------------------####----------------------------------------------------- 2.2s
 b (build) - shard 5/15 --------------------------####-------------------------------------------------- 2.2s
 b (build) - shard 4/15 -----------------------------###------------------------------------------------ 2.2s
 b (build) - shard 3/15 -------------------------------####--------------------------------------------- 2.2s
 b (build) - shard 2/15 ----------------------------------####------------------------------------------ 2.2s
 b (build) - shard 1/15 -------------------------------------###---------------------------------------- 2.1s
    b (build) - collate ---------------------------------------##--------------------------------------- 0.7s
  a (build) - shard 3/3 ---------------------------------------####------------------------------------- 2.2s
  a (build) - shard 2/3 ---------------------------------------####------------------------------------- 2.2s
  a (build) - shard 1/3 ---------------------------------------####------------------------------------- 2.1s
  4. I adjusted the collate script to also use CLI parameters: --shard-parent-folder and --shard-count.
  5. I also verified that the new build + collate scripts work as expected. Example collate output:
Hello world! b --shard=1/15 --output-directory=.rush/operations/_phase_build/shards/1
Hello world! b --shard=2/15 --output-directory=.rush/operations/_phase_build/shards/2
Hello world! b --shard=3/15 --output-directory=.rush/operations/_phase_build/shards/3
Hello world! b --shard=4/15 --output-directory=.rush/operations/_phase_build/shards/4
Hello world! b --shard=5/15 --output-directory=.rush/operations/_phase_build/shards/5
Hello world! b --shard=6/15 --output-directory=.rush/operations/_phase_build/shards/6
Hello world! b --shard=7/15 --output-directory=.rush/operations/_phase_build/shards/7
Hello world! b --shard=8/15 --output-directory=.rush/operations/_phase_build/shards/8
Hello world! b --shard=9/15 --output-directory=.rush/operations/_phase_build/shards/9
Hello world! b --shard=10/15 --output-directory=.rush/operations/_phase_build/shards/10
Hello world! b --shard=11/15 --output-directory=.rush/operations/_phase_build/shards/11
Hello world! b --shard=12/15 --output-directory=.rush/operations/_phase_build/shards/12
Hello world! b --shard=13/15 --output-directory=.rush/operations/_phase_build/shards/13
Hello world! b --shard=14/15 --output-directory=.rush/operations/_phase_build/shards/14
Hello world! b --shard=15/15 --output-directory=.rush/operations/_phase_build/shards/15
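
For readers puzzling over the weighting numbers in item 3, the sketch below shows one scheduling rule that is consistent with the timeline. It is an assumption, not something stated in this PR: a new operation is started whenever the weight already in use is below the parallelism limit, and each weight is capped at that limit. A stricter "must fit in the remaining budget" rule would allow only 2 operations of weight 4 at parallelism 10, which does not match the 3 concurrent a shards observed. The sketch ignores the collate node.

```typescript
// Hypothetical model: count how many equally weighted operations start together.
function countStartedTogether(
  weight: number,
  parallelism: number,
  availableOperations: number
): number {
  // Assumed rule: a single operation never counts for more than the whole limit.
  const effectiveWeight = Math.min(weight, parallelism);
  let weightInUse = 0;
  let started = 0;

  // Assumed rule: keep starting work while any capacity is left.
  while (started < availableOperations && weightInUse < parallelism) {
    weightInUse += effectiveWeight;
    started++;
  }
  return started;
}

console.log(countStartedTogether(10, 10, 15)); // 1, as with the b shards
console.log(countStartedTogether(4, 10, 3)); // 3, as with the a shards
```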

aramissennyeydd avatar May 23 '24 22:05 aramissennyeydd