Stop CI jobs on PyTorch PRs with humongous stacks
We have been impacted by this several times when devs unknowingly create a large stack of size 10+:
- The stack will hook up a big chunk, may be all of CI resources, preventing other people to run their jobs.
- In addition, a large stack like https://github.com/pytorch/pytorch/pull/116745 could also cause failures further down the line in diff train, i.e. D52549880
We need to take more active action here to prevent the issue from happening. Some thoughts:
- We could create a CI check running before other jobs stopping them from running in the first place when the stack size is larger than a threshold
- or updating ghstack to automatically add
[no ci]into the commit message for large stack https://docs.github.com/en/actions/managing-workflow-runs/skipping-workflow-runs
In any cases, a clear message will need to be communicated with the owner of the stack about what happen (CI not running) and what they could do to fix it (squash commits, breaking up the stack)
cc @malfet @seemethere @izaitsevfb @kit1980 @clee2000 @PaliC
I think it's correct to not mess with ghstack as it's technically not in our control and this problem is specific to pytorch.
Another option is to have pytorchbot look for ghstack prs which are over 10 commits or so and cancel all workflows on those PRs.
One of the reasons why I submitted all those PRs in a stack was because I wanted signal on all of them at the same time. If we are going to limit resources, then we should do it for a reasonable stack size (e.g. over 10). Otherwise devs will be incentivized to submit 10 different PRs instead of in a stack, and then we have the same problem.
@zou3519 there should be a reasonable balance between CI overhead associated with rebasing such PRs and easy of development. I run some stats: both mean and median stack length was under 5. See https://github.com/pytorch/builder/blob/add4488dcc3504b9e58bd470a1041501a294e7d2/analytics/github_analyze.py#L376C43-L377C1 if you want to run the analysis yourself
We probably don't want mean/median, because that means 50% of people will need to worry about not getting signals on their diff stacks? How does the p90 or p80 look?