doc: improve the process for landing a PR
Even though incidents could still happen, like https://github.com/nodejs/node/pull/54554 (which was partially my fault, and I learned the lesson), I wanted to share my thoughts in the hope of reducing potential breakage in the future. I had noticed that the other PR in net had landed and should have taken a closer look, or simply re-triggered the CI when in doubt; either approach would have helped me catch the test failure earlier.
Wording is not my expertise, so I'm happy to take any suggestions 👍 or even just general thoughts on this potentially recurring issue.
Review requested:
- [ ] @nodejs/tsc
I agree there is probably no need to take action since it's very rare
Thanks for reviewing, I agree with you. Since this is rarely going to happen and the recovery is not costly (either revert or fast-track a fix), I think it is probably not worth the extra burden of doing the check.
Instead, I think we should look into using GitHub's commit queue system (which I think can ensure that tests pass after applying the commits in order).
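For context, GitHub's merge queue works by creating a temporary branch with the queued commits applied in order and running the required checks against it; a workflow opts in by listening for the `merge_group` event. A minimal sketch, with an assumed file name and a placeholder test command (this is not our actual CI setup):

```yaml
# .github/workflows/merge-queue-ci.yml -- illustrative file name
name: merge-queue-ci

on:
  # GitHub fires this event for a temporary branch that already has the
  # queued commits applied in order, so the tests see the combined result.
  merge_group:

jobs:
  tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Placeholder command; the real suite is much heavier than this.
      - run: ./configure && make -j"$(nproc)" test-ci
```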
> Instead, I think we should look into using GitHub's commit queue system (which I think can ensure that tests pass after applying the commits in order).
That's not a bad suggestion! We could exclude some subsystems, e.g. doc, meta, etc., to make things slightly faster.
> Instead, I think we should look into using GitHub's commit queue system (which I think can ensure that tests pass after applying the commits in order).
Or run tests as part of our own CQ system. I don't think this can reasonably work on GHA runners; we'd need a runner that can cache build results so that most builds finish almost instantly instead of taking about an hour currently, plus another hour for running the tests.
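To illustrate the caching idea, here is a rough sketch of a job pinned to a self-hosted runner with a persistent ccache; the runner labels, cache path, and size limit are assumptions, not an existing setup:

```yaml
name: cached-cq-build
on:
  merge_group:

jobs:
  cached-build:
    # A persistent self-hosted runner keeps its ccache between runs, so
    # unchanged translation units are not recompiled on every build.
    runs-on: [self-hosted, linux]
    env:
      CC: ccache gcc
      CXX: ccache g++
      CCACHE_DIR: /var/cache/ccache   # assumed persistent path on the runner
      CCACHE_MAXSIZE: 20G             # assumed cache size
    steps:
      - uses: actions/checkout@v4
      - run: ./configure
      - run: make -j"$(nproc)"
      - run: make test-ci
```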
> Instead, I think we should look into using GitHub's commit queue system (which I think can ensure that tests pass after applying the commits in order).
That was the original design idea for the commit queue, many years ago. The main problem is how slow and flaky our CI has been.
> The main problem is how slow and flaky our CI has been

A full CI run would be rough, but a quick test run (on Windows, macOS, and Linux) would hopefully be sufficient.
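Something like the following matrix could express that quick cross-platform run; the runner images and test commands are placeholders, not a proposal for the real workflow:

```yaml
name: quick-smoke-test
on:
  merge_group:

jobs:
  quick-test:
    strategy:
      matrix:
        os: [ubuntu-latest, macos-latest, windows-latest]
    runs-on: ${{ matrix.os }}
    steps:
      - uses: actions/checkout@v4
      # Placeholder smoke run on Unix-like platforms; not the full CI suite.
      - run: ./configure && make -j4 test-only
        if: runner.os != 'Windows'
      # Windows builds and tests go through vcbuild instead of make.
      - run: .\vcbuild.bat test
        if: runner.os == 'Windows'
        shell: cmd
```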
@jakecastelli do you still want to pursue this? There seem to be quite a few concerns with this PR:
- it sets unrealistic standards for landing a PR.
- it won't be very effective (the CQ label is sometimes added hours or even days before a PR actually lands).
- there might be better solutions (e.g. have the CQ run the test suites when the PR is labelled with `needs-ci`); see the sketch after this list.
- the current situation is "good enough" IMO (at least I know that I will 100% of the time take the risk of not doing that optional step you're adding, and I suspect most folks would do the same, so what's the point of having it in our docs, I wonder).
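As a rough sketch of the label-gated idea (only the `needs-ci` label name comes from the existing process; the workflow contents and test command are placeholders):

```yaml
name: label-gated-tests
on:
  pull_request:
    types: [labeled]

jobs:
  run-suites:
    # Only react when the needs-ci label is applied to the PR.
    if: github.event.label.name == 'needs-ci'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Placeholder for kicking off the actual test suites.
      - run: ./configure && make -j"$(nproc)" test-ci
```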
Note that this PR is technically ready to land if you decide to ignore those concerns (there's at least one approving review, and no one is blocking it), so it's up to you.
Thanks for summarising the concerns and feedback @aduh95. I have decided not to proceed with landing; I left it open for a bit just to see if there were any other thoughts or feedback from folks. I think I am now in a position to close it because:
- I agree this would add extra burden to land a PR with very little benefit.
- We can always revert a faulty commit; it might block some other PRs from landing for a day or two, but it won't be catastrophic.
- There are potentially better solutions with automated tooling.
I appreciate all the reviews and feedback 🙏