doc: improve the process for landing a PR
Even though incidents could still happen, like https://github.com/nodejs/node/pull/54554 (which was partially my fault, and I learned the lesson), I wanted to share my thoughts in the hope of reducing potential breakage in the future. I had noticed that the other PR in net had landed and should have taken a closer look, or simply re-triggered the CI when in doubt; either approach would have helped me catch the test failure earlier.
Wording is not my expertise, so I'm happy to take any suggestions 👍 or even just general thoughts on this potentially recurring issue.
Review requested:
- [ ] @nodejs/tsc
I agree there is probably no need to take action since it's very rare
Thanks for reviewing, I agree with you. Since this is rarely going to happen and the recovery is not costly (either revert or fast-track a fix), I think it is probably not worth the extra burden of doing the check.
Instead, I think we should look into using GitHub's commit queue system (which I think can ensure that tests pass after applying the commits in order).
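For context, GitHub's merge queue works by creating a temporary branch with the queued commits applied in order and running the required checks against it; a workflow opts in by listening for the `merge_group` event. A minimal sketch, with an assumed file name and a placeholder test command (this is not our actual CI setup):

```yaml
# .github/workflows/merge-queue-ci.yml -- illustrative file name
name: merge-queue-ci

on:
  # GitHub fires this event for a temporary branch that already has the
  # queued commits applied in order, so the tests see the combined result.
  merge_group:

jobs:
  tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Placeholder command; the real suite is much heavier than this.
      - run: ./configure && make -j"$(nproc)" test-ci
```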
> Instead, I think we should look into using GitHub's commit queue system (which I think can ensure that tests pass after applying the commits in order).
That's not a bad suggestion! We could exclude some subsystems, e.g. doc, meta, etc., to make things slightly faster.
> Instead, I think we should look into using GitHub's commit queue system (which I think can ensure that tests pass after applying the commits in order).
Or run tests as part of our own CQ system. I don't think this can reasonably work on GHA runners; we'd need a runner that can cache build results so that most builds finish almost instantly instead of taking about an hour currently, plus another hour for running the tests.
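To illustrate the caching idea, here is a rough sketch of a job pinned to a self-hosted runner with a persistent ccache; the runner labels, cache path, and size limit are assumptions, not an existing setup:

```yaml
name: cached-cq-build
on:
  merge_group:

jobs:
  cached-build:
    # A persistent self-hosted runner keeps its ccache between runs, so
    # unchanged translation units are not recompiled on every build.
    runs-on: [self-hosted, linux]
    env:
      CC: ccache gcc
      CXX: ccache g++
      CCACHE_DIR: /var/cache/ccache   # assumed persistent path on the runner
      CCACHE_MAXSIZE: 20G             # assumed cache size
    steps:
      - uses: actions/checkout@v4
      - run: ./configure
      - run: make -j"$(nproc)"
      - run: make test-ci
```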
> Instead, I think we should look into using GitHub's commit queue system (which I think can ensure that tests pass after applying the commits in order).
That was the original design idea for the commit queue, many years ago. The main problem is how slow and flaky our CI has been.
> The main problem is how slow and flaky our CI has been

A full CI run would be rough, but a quick test run (on Windows, macOS, and Linux) would hopefully be sufficient.
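Something like the following matrix could express that quick cross-platform run; the runner images and test commands are placeholders, not a proposal for the real workflow:

```yaml
name: quick-smoke-test
on:
  merge_group:

jobs:
  quick-test:
    strategy:
      matrix:
        os: [ubuntu-latest, macos-latest, windows-latest]
    runs-on: ${{ matrix.os }}
    steps:
      - uses: actions/checkout@v4
      # Placeholder smoke run on Unix-like platforms; not the full CI suite.
      - run: ./configure && make -j4 test-only
        if: runner.os != 'Windows'
      # Windows builds and tests go through vcbuild instead of make.
      - run: .\vcbuild.bat test
        if: runner.os == 'Windows'
        shell: cmd
```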
@jakecastelli do you still want to pursue this? There seem to be quite a few concerns with this PR:
- it sets unrealistic standards for landing a PR.
- it won't be very effective (the CQ label is sometimes added hours or even days before a PR actually lands).
- there might be better solutions (e.g. have the CQ run the test suites when the PR is labelled with `needs-ci`); see the sketch after this list.
- the current situation is "good enough" IMO (at least I know that I will 100% of the time take the risk of not doing that optional step you're adding, and I suspect most folks would do the same, so what's the point of having it in our docs, I wonder).
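As a rough sketch of the label-gated idea (only the `needs-ci` label name comes from the existing process; the workflow contents and test command are placeholders):

```yaml
name: label-gated-tests
on:
  pull_request:
    types: [labeled]

jobs:
  run-suites:
    # Only react when the needs-ci label is applied to the PR.
    if: github.event.label.name == 'needs-ci'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Placeholder for kicking off the actual test suites.
      - run: ./configure && make -j"$(nproc)" test-ci
```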
Note that this PR is technically ready to land if you decide to ignore those concerns (there's at least one approving review, and no one is blocking it), so it's up to you.
Thanks for summarising the concerns and feedback @aduh95. I have decided not to proceed with landing; I left it open for a bit just to see if there were any other thoughts or feedback from folks. I think I am now in a position to close it because:
- I agree this would add extra burden to land a PR with very little benefit.
- We can always revert a faulty commit; it might block some other PRs from landing for a day or two, but it won't be catastrophic.
- There are potentially better solutions with automated tooling.
I appreciate all the reviews and feedback 🙏