Document the new release process for Rustup
The Rustup release process has historically been manual: it involved copying files from S3 to a local machine and back to S3, which introduced a high risk of human error. When modifications to the existing release script became necessary, the decision was made to automate the release process (see #3819 for details).
The documentation in the dev-guide has been updated to cover the new release process, which is fully automated to produce beta releases using GitHub Actions and the promote-release tooling.
Here's a rendered version of the file: https://github.com/jdno/rust-rustup/blob/new-release-process/doc/dev-guide/src/release-process.md
Thanks for working on this!
@jdno what's your plan for driving the new release process to completion? We're starting to think about how/when we want to publish our next release so it would be good to understand how much of a dependency we have on you and what that means for the schedule.
I've started working on it, but I can't give a good estimate on how long it'll take. My hope is to get most of the implementation work done in the next two weeks.
I don't want to be a blocker, though, so my work is strictly additive right now. The current process will continue to work until we have a full replacement. So regarding dependencies or the schedule, there shouldn't be any.
I'd strongly recommend adding a rollback process for when a deployment causes failures, as happened in 1.28.0.
I'd like to echo and amplify the comments in https://github.com/rust-lang/rustup/issues/4211 about rolling back, to make sure that there's more than one voice calling for this practice. The 1.28.0 release highlighted that Rustup is a load-bearing piece of software. A failed deployment that causes this many downstream failures should have resulted in a rollback within about an hour, not a roll forward a day later.
I base this opinion on having participated (both as an owner and as someone affected) in many post-mortems at a large internet company. The consensus engineering best practice is to roll back first, and think about how to solve the greater problem second. It's rare that this advice should be ignored, and no good rationale seems to have been expressed here for why it should have been ignored in this case. The users that had migrated to the new rustup changes would generally have been a very small number compared to those who were affected by this, and those users were actively aware of the changes. The impacted users were generally not made aware until this caused failures in their systems.
I want to explicitly state that this comment is not intended to throw shade on anyone involved in 1.28. It is only meant to constructively improve the situation going forward.
I'd strongly recommend adding roll back process for when a deployment causes failure like what happened in 1.28.0.
@joshka I second this. In fact, I have already mentioned it on Zulip:
by the way, it’s not urgent, but can we have /archive/dist on our release server redirect to /archive or something? Anyway the goal is to trick rustup into thinking the archive is an actual release server, to provide a way for arbitrary rustup downgrade/upgrade/pinning without having to adapt the code on our side
I added that rustup could be modified to make use of the /archive path; however, this modification itself would also need a new release to work.
Let's leave this thread for the release-related work: I've made https://github.com/rust-lang/rustup/issues/4240 for the rustup-related part of this problem.