dotnet-docker

Re-think repo branching strategy

Open • lbussell opened this issue 1 year ago • 9 comments

Describe the Problem

Currently, we have two branches, main and nightly, which today map to mcr.microsoft.com/dotnet and mcr.microsoft.com/dotnet/nightly, respectively.

  • main contains the officially released and supported versions of .NET, including all LTS, STS, Preview, and RC releases.
  • nightly contains all of the same images, with the addition of the following:
    • nightly builds of Preview and RC releases
    • .NET Docker infrastructure changes that are intended to release with the next servicing release
    • unsupported experimental images that we want to get feedback on (Ubuntu Chiseled, AOT deps & SDK)

The issue is that each month for servicing, we merge changes from nightly into main.

Before a new Preview release, there are usually two versions of .NET in development at the same time, for example .NET 8.0 RC1 and RC2. We can only choose one to be in nightly at a time. RC1 needs to be stable and tested before we push it to main (stable). Because this is ultimately more important for customers, we currently keep the stable public versions of previews/RCs in nightly ahead of a Preview release. However, as soon as the branding updates for RC2 happen, .NET devs still want to test new features against the latest nightly builds. We have no way to provide constantly updated, truly "nightly" builds for .NET devs.

Describe the Solution

@jander-msft suggested a 3 branch strategy:

  • main stays the same as it is
  • staging contains stable public versions of the "next" release, essentially what will be merged into main on servicing day. This could even share the same commit history as main to make merging on Patch Tuesdays easier. This includes infrastructure changes.
  • nightly - there are a few different ideas about how this branch could behave, just throwing some things out there:
    • Stay up to date with the latest public builds of dotnet/installer (closest to what we have today)
    • Forget about coherency and layering and use the very latest builds from the individual dotnet/sdk, dotnet/aspnetcore, dotnet/runtime repos, just for .NET devs
    • Include only pre-release versions of .NET, no stable versions

With this pattern, infrastructure changes intended for release would probably need dual check-ins to nightly and staging, essentially front-loading the work of merging nightly into main for servicing releases.

This still leaves one hole in our offering - what do we do about experimental images we want to get feedback on? For example, it would be valuable for customers to be able to test AOT images against the latest stable version of .NET instead of being locked into using the next preview. It could fit into staging, but as soon as Previews are released, customers wouldn't be able to test AOT against stable .NET 9 Preview versions.

Additional Context

  • Somewhat related: https://github.com/dotnet/docker-tools/issues/649

Posted by lbussell on Sep 20 '23

[Triage] An important scenario to consider here is the dev workflow. For feature development, where does a PR get checked into? How does it flow to other branches?

It's also useful to ensure that servicing versions are rebuilt regularly to catch any breaks that may occur from external sources (base image, new packages).

Posted by mthalman on Sep 20 '23

Here's one proposal for the dev workflow. Important bits are bolded.

main

  • Contains all currently supported .NET major versions. (the same as it is today)
  • Can contain experimental .NET Docker images, which would live in repos like dotnet/experimental/runtime-deps, dotnet/experimental/runtime, etc. This way, experimental images have the latest stable .NET versions for customers to test.

staging

  • Devs check in new code and features to staging.
  • Contains the "next" version of .NET that will be released with the next servicing release.
  • Updated with automatic dependency flow the same way nightly is today, meaning when there are two in-dev versions of .NET, staging has the more stable one.
  • Experimental images would live under dotnet/staging/experimental/<repo> or just dotnet/staging/<repo>?

nightly

  • Automatically updated with bleeding-edge .NET Versions.
    • What this means will probably change with .NET 9 because of the VMR (Virtual Monolithic Repository). For .NET 8, however, it would mean that this branch is subscribed to the release/8.0.1xx channel of dotnet/installer.
  • Infrastructure is automatically updated with nightly dependency flows to be in sync with staging, similar to how we update internal/release/nightly today.
```mermaid
flowchart TD
    s[devs check in code] --> staging
    staging -->|automatic<br>dependency flows| nightly
    staging -->|monthly<br>servicing updates| main
    nightly
```

This workflow would probably require about the same amount of merging/backporting/updating work as we have today, but there are some questions that still need to be answered:

  • What happens when a nightly dependency flow causes a break which needs a change in infrastructure? How do we handle nightly and staging being out of sync?
  • What if we want special images in nightly only, and not staging?
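To make the staging/nightly split in this proposal more concrete, here is a small Python sketch of which in-development build each branch would track when two are active at once. It assumes versions follow an `8.0.0-rc.1`-style scheme and that "the more stable one" means the milestone that ships next, i.e. the oldest build still in development. `parse` and `assign_branches` are hypothetical helpers, not part of the repo's update-dependencies tooling.

```python
# Hypothetical sketch: choose which in-development .NET build each branch tracks.
# Assumption: versions look like "8.0.0-preview.7", "8.0.0-rc.1", or "8.0.0".
import re
from typing import NamedTuple

# Earlier milestones get lower ranks; a stable release sorts after its prereleases.
_STAGE_RANK = {"preview": 0, "rc": 1}
_STABLE_RANK = 2


class DotnetVersion(NamedTuple):
    major: int
    minor: int
    patch: int
    stage: int      # 0 = preview, 1 = rc, 2 = stable
    iteration: int  # N in preview.N / rc.N; 0 for stable


def parse(version: str) -> DotnetVersion:
    m = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)(?:-(preview|rc)\.(\d+))?", version)
    if m is None:
        raise ValueError(f"unrecognized version: {version}")
    major, minor, patch, label, n = m.groups()
    stage = _STAGE_RANK[label] if label else _STABLE_RANK
    return DotnetVersion(int(major), int(minor), int(patch), stage, int(n) if n else 0)


def assign_branches(in_dev: list[str]) -> dict[str, str]:
    """staging gets the milestone that ships next (the oldest in-dev build);
    nightly gets the bleeding-edge one (the newest)."""
    ordered = sorted(in_dev, key=parse)
    return {"staging": ordered[0], "nightly": ordered[-1]}


print(assign_branches(["8.0.0-rc.2", "8.0.0-rc.1"]))
# {'staging': '8.0.0-rc.1', 'nightly': '8.0.0-rc.2'}
```

Parsing into a tuple (rather than comparing strings) keeps `preview.10` ordered after `preview.9`.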

Posted by lbussell on Oct 09 '23

Could nightly be subscribed to multiple channels? Leading up to .NET 8, we've got three active installer branches: release/8.0.1xx.rc2, release/8.0.1xx, and main. Obviously, we can have only one 8.0 represented in our Dockerfiles, which would be from release/8.0.1xx in this case. But could we also support .NET 9 Dockerfiles from main?

The flowchart seems misleading with the labels on the arrows. There should be another node representing installer that has separate flows into both staging and nightly.

> What happens when a nightly dependency flow causes a break which needs a change in infrastructure? How do we handle nightly and staging being out of sync?

We may want to consider having branches for docker-tools. That would have helped in certain situations in the past, and it would allow for multiple versions of Image Builder. I think it would be fine if staging and nightly had different eng/common content for a while. If the infrastructure differences went beyond that, then we'd have to see. We should always prioritize keeping staging working. If that means nightly has to be busted for a while, so be it.

> What if we want special images in nightly only, and not staging?

The proposal is centered around which .NET version is represented in the branch. That's just one pivot. The other is which Dockerfiles are represented. Do you have a scenario where we would want something in nightly but not staging? A new Dockerfile being developed, like with AOT, would start in staging, then get flowed to nightly. Once it's ready, it would flow to main to be either fully supported or experimental.

Posted by mthalman on Oct 09 '23

> Could nightly be subscribed to multiple channels? Leading up to .NET 8, we've got three active installer branches: release/8.0.1xx.rc2, release/8.0.1xx, and main. Obviously, we can have only one 8.0 represented in our Dockerfiles, which would be from release/8.0.1xx in this case. But could we also support .NET 9 Dockerfiles from main?

This would be great. I don't see why we couldn't have 8.0 and 9.0 Dockerfiles living in nightly, even right now. It would take some minor edits to the update-dependencies pipeline to support this.
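As a rough illustration of the constraint discussed here (nightly can follow several installer channels, but only one channel per .NET major version), here is a Python sketch. The channel names come from this thread; the channel-to-version mapping, the `NIGHTLY_SUBSCRIPTIONS` structure, and the `versions_to_channels` helper are assumptions for illustration, not taken from the update-dependencies pipeline.

```python
# Hypothetical nightly subscription config: installer channel -> the .NET major
# version its builds feed. Channel names appear in the discussion; the mapping
# and this data structure are illustrative assumptions.
NIGHTLY_SUBSCRIPTIONS: dict[str, int] = {
    "release/8.0.1xx": 8,
    "main": 9,
}


def versions_to_channels(subscriptions: dict[str, int]) -> dict[int, str]:
    """Invert the subscriptions, rejecting two channels for one major version,
    because only one build of each version can be represented in the Dockerfiles."""
    result: dict[int, str] = {}
    for channel, major in subscriptions.items():
        if major in result:
            raise ValueError(
                f".NET {major}.0 is already fed by {result[major]!r}; "
                f"cannot also subscribe to {channel!r}"
            )
        result[major] = channel
    return result


print(versions_to_channels(NIGHTLY_SUBSCRIPTIONS))
# {8: 'release/8.0.1xx', 9: 'main'}
```

Adding release/8.0.1xx.rc2 to the same map would raise, since it would be a second source of 8.0 builds.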

As an aside, this all makes me feel like we should be able to produce some bleeding-edge .NET images as build artifacts from the installer or SDK repos. They could live on some ACR or GHCR and maybe that would satisfy the demand from internal teams wanting to test the very latest changes.

> The flowchart seems misleading with the labels on the arrows. There should be another node representing installer that has separate flows into both staging and nightly.

I was intending to show the dev workflow/code flow only, not dependency updates. What I meant to show is that changes from staging flow to nightly with nightly dependency updates, and changes from staging flow to main with monthly servicing updates. This is what you were looking for, probably?

```mermaid
flowchart TD
    %% s[devs check in code] --> staging
    %% staging -->|automatic dependency flows| nightly
    %% staging -->|monthly servicing updates| main
    nightly
    installer -->|release/8.0.1xx| nightly
    installer -->|main| nightly
    installer -->|release/8.0.1xx-rc.2| staging
    staging -->|merge on<br>patch tuesday| main
```

Posted by lbussell on Oct 09 '23

> I was intending to show the dev workflow/code flow only, not dependency updates.

Ok, I misinterpreted what "automatic dependency flows" meant. I assumed it meant the flow from installer, but you mean it's just porting commits between branches.

Posted by mthalman on Oct 10 '23

Had a short discussion with @jander-msft where we brainstormed two new ideas:

  1. Maintain identical templating infrastructure between main and nightly. Define a manifest.features.json that determines how Dockerfile and Readme generation behaves for a given branch, for example including or excluding AOT images, or enabling non-root user support in 8.0 only. Would require https://github.com/dotnet/docker-tools/issues/677.
  2. A logical extension of that same idea: main and nightly contain only Dockerfiles, readmes, and the manifest. Maintain the templating infrastructure in a separate branch or repo, then use its output to update main and nightly automatically when necessary. This would eliminate merges between main and nightly entirely, and adding a new repo (staging?) would be as simple as adding a new manifest.features.json to the templating repo. Or something.
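A Python sketch of what idea 1 might look like. The file name `manifest.features.json` comes from the proposal, but the schema below (per-branch feature flags with optional per-.NET-version overrides) and the `features_for`/`should_generate` helpers are hypothetical. The same file would be checked into every branch, and the generator would be told which branch's configuration to apply.

```python
import json

# Hypothetical manifest.features.json contents. Only the file name comes from
# the proposal; this schema is an illustrative assumption.
MANIFEST_FEATURES = json.loads("""
{
  "branches": {
    "main": {
      "features": { "aot": false, "nonRootUser": false },
      "versions": { "8.0": { "nonRootUser": true } }
    },
    "nightly": {
      "features": { "aot": true, "nonRootUser": false },
      "versions": { "8.0": { "nonRootUser": true } }
    }
  }
}
""")


def features_for(branch: str, dotnet_version: str) -> dict[str, bool]:
    """Resolve feature flags for one branch and .NET version:
    branch-wide defaults first, then any per-version overrides."""
    config = MANIFEST_FEATURES["branches"][branch]
    resolved = dict(config["features"])  # copy so the manifest isn't mutated
    resolved.update(config.get("versions", {}).get(dotnet_version, {}))
    return resolved


def should_generate(branch: str, dotnet_version: str, image_kind: str) -> bool:
    """Decide whether a Dockerfile of the given kind is generated on a branch.
    In this sketch only "aot" images are gated by a feature flag."""
    if image_kind == "aot":
        return features_for(branch, dotnet_version)["aot"]
    return True
```

Because every branch carries the identical file, merging nightly into main would not conflict on it; only the branch name passed to the generator changes the output.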

Posted by lbussell on Nov 10 '23


[Triage] capturing a discussion that we had at Triage about this proposal:

The overall goal of this proposal is to reduce the time and effort spent in the servicing workflow for the dotnet-docker repo. Currently, we cherry-pick changes back and forth between the main and nightly branches, which is time-consuming and error-prone. By keeping the infrastructure identical between the main and nightly branches, we could reduce the time and effort required to release .NET Docker images and leave less room for human error.

#2 is a step too far and has a couple of downsides:

  1. Branches are not self-contained. A developer wouldn't be able to make changes to templates and see the results immediately, and we would need extra tooling to work around that.
  2. Code reviews become more difficult. It is important to be able to review the templating changes and the generated output (Dockerfiles) side-by-side. Oftentimes code reviewers only look at the generated code.

As for #1, it seems like a step in the right direction. Here is some more clarity on my vision for the developer workflow:

  1. Along with improvements from https://github.com/dotnet/docker-tools/issues/677, the GenerateDockerfilesCommand should be able to completely reset the state of the generated output (the /src folder) and create all the files it needs at code generation time.
  2. The manifest.features.json should be identical between branches so that there aren't merge conflicts. The manifest should include feature configurations explicitly for each shipping branch (it could potentially configure features per .NET version as well), and ImageBuilder should be able to target any of the configurations when generating Dockerfiles and Readmes in order to change the generated output.
  3. The release process would ideally look like this:
    1. Checkout main branch for release.
    2. git merge nightly (as opposed to cherry-picking commits from nightly one-by-one)
    3. Regenerate Dockerfiles and Readmes (they would have conflicts in the merge). This completely resets the state of the /src/ folder, deleting all non-shipping Dockerfiles and adding any that don't exist.
    4. Ideally, at this point, there would be no merge conflicts. I can't say for certain that there wouldn't be, however. We might need to keep an infrastructure file or two different between the branches so that CI and local dev workflows work correctly on both branches.
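Step 3 of the release process relies on generation being able to reset `/src` from scratch. Here is a minimal Python sketch of that reset-and-rewrite behavior, assuming the generator has already rendered every expected file into memory as a relative-path-to-content map. `regenerate` is a hypothetical helper that illustrates the idea; it is not ImageBuilder's GenerateDockerfilesCommand.

```python
from pathlib import Path


def regenerate(output_dir: Path, expected: dict[str, str]) -> tuple[list[str], list[str]]:
    """Make output_dir contain exactly the files in `expected`
    (POSIX-style relative path -> file content).

    Returns (removed, written): stale files that were deleted, and files that
    were created or whose content changed. Running it twice in a row is a no-op.
    """
    output_dir.mkdir(parents=True, exist_ok=True)

    # Everything currently on disk, as POSIX-style relative paths.
    existing = {
        p.relative_to(output_dir).as_posix()
        for p in output_dir.rglob("*")
        if p.is_file()
    }

    # Delete files the current configuration no longer produces
    # (e.g. non-shipping Dockerfiles brought in by `git merge nightly`).
    removed = sorted(existing - set(expected))
    for rel in removed:
        (output_dir / rel).unlink()

    # Create missing files and overwrite ones whose content differs.
    written = []
    for rel, content in sorted(expected.items()):
        target = output_dir / rel
        if not target.is_file() or target.read_text(encoding="utf-8") != content:
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_text(content, encoding="utf-8")
            written.append(rel)

    # Remove directories left empty by the deletions, deepest first.
    dirs = [p for p in output_dir.rglob("*") if p.is_dir()]
    for d in sorted(dirs, key=lambda p: len(p.parts), reverse=True):
        if not any(d.iterdir()):
            d.rmdir()

    return removed, written
```

Since the result depends only on `expected`, any conflicted Dockerfiles or readmes left by the merge are simply overwritten, which is what would make step 4's "no merge conflicts" plausible for generated output.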

Posted by lbussell on Nov 17 '23

[Triage] We all agree that this is a problem that should be addressed in order to improve the servicing workflow, so I've removed the untriaged label. Since there isn't a clear direction for work yet, this issue can continue to serve as the design discussion for this repo's branching strategy.

Posted by lbussell on Aug 12 '24

I simplified the issue title for clarity.

Posted by richlander on Aug 12 '24