
Soliciting feedback: Release cadence for out-of-tree features

Open d-nishi opened this issue 5 years ago • 20 comments

SIGs such as the cloud providers operate like user groups. They are responsible for tight integrations of provider APIs with Kubernetes APIs and interfaces, such as the out-of-tree cloud provider interfaces and the CSI, CNI, and CRI specifications. These integrations serve a Kubernetes user's needs when running Kubernetes on a cloud provider (they are not necessarily motivated by enhancing a specific Kubernetes offering such as EKS, AKS, GKE, or VKE). They contribute to the richness of the Kubernetes ecosystem and aim to drive consistent behavior across providers through the interface implementation. It is therefore in the best interest of the Kubernetes user that such integrations be pegged to a Kubernetes major version release, and that the testing and documentation discipline enforced on in-tree Kubernetes features be adopted or recommended for out-of-tree feature releases as well.

This leads to a few open questions that need to be discussed and resolved, since no referenceable guidelines exist today. Referring specifically to this issue in v1.13:

  1. Tracking out-of-tree Kubernetes features - should this be out of scope for the SIG-release team and in scope for the responsible SIG?

  2. Cadence of out-of-tree feature releases - should SIGs continue to adhere to release best practices that Kubernetes in-tree features follow aka follow the requirements necessary with testing and documentation to move the feature from alpha to beta to GA?

  3. Reporting out-of-tree feature updates - should a summary for out-of-tree features continue to be added to the Major Theme section of the release notes as described here

  4. Publishing test results in testgrid - should the integration test results be post-submit, non-blocking and be visible in testgrid for an out-of-tree feature to qualify as beta/GA?

  5. Documentation - should the documentation be regularly updated for a feature to move through the release cadence?

  6. KEPs - should KEPs be necessarily updated for a feature to move through the release cadence?

  7. Release cadence ownership - should the SIG Chairs be responsible for the quality of the releases and the systematic follow through on the release cadence?

d-nishi avatar Feb 01 '19 00:02 d-nishi

/cc @saad-ali @justaugustus @thockin @justinsb @spiffxp @tpepper @dims @andrewsykim

Created the issue as discussed!

d-nishi avatar Feb 01 '19 00:02 d-nishi

/assign @justaugustus

d-nishi avatar Feb 01 '19 00:02 d-nishi

Pinging @kubernetes/sig-release for input as well...

justaugustus avatar Feb 01 '19 07:02 justaugustus

Just speaking out loud here. Maybe features that touch the core tree should follow the same enhancement processes until they are fully removed? Anything fully "out-of-tree" can have its own cadence? Could be an added incentive to move more things out-of-tree.

andrewsykim avatar Feb 04 '19 16:02 andrewsykim

I've written some thoughts below. Please keep in mind that they are my own opinions only and that they might come with a somewhat incomplete context :)

tl;dr: My instinct would be to move as much of the ownership as possible to the owning SIGs/working groups, while making sure that the Kubernetes thing (previously built off core, now core+out-of-tree) that end users deploy and run is of high quality.

1. Tracking out-of-tree Kubernetes features - should this be out of scope for the SIG-release team and in scope for the responsible SIG?

I could see SIGs being more effective at owning out-of-tree features than SIG-release. They know the complexity and work-to-be-done best, and so can speak more effectively to milestones and maturity. At the same time, it probably makes sense to have some sort of standardization/shared understanding, among SIGs, of what tracking looks like. This could be through process, through a central coordinating team (sig-release/sig-pm?) or something else...

2. Cadence of out-of-tree feature releases - should SIGs continue to adhere to release best practices that Kubernetes in-tree features follow aka follow the requirements necessary with testing and documentation to move the feature from alpha to beta to GA?

If I read this correctly, there are 3 things bundled into this prompt.

a. Should out-of-tree features follow the same requirements as in-tree ones, in terms of testing: This one is a strong "yes" for me, if not even higher standards. My reasoning is that for end users, whether features are developed in-tree or out-of-tree is an implementation detail. The quality of the software they run should be the same, regardless of how the code and development process are structured.

b. Should out-of-tree features follow the same requirements as in-tree ones, in terms of documentation:

  • Also a strong "yes" for me, for similar reasoning as the one on testing. I see documentation as a user-facing feature of the product, and so I think it's important to keep the same amount of detail/style etc. regardless of where in the tree the documented backing features were developed.
  • Another important point is that of discoverability and consistency of docs (e.g. how docs for out-of-tree features are connected to docs of core Kubernetes). There is a trap of duplication/repetition/gaps, particularly if they're maintained separately.

c. Cadence of out-of-tree feature releases (i.e. what should the cadence of out-of-tree releases be relative to in-tree)

  • I'd personally advocate for "at least as often as in-tree releases, optionally more frequently if the SIG thinks it makes sense".
  • Additional complexity to consider, compared to features developed exclusively in-tree:
    • version compatibility between out-of- and corresponding in-tree releases, and
    • long-term-support of the out-of-tree releases themselves

3. Reporting out-of-tree feature updates - should a summary for out-of-tree features continue to be added to the Major Theme section of the release notes

Caveat: I'm not a docs expert[0]! Extra grain of salt here! My view is that:

  • No, especially if these components are released outside the release of the in-tree components
  • however I could see a Major Themes section in the out-of-tree components' Release Notes
  • compatibility with "core" Kubernetes versions, and compatibility with previous versions of out-of-tree component should be explicit here.
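
As a rough illustration of what "explicit" compatibility could look like in machine-checkable form, here is a minimal Python sketch. The component name, its version, and the supported range are all hypothetical and chosen only for the example; nothing in this thread defines such a table or format.

```python
from typing import NamedTuple


class Version(NamedTuple):
    major: int
    minor: int


def parse(version: str) -> Version:
    """Parse 'v1.13' or 'v1.13.2' into (major, minor); the patch level is ignored."""
    major, minor = version.lstrip("v").split(".")[:2]
    return Version(int(major), int(minor))


# Hypothetical table: out-of-tree component release -> (oldest, newest)
# core Kubernetes minor versions it was tested against.
SUPPORTED_CORE = {
    ("example-cloud-provider", "v0.3"): ("v1.13", "v1.14"),
}


def is_compatible(component: str, component_version: str, core_version: str) -> bool:
    """Return True if the core version falls inside the declared tested range."""
    oldest, newest = SUPPORTED_CORE[(component, component_version)]
    return parse(oldest) <= parse(core_version) <= parse(newest)
```

A release-notes generator or an installer could consult a table like this before pairing an out-of-tree release with a core release; how the table is published is left open here.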

4. Publishing test results in testgrid - should the integration test results be post-submit, non-blocking and be visible in testgrid for an out-of-tree feature to qualify as beta/GA?

  • I'd personally advocate for integration tests to run as much as possible pre-submit, and for end-to-end to run post-submit. My reasoning here is: the closer a test is to development time => the faster the feedback loop => the less time & back-and-forth overall it takes for humans to fix all the things. However that's a bit of an overgeneralization so it's probably worth looking at some real life examples!
  • non-blocking: I understand this to mean non-Kubernetes-release-blocking (as opposed to non-out-of-tree-feature-blocking); is that right? If yes, non-blocking and visible in testgrid makes sense. In my head, an out-of-tree developed component is sort of a consumer of in-tree features and APIs, so at the very least test results for out-of-tree components are kubernetes-release-informing, and occasionally even blocking (if they unearth unexpected incompatibilities, for example).

5. Documentation - should the documentation be regularly updated for a feature to move through the release cadence?

6. KEPs - should KEPs be necessarily updated for a feature to move through the release cadence?

For me, strong "yes" to both of these points. The fact that features are developed out-of-tree does not by itself make them any less important than in-tree developed features. So it makes sense to capture the reasoning/intention (KEP) and the end functionality (user-facing documentation) with a similar amount of detail.

7. Release cadence ownership - should the SIG Chairs be responsible for the quality of the releases and the systematic follow through on the release cadence?

Yes; at the same time

  • We should probably get explicit about what responsible and quality cover in this context
  • I think there's still value in an independent (=non-implementing-SIG-owned) stage of "do all parts of a deployed Kubernetes work well together".
    • Not sure what that would best look like in practice though; a lightweight collection of tests? Soak/skew tests to simulate production environments? Something else?

[0] Or an expert of any kind now that I think of it, so just throw salt all over :)

cc @hoegaarden

mariantalla avatar Feb 06 '19 18:02 mariantalla

@mariantalla - Thanks for the well thought out response.

d-nishi avatar Feb 06 '19 18:02 d-nishi

@tpepper - can you please share your thoughts on this issue? It's about release cadence for out-of-tree features, which is relevant to all cloud provider features per se.

d-nishi avatar Feb 12 '19 19:02 d-nishi

FYI in this week's SIG Release meeting we had a lot of discussion on this topic.

Meeting notes and recording are online.


Some personal thoughts and observations:

  • Tracking & KEPs: general initial consensus seems to be forming around the idea that the core release team does not need to track out-of-tree work. But a common process such as KEPs is useful across the project, including for out-of-tree work. If there is some out-of-tree work and interdependency that needs to be noted in the core k8s release notes, that would be best coordinated by a KEP and in the normal Kubernetes release team's enhancements process. We're thinking such out-of-tree KEPs tied to the core process would probably be relatively rare, reserved for some highlight or tight-coupling case. Discussions so far are NOT proposing that all out-of-tree work must go through KEPs or the core release tracking process.
  • Reporting: If something out-of-tree is independent, its owners should handle their own reporting to their user base. The core reporting tied to the release will focus on the core. Some component reporting would still happen in terms of listing key tested dependency name/version/release, so there is some fuzziness here; see the prior bullet on tracking/KEPs.
  • Cadence: a central idea in splitting the monolith is to enable independent cadences. I can imagine types of out of tree components which would run at faster and slower cadences. We'll see how this flows. The more independent components are, the better it will work.
  • Quality:
    • Cadence: I think quality should be treated distinctly from cadence, while recognizing they get tied up together in practice. In core we have a concept of alpha/beta/stable graduation. This is tied to core cadence in a way that today implies at least a 9 month maturation period. Whether it's 3 releases or 9 months, the number is fairly arbitrary. If an out-of-tree component has weekly releases and goes alpha/beta/stable in three weeks and the deliverable lacks quality in the eyes of the user, the component's adoption, including by core k8s, will suffer.
    • Testgrid: The evolution of SIG Cloud Provider, with the current providers moving to be sub-projects under it, has been accompanied by an increase in providers feeding quality data into testgrid. This is good, but it is early and minimal and needs to increase. Post-submit, periodic, and sufficiently high-quality to be release-informing for the core is a good goal, though it requires dramatically more engagement from almost all the vendors.
    • Documentation/Tracking/KEPs: A key deliverable of an out-of-tree component owner is to declare and fulfill quality criteria in their releases. If that is similar to how core k8s operates, it will feel familiar and consistent to users of both the component and an integrated k8s. Historically, low quality components lose both in the business marketplace and in the open source developer mindshare space.
  • Documentation: SIG Docs already works to determine what should be in the core documentation and website and their criteria should be inspiration for SIG Release. How vendors document their integrations will be up to the vendors and will likely have a large amount of variation.
  • Pragmatism: There's a lot of cross-component coupling today and many immature APIs. As the APIs evolve, independent operations in planning/tracking/development/validation/release will become easier and shared criteria more established. It's unrealistic to expect things to be magically smooth and independent today, with objective criteria covering all integration scenarios. We'll all need to be pragmatic and also aggressively pull in stakeholders early.
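
To make the cadence arithmetic in the Quality bullet concrete, here is a minimal Python sketch. It assumes one release per graduation stage (alpha, beta, stable) and a core cadence of roughly 13 weeks; both are simplifying assumptions for illustration, and `maturation_time` is a hypothetical helper, not part of any Kubernetes tooling.

```python
from datetime import timedelta


def maturation_time(release_interval: timedelta, releases_to_stable: int = 3) -> timedelta:
    """Minimum calendar time to reach stable if each stage takes exactly one release."""
    return release_interval * releases_to_stable


# Assumed core cadence: about one release per quarter (13 weeks).
core = maturation_time(timedelta(weeks=13))
# An out-of-tree component shipping weekly.
weekly = maturation_time(timedelta(weeks=1))

print(core.days)    # 273 days, roughly 9 months
print(weekly.days)  # 21 days, i.e. three weeks
```

The same number of stages yields very different soak times depending on cadence, which is why the comment above treats quality criteria as something to define separately from the release interval.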

tpepper avatar Feb 15 '19 20:02 tpepper

/sig pm release

justaugustus avatar Feb 17 '19 08:02 justaugustus

/milestone v1.15
/priority critical-urgent

justaugustus avatar May 01 '19 11:05 justaugustus

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

fejta-bot avatar Jul 30 '19 11:07 fejta-bot

Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

fejta-bot avatar Sep 20 '19 09:09 fejta-bot

/remove-lifecycle rotten
/priority important-soon

tpepper avatar Sep 20 '19 15:09 tpepper

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

fejta-bot avatar Dec 19 '19 16:12 fejta-bot

Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

fejta-bot avatar Jan 18 '20 17:01 fejta-bot

/remove-lifecycle rotten
/lifecycle frozen
/priority important-longterm
/remove-priority critical-urgent

justaugustus avatar Jan 21 '20 00:01 justaugustus

/sig architecture

@neolit123 is picking this topic up in the context of kubeadm:

  • https://groups.google.com/forum/#!topic/kubernetes-sig-testing/jrqTS1BDo0g
  • https://docs.google.com/presentation/d/1Utf8NgLZTrS8FmVoU6dZN3siewjGOwvTmFcsJ6vgQLI/edit#slide=id.p

spiffxp avatar Apr 10 '20 05:04 spiffxp

As an incremental step forward, I've added a tracked/out-of-tree label to k/enhancements, which can be used by the @kubernetes/release-team when an enhancement is out-of-tree and doesn't need to be tracked by the team.

ref: https://github.com/kubernetes/test-infra/pull/17473

justaugustus avatar Apr 30 '20 23:04 justaugustus

Heya, lots of great points and perspectives made here. With that in mind, is it possible to identify what the general goals and next action items might be?

LappleApple avatar May 15 '20 17:05 LappleApple

Worth noting that we have a current proposal for this topic w.r.t. external cloud providers: https://github.com/kubernetes/enhancements/pull/1727

andrewsykim avatar May 15 '20 17:05 andrewsykim