
KEP-3659: ApplySet: kubectl apply --prune redesign and graduation strategy

Open KnVerey opened this issue 3 years ago • 3 comments

  • One-line PR description: Initial PR for https://github.com/kubernetes/enhancements/issues/3659
  • Issue link: https://github.com/kubernetes/enhancements/issues/3659

KnVerey avatar Nov 15 '22 23:11 KnVerey

Skipping CI for Draft Pull Request. If you want CI signal for your change, please convert it to an actual PR. You can still manually trigger a test run with /test all

k8s-ci-robot avatar Nov 15 '22 23:11 k8s-ci-robot

/cc

eddiezane avatar Jan 11 '23 17:01 eddiezane

I believe this proposal addresses the most important problems with the current prune, and I don't think the two reservations I've enumerated are blockers.

/lgtm

seans3 avatar Feb 08 '23 00:02 seans3

I do think we should consider making this NOT a client-side only feature, though. I think a lot of the complexity comes in because we're trying to avoid creating a new server side resource.

@johnbelamaric I haven't told Katrina and Justin yet, but I'm slowly preparing them to gain the confidence to create that new server-side resource, but shhhh, don't tell them :wink:

soltysh avatar Feb 08 '23 10:02 soltysh

/label tide/merge-method-squash

soltysh avatar Feb 08 '23 20:02 soltysh

/approve

From a PRR perspective. From my "project" perspective, I question the resources spent on a purely client-side approach. But that's up to the SIG and those working on the feature. I believe that this approach won't necessarily get more feedback, and if ecosystem tools have to adapt to this approach and then to a later server-side approach, it's a lot of churn and wasted effort. My 2 cents - do with it what you will :)

johnbelamaric avatar Feb 08 '23 21:02 johnbelamaric

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: johnbelamaric, KnVerey, soltysh

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment. Approvers can cancel approval by writing /approve cancel in a comment.

k8s-ci-robot avatar Feb 09 '23 09:02 k8s-ci-robot

So I encountered a slight challenge while using this. If we have a CRD and an instance of that CRD (a CR) in the same manifest, we know the GVKs for all the objects, but we can't (easily) know the GVR for the CR until we apply the CRD. This means we can't update the applyset/contains-group-resources annotation before applying, which we need to do so that we know this annotation always "covers" all the resources in our applyset (i.e. so that we don't leak objects if we crash mid-apply).
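To illustrate the mapping gap, here's a minimal Go sketch (the types and the discovery table are hypothetical stand-ins, not kubectl's actual RESTMapper API): a GVK is readable straight from a manifest's apiVersion and kind fields, but the corresponding resource name only exists once server discovery knows about it, which for a CR means only after its CRD has been applied.

```go
package main

import "fmt"

// GVK is known statically from any manifest's apiVersion + kind fields.
type GVK struct{ Group, Version, Kind string }

// discovered simulates server discovery (kind -> plural resource).
// For a CR whose CRD has not been applied yet, there is no entry,
// so the lookup fails. This table is a hypothetical stand-in.
var discovered = map[GVK]string{
	{"apiextensions.k8s.io", "v1", "CustomResourceDefinition"}: "customresourcedefinitions",
	{"apps", "v1", "Deployment"}:                               "deployments",
}

// gvrFor reports the resource name for a GVK, if discovery knows it.
func gvrFor(gvk GVK) (string, bool) {
	r, ok := discovered[gvk]
	return r, ok
}

func main() {
	// The CRD itself maps fine...
	crd := GVK{"apiextensions.k8s.io", "v1", "CustomResourceDefinition"}
	// ...but an instance of that CRD cannot be mapped until the CRD is
	// applied and discovery refreshes.
	cr := GVK{"example.com", "v1", "Widget"}

	for _, gvk := range []GVK{crd, cr} {
		if r, ok := gvrFor(gvk); ok {
			fmt.Printf("%s -> resource %q\n", gvk.Kind, r)
		} else {
			fmt.Printf("%s -> no REST mapping yet\n", gvk.Kind)
		}
	}
}
```

This is why a GVK-based annotation can be written before any server round-trip, while a GVR-based one cannot.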

Arguably this is a regression vs the existing kubectl behaviour. kubectl apply doesn't handle this particularly well either, but what I believe happens is this: when first invoked, apply applies the objects it can, including the CRD. The user then gets an error, tries again, and apply succeeds the second time (assuming the CRD was fully registered "in time"). Sometimes we get lucky: if the CRD comes first in the manifest, it may be fully registered by the time we reach the CR.

The regression is that now we can't even start the apply if we're using prune.

A few options:

  1. We could change to GVKs. This is actually what the merged KEP says; we changed to group-resources based on a suggestion, but I think GVK is simpler and more user-friendly. (Migration would be a concern, though we are in alpha and feature-flagged.)

  2. We could just apply the objects that we can map to GVRs, and return an error. This is arguably much more complex, but it is similar to kubectl's behaviour today.

  3. We could introduce some form of "smart reordering" into kubectl when used with prune, fixing the behaviour here. We would try to apply objects in order, but if we couldn't map an object to a GVR we would put it into a "backlog" and retry those objects after the ones we were able to apply. This is much more user-friendly behaviour. We haven't been willing to change the behaviour of apply for compatibility reasons previously, but we could argue that prunev2 can include new semantics, and this feels like a very benign and user-friendly change. For efficiency, we might still want to change to GVKs, so that we don't have to update the parent object on each "apply wave".

In terms of compatibility if we did switch to GVKs, there are a few choices:

a) Create a new annotation and simply break everyone using the feature today - it is alpha & feature-flagged. We could implement the fallback to full discovery.

b) Implement our fsck command. The fsck command could be pretty simple, in that it could start by simply rewriting this annotation, but we do want an fsck command that can e.g. check for missing annotations generally. We could tell people to run fsck if we see the "old" annotation.

c) Seamlessly migrate users using the "old" annotation to the "new" annotation as part of kubectl apply. The challenge here is that it's not clear how we would ever drop this migration logic. Maybe we could come up with some criteria like "we will drop when we go GA", but there's always the risk that we break someone.

Personally I think option b is the best: it sets us up to put more functionality into fsck, and it feels more sustainable for an alpha, feature-flagged feature.

For reference, the current annotation is applyset.kubernetes.io/contains-group-resources; I would suggest we switch to applyset.kubernetes.io/contains-group-kinds.
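A sketch of what the fsck-style rewrite in option b might look like. The resource-to-kind table stands in for discovery, and the value formats are my assumption of comma-separated `resource.group` / `Kind.group` tokens (empty group for core resources); this is not kubectl's actual implementation.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Annotation keys from the discussion; the old value lists plural
// resources ("deployments.apps"), the proposed one lists kinds
// ("Deployment.apps").
const (
	oldAnnotation = "applyset.kubernetes.io/contains-group-resources"
	newAnnotation = "applyset.kubernetes.io/contains-group-kinds"
)

// migrateAnnotation translates the old comma-separated group-resources
// value into group-kinds, given a resource->kind mapping that real code
// would obtain from discovery. The mapping is a hypothetical stand-in.
func migrateAnnotation(oldValue string, kindFor map[string]string) (string, error) {
	var kinds []string
	for _, gr := range strings.Split(oldValue, ",") {
		resource, group, _ := strings.Cut(gr, ".")
		kind, ok := kindFor[resource]
		if !ok {
			return "", fmt.Errorf("no kind known for resource %q", resource)
		}
		if group == "" {
			kinds = append(kinds, kind) // core API group
		} else {
			kinds = append(kinds, kind+"."+group)
		}
	}
	sort.Strings(kinds) // keep the rewritten value deterministic
	return strings.Join(kinds, ","), nil
}

func main() {
	kindFor := map[string]string{"deployments": "Deployment", "configmaps": "ConfigMap"}
	v, err := migrateAnnotation("deployments.apps,configmaps", kindFor)
	if err != nil {
		panic(err)
	}
	fmt.Println(newAnnotation + "=" + v)
	// -> applyset.kubernetes.io/contains-group-kinds=ConfigMap,Deployment.apps
}
```

Because the rewrite needs discovery anyway, running it inside an explicit fsck command (rather than silently during every apply) keeps the migration visible and droppable later.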

https://github.com/kubernetes/kubernetes/blob/ad18954259eae3db51bac2274ed4ca7304b923c4/staging/src/k8s.io/kubectl/pkg/cmd/apply/applyset.go#L63

justinsb avatar Apr 07 '23 13:04 justinsb