enhancements icon indicating copy to clipboard operation
enhancements copied to clipboard

KEP-2839: Add KEP for in-use protection

Open mkimuram opened this issue 3 years ago • 93 comments

  • One-line PR description: A generic feature to protect objects from deletion while it is in use
  • Issue link: https://github.com/kubernetes/enhancements/issues/2839
  • Other comments:

mkimuram avatar Jul 27 '21 14:07 mkimuram

Cross-linking an old issue: https://github.com/kubernetes/kubernetes/issues/10179

bgrant0607 avatar Aug 03 '21 17:08 bgrant0607

@lavalamp @bgrant0607

Sharing just a rough idea.

Just for preventing deletion, the below design will be enough (We still need to consider how to protect deletion of LienClaim itself). Also, update may be implemented in the same way, like adding "spec-update" to LienClaim.spec.prevents and appending uuid to *.metadata.spec-update-liens. However, it would be difficult to define preventing update for specific fields as we can specify through "field manager", by this way.

  1. User creates LienClaim in their namespace with specifying targets and prevents.
apiVersion: v1beta1
kind: LienClaim
metadata:
  namespace: ns1
  name: my-lien
spec:
  targets:
  - apiVersion: v1
    kind: secret
    namespace: ns2
    name: important-secret
  prevents:
  - delete
  1. Lien controller watches LienClaim and create cluster-scoped Lien and append its name to *-liens, or delete-liens in the below example, to the target if a new LienClaim is created. Note that a name for Lien is generated as UUID.
apiVersion: v1beta1
kind: Lien
metadata:
  name: 12345678-1234-1234-1234-1234567890ab
spec:
  targets:
  - apiVersion: v1
    kind: secret
    namespace: ns2
    name: important-secret
  source:
    namespace: ns1
    name: my-lien
  prevents:
  - delete
apiVersion: v1
kind: Secret
metadata:
  namespace: ns2
  name: important-secret
  delete-liens:
  - 12345678-1234-1234-1234-1234567890ab
type: Opaque
data:
  1. Lien controller may also update LienClaim status if 2 succeeds
apiVersion: v1beta1
kind: LienClaim
metadata:
  namespace: ns1
  name: my-lien
spec:
  targets:
  - apiVersion: v1
    kind: secret
    namespace: ns2
    name: important-secret
  prevents:
  - delete
status:
  lien: 12345678-1234-1234-1234-1234567890ab
  phase: preventing
  1. On deletion of resources, lien admission controller checks if metadata.delete-liens is empty. If not, return error to block deletion.

  2. Lien controller watches LienClaim and delete the lien from *-liens on the target and delete Lien if the LienClaim is deleted.

mkimuram avatar Aug 04 '21 17:08 mkimuram

From the viewpoint of consuming lien from secret protection, which is my biggest concern. How to create LienClaim before creating referencing objects and delete LienClaim after deleting referencing objects is need to be considered. (I'm not sure if mutation hook or ownerReference can achieve it. Although, we can at least leave it for users.)

mkimuram avatar Aug 04 '21 17:08 mkimuram

Why do you want to have lien objects? Why are text strings not sufficient (as used for finalizers). Adding additional object types adds more places to have races, I'm against it unless you can convince me it's absolutely necessary.

lavalamp avatar Aug 04 '21 17:08 lavalamp

Why do you want to have lien objects?

It's to make it easier to manage who has control over the specific lien in the delete-liens field.

If multiple liens are set to a single resource, we will need to ensure that the specific lien is managed by the client/controller when deleting the lien. We may add the LienClaim's identity information to delete-liens but it will tend to be long and difficult to much.

Instead, we might just add LienClaim's uid. However, just adding uid, it is difficult for the owner of the resource to find from the uid. Also, I'm not sure if object uid can be regarded as unique. Creating an object with the name should ensure it.

Adding additional object types adds more places to have races, I'm against it unless you can convince me it's absolutely necessary.

Agree. If above concern can be solved by the other way, I agree to remove cluster-scoped Lien object.

mkimuram avatar Aug 04 '21 18:08 mkimuram

You want to ACL liens? I don't see a need to enforce this any more than we do for finalizers. And I would not solve that problem by adding Lien or LienClaim objects. I don't think a solution here should require any additional object types.

lavalamp avatar Aug 04 '21 18:08 lavalamp

If only talking about the use case for secret protection, the feature needed is like block deleting "Secret A" that can be used by "Pod B" and "PersistentVolume C" while the secret is used. So, something like below will be enough.

apiVersion: v1
kind: Secret
metadata:
  namespace: ns2
  name: A
  delete-liens: 
    - "ID to show that Pod B is using"
    - "ID to show that PersistentVolume C is using"
type: Opaque
data:

And to generalize it, I was thinking about a way to manage such ID for each reason to avoid conflict in deleting liens by lien system's side. However, we can leave it to consumers.

mkimuram avatar Aug 04 '21 18:08 mkimuram

And also thinking about using a lien per controller, not per reason, like below.

apiVersion: v1
kind: Secret
metadata:
  namespace: ns2
  name: A
  delete-liens: 
    - "k8s.io/secret-protection-controller"
type: Opaque
data:

Then, I start to think about which will be easier "implementing own admission hook to block deletion" or "implementing controller to add lien"?

(Obviously, for users who would like to add such protection manually, lien systems is useful as this shows, but for controllers, it might not.)

mkimuram avatar Aug 04 '21 19:08 mkimuram

There is no need for a lien controller. We would add code to kube-apiserver (reject deletion if there are liens, don't permit new liens if deletion timestamp is set).

We might want slightly more structure than raw text.

You can prototype today with e.g. specially formatted annotations and a webhook admission controller.

lavalamp avatar Aug 04 '21 19:08 lavalamp

Sorry, I misread :) A lien per controller or per reason doesn't matter from the perspective of the lien mechanism, it's up to controller authors how they want to use the mechanism.

lavalamp avatar Aug 04 '21 19:08 lavalamp

@lavalamp

You can prototype today with e.g. specially formatted annotations and a webhook admission controller.

I've implemented a prototype of lien as this. It is separated to 4 commits, but only 2 commits, the first commit and the last commit, will be the points of interest.

I will update the KEP based on this prototype.

Note that I've tested below way and work as shown below:

  • Deploy:
ENABLE_ADMISSION_PLUGINS=Lien hack/local-up-cluster.sh 
  • Test:
  1. Without liens, deletion is not blocked
kubectl create secret generic test-secret --from-literal='username=my-app' --from-literal='password=39528$vdg7Jb'
kubectl get secret test-secret -o jsonpath='{.metadata.liens}{"\n"}'

kubectl delete secret test-secret
secret "test-secret" deleted
  1. With liens, deletion is blocked, and once all liens are removed, deletion is not blocked
kubectl create secret generic test-secret --from-literal='username=my-app' --from-literal='password=39528$vdg7Jb'
kubectl patch secret test-secret -p '{"metadata":{"liens":["never delete me"]}}' --type=merge
secret/test-secret patched

kubectl get secret test-secret -o jsonpath='{.metadata.liens}{"\n"}'
[never delete me]

kubectl delete secret test-secret
Error from server (Forbidden): secrets "test-secret" is forbidden: deletion not allowed by liens

kubectl patch secret test-secret -p '{"metadata":{"liens":[]}}' --type=merge
secret/test-secret patched

kubectl delete secret test-secret
secret "test-secret" deleted

mkimuram avatar Aug 13 '21 23:08 mkimuram

@lavalamp

I've updated the KEP. PTAL

Also, I will upate the prototype of secret-protection to rely on this in-use protection mechanism to check the feasibility and also update the KEP for secret-protection.

mkimuram avatar Aug 16 '21 18:08 mkimuram

This is close enough that maybe you can come to an API Machinery SIG meeting and pitch it to us? :)

lavalamp avatar Aug 17 '21 23:08 lavalamp

@lavalamp

Thank you for your review and feedback. I will update the KEP.

This is close enough that maybe you can come to an API Machinery SIG meeting and pitch it to us? :)

Of course. Thank you for your invite. (It seems that the meeting is conflicting with that of SIG storage, however, this KEP will need to be merged before secret-protection KEP, so I will prioritize this KEP).

mkimuram avatar Aug 18 '21 00:08 mkimuram

@lavalamp

Updated the KEP. PTAL (For rephrasing "validating admission webhook", used common phrase "validation in api-server".)

mkimuram avatar Aug 18 '21 19:08 mkimuram

mentioned this KEP in #sig-cli xref https://github.com/kubernetes/enhancements/pull/2777

neolit123 avatar Aug 25 '21 19:08 neolit123

@detiber @neolit123

Thank you for your review. I've addressed the review comments. PTAL

Also, I changed the format for Liens field. Please also check it (In my understanding, content of a map needs to have a unique identifier, therefore I used LienID as the identifier. However, it makes almost same as just a map of list of string for "Per Object" case). Any ideas to improve it?

mkimuram avatar Aug 27 '21 20:08 mkimuram

@johnbelamaric

PRR approval deadline is approaching. Could you apporve it? If there are anything missing for PRR approval, please point out.

mkimuram avatar Sep 01 '21 16:09 mkimuram

@wojtek-t @thockin cc: @lavalamp @detiber @neolit123 @msau42

Updated the KEP to change back to slice-of-strings Liens and restrict it to per-controller/per-user, to be more aligned with Finalizers. PTAL

mkimuram avatar Sep 07 '21 21:09 mkimuram

I think it's important to work out what kind of permissions should be required in order to add lien. The KEP as currently written allows a namespace admin or editor to create an object that cannot be deleted. The impact here is greater than finalizers because a finalizer doesn't block the deletion itself. Once the deletionTimestamp is set, a finalizer can be removed and cannot be added back. This gives a non-racy path to having an object removed even after finalizers are set.

In contrast, liens would prevent deletion, so a cluster-admin would be racing to update the object to remove the lien and delete the object before a namespace editor is able to place the lien back.

One of our regrets with finalizers is the lack of separation between managing finalizers and updating the intent of the resources themselves. I think we should avoid making that mistake again and create a separate ACL model (possibly a subresource) to modify liens. I do not think that a namespace admin or editor should be able to create an object that a cluster-admin or namespace lifecycle controller cannot delete.

deads2k avatar Sep 08 '21 18:09 deads2k

@deads2k do you think a subresource (or every resource!!) is a requirement for alpha? Is there (or should there be) a way to add a universal subresource?

thockin avatar Sep 08 '21 19:09 thockin

@deads2k

Could you comment on whether making lien a subresource will be a requirement for alpha, as asked here?

I've added it as a "Beta graduation criteria" with describing the risk to the KEP. Can we go ahead with it? Actually, I don't have enough knowledge on whether we can change it after alpha without a big side effect, so I will wait for your decision.

mkimuram avatar Sep 08 '21 23:09 mkimuram

@deads2k do you think a subresource (or every resource!!) is a requirement for alpha? Is there (or should there be) a way to add a universal subresource?

+1 I think we agree that the mitigation for the risk of "undelatable resources" is a dedicate ACL (subresource). So given we know the solution here, that doesn't sound like an Alpha-blocker for me (it clearly is Beta blocker).

Or do you think think that "generic subresource for even resource" is the tricky part here?

@deads2k - I'm personally supportive for this going alpha in 1.23 with the current risk. So if you believe that the race problem is solvable, I think we should go with it (unless you're not confident that we can reasonable solve the dedicate ACL one).

wojtek-t avatar Sep 09 '21 09:09 wojtek-t

@deads2k do you think a subresource (or every resource!!) is a requirement for alpha? Is there (or should there be) a way to add a universal subresource?

I do think that we need to resolve permissions situations before creating the alpha level of any feature. In addition, this particular one is build on ObjectMeta is effectively an unversioned API that spans all resources. I think this raises the stakes for getting the initial implementation correct.

The closest current parallel is finalizers and we have never been able to (and probably never will be able) to go back and correct those short-comings.

deads2k avatar Sep 09 '21 15:09 deads2k

@deads2k

Thank you for your comment. I understand that making lien field a subresource needs to happen in the first implementation.

Then, the next question is whether we can go ahead by just modifying this KEP to make the lien field a subresource to allow defining a separate RBAC in alpha. Or do we need anything else?

I'm not sure that making a field in metadata a subresource is possible and reasonable, nor making slice-of-string lien a subresource alone can really mitigate the risk. I would appreciate it if you could let me know the direction (I agree to skip this release if there is a blocker that won't be solved in time).

mkimuram avatar Sep 09 '21 15:09 mkimuram

I think that permissions issues could be resolved by a subresource with separate permissions segregated from admin,editor.

I think the guarantees around "blocks deletion" still present a problem overall and may require a way to say, "ignore liens and force deletion" to allow, "no matter what someone else says, this needs to be removed" similar to how deletion+removing finalizers does today.

A design update containing those changes seems significant enough that it's unlikely to complete review today.

deads2k avatar Sep 09 '21 16:09 deads2k

@deads2k

A design update containing those changes seems significant enough that it's unlikely to complete review today.

Thank you for your comment. Then, we need to skip this release (I've updated the KEP to just use subresource, but it seems that it won't be enough).

Let's continue discussion to merge it for k8s 1.24, later. Thank you for taking your time to review this KEP, everyone involved. I really appreciate it.

mkimuram avatar Sep 09 '21 17:09 mkimuram

Let's continue discussion to merge it for k8s 1.24, later. Thank you for taking your time to review this KEP, everyone involved. I really appreciate it.

Let's just keep traction on it - we should continue discussion in the following days, not get back to it right before the next feature freeze.

wojtek-t avatar Sep 09 '21 18:09 wojtek-t

Sorry we couldn't wrap this one up this time around.

I think there's some basic tension around whether this should be an advisory feature, to help prevent accidental deletion, or a hard locking feature which sets privileges around who can or cannot lock / delete.

I'm rooting for an advisory role for liens, which would help a longstanding feature request to make it hard to delete something accidentally. This would imply that anyone can apply a lien, and anyone can ignore them (you just have to kubectl --force --with-crowbar delete -- since we already used plain old --force for something else).

OTOH if you want a runtime referential integrity feature, then we have to permission the addition and removal of liens to make sure the wrong parties can't break the referential integrity. This sounds very complicated to me and I don't really want to do it if we don't have to.

lavalamp avatar Sep 09 '21 23:09 lavalamp

Thank you for sharing the alternative idea. Advisory feature or hard locking feature seems a very good summary of the choices.

I think that it will be also a valid direction to mitigate the risk. If we decide to choose this direction, it can be added even after alpha without API changes, so it won't be an alpha blocker. I will update the KEP and wait for others to agree.

(Actually, my original intention is to prevent accidental deletion, therefore this change is not just to make this KEP merged in time, but it fits my use case).

mkimuram avatar Sep 10 '21 00:09 mkimuram