kpt-config-sync
kpt-config-sync copied to clipboard
Don't set owner annotations on implicit namespaces
We have a setup where we have multiple RootSyncs fighting for control over the same namespace. Simplified, the setup looks something like this:
- A
RootSyncnamedsync-adeclaresnamespace-aexplicitly - A
RootSyncnamedsync-xdeclares resources innamespace-a, but notnamespace-aitself - A
RootSyncnamedsync-yalso declares resources innamespace-a, but notnamespace-aitself - A
RootSyncnamedrootdeclares all the aboveRootSyncs
Since we create all of sync-{a,x,y} with git-ops (managed by root) we don't have any control of the order in which they are reconciled. If either of sync-x or sync-y manage to create namespace-a before sync-a does, then they will assume ownership, and sync-a will no longer be allowed to create it (and as a result, all its dependent resources also fail to be created).
This patch makes sync-x and sync-y create the namespace if it does not exist, but without assuming ownership of it. That way, when sync-a later comes along to create it, it can instead assume ownership of the existing namespace, without any resource fight errors.
Hi @tomasaschan. Thanks for your PR.
I'm waiting for a GoogleContainerTools member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.
Once the patch is verified, the new status will be reflected by the ok-to-test label.
I understand the commands that are listed here.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/ok-to-test
This will have the side effect that these implicit namespaces will no longer be protected from deletion/update by the admission webhook. Is this desired behavior?
Preventing update seems perhaps bad if they're not in Git, but preventing delete sounds good because there's managed content in the namespace.
We might need a special annotation and special case in the webhook that prevents deletion but allows updates.
/cc @janetkuo @nan-yu for behavior discussion.
/hold
Until behavior consensus is reached.
cc @haiyanmeng
I've dug through the code to figure out how this webhook is implemented, and I think I understand enough to take a stab at an implementation that handles this better. However, I'm not sure quite what you're asking this should do.
For example, consider the case we have with multiple different syncs all using the same implicit namespace. When I delete the last resource in this namespace in one sync, it should obviously not remove the namespace if there are still resources in it managed by other syncs - but what if I delete the last (managed) resource in there? Is it important that the implicit namespace is cleaned up when it's not used by any sync anymore? If so, whatever solution we come up with would have to ensure all syncs that add resources to it also note that they do, and that when they remove their last resource they indicate that clearly, so that the last sync that deletes resources can also see that it should remove the namespace.
I'm thinking e.g. an annotation configsync.k8s.io/implicitly-used-by with a list of managers; each sync would be responsible for adding itself to this list when it adds resources, and also for removing itself from the list when it removes its last resource. If it removes its last resource and the list doesn't contain any other managers, it can remove the namespace.
But all of this bookkeeping becomes quite complex, especially in light of race conditions etc. It would be much simpler to just add an configsync.k8s.io/implicit: "true" annotation whenever the namespace is created implicitly (and remove it if it's created explicitly), and use that in the webhook to make the call about whether to allow updates/deletes or not (allow updates, disallow deletes). When the last resource is removed, nothing happens, and the namespace is left laying around; if the user wants to delete it, they will have to first remove the annotation (which works, because the webhook allows updates). Not as safe, not as clean, but much easier to implement.
WDYT?
Hi @tomasaschan,
I put together a public design doc which puts forth a proposal to solve this problem. Please feel free to comment/review.
@sdowell Thanks for putting that together! That should be easy enough to get going with!
I have one question though (which I'll post here since I have view-only access to the doc): toward the very end of the document, you describe the case for deleting a managed namespace that contains resources managed by a different sync that uses the namespace implicitly, with the following behavior:
If the admission webhook is enabled:
- The namespace is pruned but not reconciled since there are managed resources in the namespace.
- The user can proceed with namespace deletion by pruning resources from
sync-x(and any otherRootSyncswhich manage resources in the same namespace).
IIUC, because the namespace has already been marked for deletion, there is no option for the user to actually keep the resources managed by sync-x - so if deleting and recreating those resources is a destructive operation (say, dropping and recreating a database), the user doesn't have a way to avoid that. Is this a problem?
Another conceivable option would be for syncs using a namespace implicitly to add (but never remove) an annotation for client.lifecycle.config.k8s.io/deletion: detach.
@tomasaschan Added you as a commenter on the doc. Would you mind copying your comment there so we can discuss in a thread?
@sdowell I think my concerns have been raised in comment threads by @janetkuo and @nan-yu; will weigh in on those!
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign janetkuo for approval by writing /assign @janetkuo in a comment. For more information see the Kubernetes Code Review Process.
The full list of commands accepted by this bot can be found here.
Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment
@tomasaschan: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:
| Test name | Commit | Details | Required | Rerun command |
|---|---|---|---|---|
| kpt-config-sync-presubmit-e2e-multi-repo-test-group2 | 7d4ca551905d586a8d9bf8e7ef2b9e51fba4dcac | link | true | /test kpt-config-sync-presubmit-e2e-multi-repo-test-group2 |
| kpt-config-sync-presubmit-e2e-multi-repo-test-group3 | 7d4ca551905d586a8d9bf8e7ef2b9e51fba4dcac | link | true | /test kpt-config-sync-presubmit-e2e-multi-repo-test-group3 |
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
I don't think we can accept this proposal, due to reverse incompatible behavior.
As discussed in the proposal doc, the way forward is to add a feature to disable implicit namespace creation all together. This will need to be opt-out to avoid breaking reverse compatibility.
If you want to take a stab at a PR for that, @tomasaschan, feel free. Otherwise, I'll put it on our backlog for prioritization.