kapp-controller
Enable kapp-controller to delete App when namespace is deleted
Description
Deleting a namespace gets stuck in the Terminating state because kapp-controller won't delete an App without a valid service account.
What steps did you take:
- Created a minikube cluster on my local machine (I used k8s version 1.26.1)
- Installed kapp-controller using:
kubectl apply -f https://github.com/carvel-dev/kapp-controller/releases/download/v0.44.6/release.yml
- Create a dev namespace
kubectl create ns dev
- Create a ServiceAccount and RoleBindings, plus a sample ConfigMap to use later when creating the App
kubectl apply -f https://raw.githubusercontent.com/atmandhol/tap-nsp-gitops/main/random/app/app-permissions.yaml
kubectl apply -f https://raw.githubusercontent.com/atmandhol/tap-nsp-gitops/main/random/app/app-config.yaml
- Create an App
kubectl apply -f https://raw.githubusercontent.com/atmandhol/tap-nsp-gitops/main/random/app/app.yaml
I now have an App that is successfully deployed.
Next, I attempt to delete the namespace because I no longer need it:
kubectl delete ns dev
What happened: When I ran the delete command:
- The ServiceAccount default-ns-sa that I was using for the App is deleted as part of the namespace deletion
- The dev namespace gets stuck in the Terminating state until the App is deleted
- The App gets stuck and fails to delete with the following error:
Preparing kapp: Getting service account: serviceaccounts "default-ns-sa" not found
What did you expect:
If the namespace in which the App resides is in the Terminating state, kapp-controller should delete the App without needing the ServiceAccount the App was using, since the user (or some process) has clearly signaled that it is no longer needed by deleting the namespace.
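The expected behavior can be written down as a small decision rule. The Go sketch below is hypothetical and is not kapp-controller's actual deletion code: decideAppDeletion, DeleteDecision and its three values are names invented for illustration, and the sketch assumes the controller can read the namespace's status.phase and knows whether the App's ServiceAccount still exists.

```go
package main

import "fmt"

// NamespacePhase mirrors the values of a Namespace's status.phase.
type NamespacePhase string

const (
	NamespaceActive      NamespacePhase = "Active"
	NamespaceTerminating NamespacePhase = "Terminating"
)

// DeleteDecision is a hypothetical outcome of an App's deletion path.
type DeleteDecision string

const (
	// Normal path: run the delete as the App's own ServiceAccount.
	DeleteWithServiceAccount DeleteDecision = "delete-with-service-account"
	// Requested behavior: the namespace is going away, so let the App go
	// without looking up its (possibly already deleted) ServiceAccount.
	DeleteWithoutServiceAccount DeleteDecision = "delete-without-service-account"
	// Behavior reported in this issue: deletion fails and the App stays.
	FailMissingServiceAccount DeleteDecision = "fail-missing-service-account"
)

// decideAppDeletion is an illustrative sketch, not kapp-controller's code.
func decideAppDeletion(phase NamespacePhase, serviceAccountExists bool) DeleteDecision {
	switch {
	case serviceAccountExists:
		return DeleteWithServiceAccount
	case phase == NamespaceTerminating:
		return DeleteWithoutServiceAccount
	default:
		return FailMissingServiceAccount
	}
}

func main() {
	// The scenario from this issue: namespace dev is Terminating and
	// default-ns-sa has already been removed.
	fmt.Println(decideAppDeletion(NamespaceTerminating, false))
	// An Active namespace with a missing ServiceAccount still fails, as today.
	fmt.Println(decideAppDeletion(NamespaceActive, false))
}
```

One trade-off to keep in mind: if deletion proceeds without running kapp as the App's ServiceAccount, anything the App deployed outside the terminating namespace (cluster-scoped objects or resources in other namespaces) would presumably be left behind, since the namespace deletion only cleans up what lives inside that namespace.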
Anything else you would like to add:
In large-scale Kubernetes deployments where developer portals or automated processes create and delete namespaces for users, it's not feasible for operators to manually inspect namespaces and delete all existing Apps before deleting the namespace itself.
Environment:
- kapp-controller version: 0.44.6
- Kubernetes version: 1.26.1
Vote on this request
This is an invitation to the community to vote on issues, to help us prioritize our backlog. Use the "smiley face" up to the right of this comment to vote.
👍 "I would like to see this addressed as soon as possible" 👎 "There are other more important things to focus on right now"
We are also happy to receive and review Pull Requests if you want to help work on this issue.
Copied from https://github.com/carvel-dev/kapp-controller/issues/416#issuecomment-1461101291:
Should Carvel Apps set metadata.ownerReferences on the App to include the referenced ServiceAccount with blockOwnerDeletion to prevent deleting the Service Account before the Carvel App is deleted?
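To make the ownerReferences proposal concrete, here is a minimal Go sketch of the entry an App would carry. The struct mirrors a subset of the fields of the Kubernetes metav1.OwnerReference type; serviceAccountOwnerRef is a hypothetical helper and the UID is a placeholder. Per the Kubernetes API documentation, blockOwnerDeletion only holds back an owner that is being deleted in the foreground (with the foregroundDeletion finalizer), so whether it would actually protect the ServiceAccount during namespace deletion is an assumption worth verifying. Note also that an owner reference makes the App a garbage-collection dependent of the ServiceAccount.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// OwnerReference mirrors a subset of the JSON shape of a Kubernetes
// metadata.ownerReferences entry (metav1.OwnerReference).
type OwnerReference struct {
	APIVersion         string `json:"apiVersion"`
	Kind               string `json:"kind"`
	Name               string `json:"name"`
	UID                string `json:"uid"`
	BlockOwnerDeletion *bool  `json:"blockOwnerDeletion,omitempty"`
}

// serviceAccountOwnerRef (hypothetical) builds the entry described in the
// proposal: the App points at its ServiceAccount and asks the API server to
// hold back the ServiceAccount's foreground deletion until the App is gone.
func serviceAccountOwnerRef(saName, saUID string) OwnerReference {
	block := true
	return OwnerReference{
		APIVersion:         "v1",
		Kind:               "ServiceAccount",
		Name:               saName,
		UID:                saUID,
		BlockOwnerDeletion: &block,
	}
}

func main() {
	// Placeholder UID; a real reference must carry the live
	// ServiceAccount's metadata.uid.
	ref := serviceAccountOwnerRef("default-ns-sa", "00000000-0000-0000-0000-000000000000")
	out, err := json.Marshal(map[string][]OwnerReference{"ownerReferences": {ref}})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```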
An App whose ServiceAccount has lost its permissions because a RoleBinding was deleted will also get stuck.
An admittedly broad and slightly scary solution would be to use a dedicated ServiceAccount from the kapp-controller namespace that has permission to delete any resource in the cluster. When an App is deleted, use that common SA rather than the SA referenced by the App resource.
The ClusterRole would have this rule (kapp may need additional permissions to find resources within the App):
- apiGroups:
  - "*"
  resources:
  - "*"
  verbs:
  - delete
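The common-ServiceAccount idea boils down to switching identities once an App is being deleted. This Go sketch is hypothetical: it assumes the App's deletionTimestamp is the signal, and it names the shared account app-deleter in the kapp-controller namespace. Both that name and the function serviceAccountFor are illustrative, not existing kapp-controller configuration.

```go
package main

import "fmt"

// serviceAccountRef identifies a ServiceAccount by namespace and name.
type serviceAccountRef struct {
	Namespace string
	Name      string
}

// fallbackDeleter is a hypothetical shared ServiceAccount in the
// kapp-controller namespace, bound to the delete-anything ClusterRole.
var fallbackDeleter = serviceAccountRef{Namespace: "kapp-controller", Name: "app-deleter"}

// serviceAccountFor returns the identity used for one reconcile of an App.
// Normal reconciles keep using the App's own ServiceAccount; once the App is
// being deleted, the shared fallback is used instead, so a missing or
// de-privileged App ServiceAccount can no longer block deletion.
func serviceAccountFor(appNamespace, appSA string, beingDeleted bool) serviceAccountRef {
	if beingDeleted {
		return fallbackDeleter
	}
	return serviceAccountRef{Namespace: appNamespace, Name: appSA}
}

func main() {
	fmt.Println(serviceAccountFor("dev", "default-ns-sa", false))
	fmt.Println(serviceAccountFor("dev", "default-ns-sa", true))
}
```

The design choice here is that the fallback only kicks in on deletion, so day-to-day deploys still run with the App's own, narrower permissions.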
If a single ServiceAccount that can delete any resource is too powerful, the ClusterRole could be set up as an aggregating role and users could contribute specific apiGroups/resources they want deleted Apps to be able to delete.
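Aggregation works by label selection: Kubernetes RBAC fills an aggregated ClusterRole's rules from every ClusterRole matching its aggregationRule.clusterRoleSelectors. This Go sketch approximates that union using matchLabels semantics only, then checks whether the resulting rules permit a verb. The label key example.com/aggregate-to-app-deleter is hypothetical. The last two lines of main also illustrate the caveat about the wildcard rule: delete on everything does not grant list.

```go
package main

import "fmt"

// PolicyRule is a trimmed-down RBAC rule (no resourceNames or nonResourceURLs).
type PolicyRule struct {
	APIGroups []string
	Resources []string
	Verbs     []string
}

// ClusterRole keeps only the fields this sketch needs.
type ClusterRole struct {
	Name   string
	Labels map[string]string
	Rules  []PolicyRule
}

// matchesLabels applies matchLabels semantics: every selector key must be
// present in labels with the same value.
func matchesLabels(labels, selector map[string]string) bool {
	for k, v := range selector {
		if got, ok := labels[k]; !ok || got != v {
			return false
		}
	}
	return true
}

// aggregateRules approximates the aggregated role: the rules of every
// ClusterRole selected by selector, concatenated.
func aggregateRules(selector map[string]string, roles []ClusterRole) []PolicyRule {
	var out []PolicyRule
	for _, r := range roles {
		if matchesLabels(r.Labels, selector) {
			out = append(out, r.Rules...)
		}
	}
	return out
}

// coversValue reports whether list contains want or the "*" wildcard.
func coversValue(list []string, want string) bool {
	for _, s := range list {
		if s == "*" || s == want {
			return true
		}
	}
	return false
}

// allows reports whether any rule grants verb on resource in group.
func allows(rules []PolicyRule, group, resource, verb string) bool {
	for _, r := range rules {
		if coversValue(r.APIGroups, group) && coversValue(r.Resources, resource) && coversValue(r.Verbs, verb) {
			return true
		}
	}
	return false
}

func main() {
	selector := map[string]string{"example.com/aggregate-to-app-deleter": "true"}
	roles := []ClusterRole{
		{Name: "contrib-configmaps", Labels: selector,
			Rules: []PolicyRule{{APIGroups: []string{""}, Resources: []string{"configmaps"}, Verbs: []string{"delete"}}}},
		{Name: "unrelated", Labels: map[string]string{"team": "x"},
			Rules: []PolicyRule{{APIGroups: []string{"apps"}, Resources: []string{"deployments"}, Verbs: []string{"delete"}}}},
	}
	rules := aggregateRules(selector, roles)
	fmt.Println(allows(rules, "", "configmaps", "delete"))      // true
	fmt.Println(allows(rules, "apps", "deployments", "delete")) // false

	// The single wildcard rule from the comment above.
	broad := []PolicyRule{{APIGroups: []string{"*"}, Resources: []string{"*"}, Verbs: []string{"delete"}}}
	fmt.Println(allows(broad, "apps", "deployments", "delete")) // true
	fmt.Println(allows(broad, "apps", "deployments", "list"))   // false
}
```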
Since KC is behaving the way it is supposed to when a service account is deleted before the app, marking it as an enhancement.
@praveenrewar please reopen, as #1208 was self-described as a partial fix.
@scothis Thanks, I didn't notice that the issue got linked when I mentioned it in the PR description.
This issue is being marked as stale due to a long period of inactivity and will be closed in 5 days if there is no response.
I'm seeing this as well in Tanzu Application Platform clusters:
When I delete a developer namespace, it gets stuck in Terminating for this reason.