Handle eraser pod and PodTemplate deletion on controller manager restart
What steps did you take and what happened: An AKS ImageCleaner user raised an issue: if the eraser-controller-manager pod restarts, the ImageJob is cleaned up, but the PodTemplate and eraser pods are left behind, because the controller-manager pod itself is not deleted in this case. In this example, the manager pod restarted twice and two PodTemplates are left behind:
...
eraser-aks-upools1z3-40582422-vmss00003t-7wsdb   0/3   Completed   0             43h
eraser-aks-upools1z3-40582422-vmss00003t-d5hbv   0/3   Completed   0             2d20h
eraser-aks-upools1z3-40582422-vmss00003u-ckw2s   0/3   Completed   0             2d20h
eraser-aks-upools1z3-40582422-vmss00003u-x8sg7   0/3   Completed   0             43h
eraser-aks-upools1z3-40582422-vmss00003v-2hgsf   0/3   Completed   0             43h
eraser-aks-upools1z3-40582422-vmss00003v-5wbp6   0/3   Completed   0             2d20h
eraser-controller-manager-7bcc57b846-n52f7       1/1   Running     2 (43h ago)   3d20h
...
kubectl get imagejob -A
No resources found
kubectl get podtemplate -A
NAMESPACE NAME CONTAINERS IMAGES POD LABELS
kube-system imagejob-dhtbn collector,remover,trivy-scanner mcr.microsoft.com/oss/eraser/collector:v1.2.2,mcr.microsoft.com/oss/eraser/remover:v1.2.2,mcr.microsoft.com/oss/eraser/eraser-trivy-scanner:v1.2.2
What did you expect to happen: PodTemplates and Pods should be cleaned up if manager restarts, not just when it is deleted.
Environment:
- Eraser version: v1.2.2
- Kubernetes version: (use kubectl version):
cc @WilliamRockwellEvans
Since this does not affect the functioning of Eraser, we will not be prioritizing it at the moment. As a workaround, the remaining resources can be cleaned up manually.
If anyone would like to pick this up, the controllers (imagecollector_controller and imagelist_controller) would need to be modified to check for restart status on reconcile and to trigger PodTemplate and pod cleanup on delete.
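To make the idea concrete, here is a rough sketch of how leftover PodTemplates could be identified after a restart. It is written in Python only for brevity (the Eraser controllers are written in Go) and is not Eraser's actual code. It assumes, based on the output in the report above, that a PodTemplate created for an ImageJob shares that ImageJob's name (for example imagejob-dhtbn), so a template with the `imagejob-` prefix and no live ImageJob of the same name is treated as orphaned. The function name `find_orphaned_templates` and the dictionary layout (mirroring the `metadata` field of `kubectl get ... -o json`) are hypothetical.

```python
from typing import Iterable, List, Tuple

# Assumption: PodTemplates created for an ImageJob use this name prefix,
# as seen in the `kubectl get podtemplate -A` output in the report.
TEMPLATE_PREFIX = "imagejob-"


def find_orphaned_templates(
    pod_templates: Iterable[dict], image_jobs: Iterable[dict]
) -> List[Tuple[str, str]]:
    """Return (namespace, name) for PodTemplates with no matching ImageJob."""
    # Assumption: a template and its ImageJob share a name, so jobs are
    # matched by name only and their namespace (if any) is ignored.
    live_jobs = {job["metadata"]["name"] for job in image_jobs}

    orphans = []
    for template in pod_templates:
        meta = template["metadata"]
        name = meta["name"]
        # Skip templates that do not look like they belong to an ImageJob.
        if name.startswith(TEMPLATE_PREFIX) and name not in live_jobs:
            orphans.append((meta.get("namespace", ""), name))
    return orphans
```

With the state from the report (one `imagejob-dhtbn` template in kube-system and no ImageJobs), this would flag that template as a cleanup candidate. A real fix would run an equivalent check inside the controllers on startup or reconcile and delete the matching PodTemplates and their completed eraser pods.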
I can't reproduce this issue using eraser v1.4.0:
~ ❯ k delete po -n eraser-system --all
pod "eraser-controller-manager-c798f59f6-sk92t" deleted
~ ❯ k get podtemplates -A
NAMESPACE NAME CONTAINERS IMAGES POD LABELS
eraser-system imagejob-sxpjt collector,remover,trivy-scanner ghcr.io/eraser-dev/collector:v1.4.0,ghcr.io/eraser-dev/remover:v1.4.0,ghcr.io/eraser-dev/eraser-trivy-scanner:v1.4.0 <none>
~ ❯ k get pods -n eraser-system
NAME READY STATUS RESTARTS AGE
eraser-controller-manager-c798f59f6-zd2g8 0/1 Running 0 10s
eraser-kind-control-plane-cn2mn 0/3 Completed 0 7s
~ ❯ k get pods -n eraser-system
NAME READY STATUS RESTARTS AGE
eraser-controller-manager-c798f59f6-zd2g8 1/1 Running 0 15s
~ ❯ k get podtemplates -A
No resources found