
Handle eraser pod and PodTemplate deletion on controller manager restart

Status: Open • ashnamehrotra opened this issue 2 years ago • 3 comments

What steps did you take and what happened: An AKS ImageCleaner user raised the following issue. If the eraser-controller-manager pod restarts, the ImageJob is cleaned up, but the PodTemplate and the eraser pods are left behind, because the controller-manager pod is not deleted in this case. In this example, the manager pod restarted twice and two PodTemplates were left behind:

...
eraser-aks-upools1z3-40582422-vmss00003t-7wsdb   0/3   Completed   0             43h
eraser-aks-upools1z3-40582422-vmss00003t-d5hbv   0/3   Completed   0             2d20h
eraser-aks-upools1z3-40582422-vmss00003u-ckw2s   0/3   Completed   0             2d20h
eraser-aks-upools1z3-40582422-vmss00003u-x8sg7   0/3   Completed   0             43h
eraser-aks-upools1z3-40582422-vmss00003v-2hgsf   0/3   Completed   0             43h
eraser-aks-upools1z3-40582422-vmss00003v-5wbp6   0/3   Completed   0             2d20h
eraser-controller-manager-7bcc57b846-n52f7       1/1   Running     2 (43h ago)   3d20h
...


kubectl get imagejob -A
No resources found


kubectl get podtemplate -A
NAMESPACE     NAME             CONTAINERS                        IMAGES                                                                                                                                            POD LABELS
kube-system   imagejob-dhtbn   collector,remover,trivy-scanner   mcr.microsoft.com/oss/eraser/collector:v1.2.2,mcr.microsoft.com/oss/eraser/remover:v1.2.2,mcr.microsoft.com/oss/eraser/eraser-trivy-scanner:v1.2.2
kube-system   imagejob-r5vlc   collector,remover,trivy-scanner   mcr.microsoft.com/oss/eraser/collector:v1.2.2,mcr.microsoft.com/oss/eraser/remover:v1.2.2,mcr.microsoft.com/oss/eraser/eraser-trivy-scanner:v1.2.2

What did you expect to happen: PodTemplates and Pods should be cleaned up when the manager restarts, not only when it is deleted.


Environment:

  • Eraser version: v1.2.2
  • Kubernetes version: (use kubectl version):

ashnamehrotra • Nov 13 '23

cc @WilliamRockwellEvans

ashnamehrotra • Nov 15 '23

Since this does not affect Eraser's functionality, we will not prioritize it for now. As a workaround, the leftover resources can be cleaned up manually.
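The following is a minimal sketch of that manual cleanup. It is not an official Eraser script. The shell function picks leftover worker pods out of "kubectl get pods" output. It relies on assumptions taken only from the listing in the report above: worker pods are named "eraser-<node>-<suffix>" and end with STATUS "Completed". The function name leftover_eraser_pods is hypothetical. The controller-manager pod is always skipped.

```shell
# Hypothetical helper: read `kubectl get pods --no-headers` output on stdin
# and print the names of leftover eraser worker pods.
# Assumption (from the listing in this issue): worker pods start with
# "eraser-" and have STATUS "Completed"; the controller manager is excluded.
leftover_eraser_pods() {
  # Columns: $1 = NAME, $2 = READY, $3 = STATUS
  awk '$1 ~ /^eraser-/ && $1 !~ /^eraser-controller-manager-/ && $3 == "Completed" { print $1 }'
}

# Example usage (namespace taken from the report above; first confirm that
# `kubectl get imagejob -A` returns nothing, so no job is still running):
#   kubectl get pods -n kube-system --no-headers \
#     | leftover_eraser_pods \
#     | xargs kubectl delete pod -n kube-system
#   kubectl delete podtemplate -n kube-system imagejob-dhtbn imagejob-r5vlc
```

Filtering on the pod name and STATUS keeps the sketch independent of any label scheme Eraser may or may not apply to its worker pods. Review the printed names before piping them into "kubectl delete".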

If anyone would like to pick this up, we would need to modify the controllers (imagecollector_controller and imagelist_controller) to check for restart status during reconciliation and to trigger PodTemplate and pod cleanup on delete.

ashnamehrotra • Mar 31 '25

I can't reproduce this issue using eraser v1.4.0:

~ ❯ k delete po -n eraser-system --all
pod "eraser-controller-manager-c798f59f6-sk92t" deleted
~ ❯ k get podtemplates -A
NAMESPACE       NAME             CONTAINERS                        IMAGES                                                                                                                 POD LABELS
eraser-system   imagejob-sxpjt   collector,remover,trivy-scanner   ghcr.io/eraser-dev/collector:v1.4.0,ghcr.io/eraser-dev/remover:v1.4.0,ghcr.io/eraser-dev/eraser-trivy-scanner:v1.4.0   <none>
~ ❯ k get pods -n eraser-system
NAME                                        READY   STATUS      RESTARTS   AGE
eraser-controller-manager-c798f59f6-zd2g8   0/1     Running     0          10s
eraser-kind-control-plane-cn2mn             0/3     Completed   0          7s
~ ❯ k get pods -n eraser-system
NAME                                        READY   STATUS    RESTARTS   AGE
eraser-controller-manager-c798f59f6-zd2g8   1/1     Running   0          15s
~ ❯ k get podtemplates -A
No resources found

sozercan • Apr 03 '25