Builder pods not removed after deploy
Currently (as of deis-builder v2.7.1) the slugbuild and dockerbuild pods are not deleted after a successful or failed build.
This means that the pod (e.g. slugbuild-example-e24fafeb-b31237bb) continues to exist in state "Completed" or "Error", and the docker container associated with it can never be garbage collected by Kubernetes, causing the node to quickly run out of disk space.
Example:
On a k8s node with an uptime of 43 days and 95 GB of disk storage for docker, there were 249 completed (or failed) slugbuild and dockerbuild pods whose docker images accounted for 80 GB of disk storage, while the deployed apps and deis services only required 15 GB.
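To get a rough picture of this on a node (only a sketch, assuming docker keeps its data under /var/lib/docker, which may vary by distribution):

# disk used by docker's data directory on the node
df -h /var/lib/docker
# number of exited containers still held on the node
docker ps -aq --filter "status=exited" | wc -l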
Expected Behavior:
The expected behavior would be for the builder to automatically delete the build pod after it has completed or failed, so that K8s garbage collection can remove the docker containers and free the disk space allocated to them.
This behavior can easily be inspected with:
kubectl get --namespace deis --show-all pods | grep build-
The number of completed pods will increase by one for each build.
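Until the builder cleans these up itself, a rough manual workaround (only a sketch, assuming all build pods live in the deis namespace and carry the slugbuild-/dockerbuild- prefix; xargs -r is GNU xargs) is to delete the terminated pods by hand:

kubectl get --namespace deis --show-all pods | awk '/^(slugbuild|dockerbuild)-/ && ($3 == "Completed" || $3 == "Error") {print $1}' | xargs -r kubectl delete --namespace deis pods

This only frees space retroactively; the pods come back with every new build until the builder deletes them itself.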
related: https://github.com/deis/builder/issues/57
It seems like recent versions of k8s stopped cleaning up pods in the "success" state. Some research probably needs to be done on how to turn this functionality back on.
I'm running K8s 1.4.x if that matters.
Regarding the suggestion in #57 to use Jobs: neither Jobs nor Pods are removed automatically.
From the K8s Job docs:
When a Job completes, no more Pods are created, but the Pods are not deleted either. Since they are terminated, they don’t show up with kubectl get pods, but they will show up with kubectl get pods -a. Keeping them around allows you to still view the logs of completed pods to check for errors, warnings, or other diagnostic output. The job object also remains after it is completed so that you can view its status. It is up to the user to delete old jobs after noting their status. Delete the job with kubectl (e.g. kubectl delete jobs/pi or kubectl delete -f ./job.yaml). When you delete the job using kubectl, all the pods it created are deleted too.
Interestingly the docs on Pod Lifecycle say:
In general, Pods do not disappear until someone destroys them. This might be a human or a controller. The only exception to this rule is that Pods with a phase of Succeeded or Failed for more than some duration (determined by the master) will expire and be automatically destroyed.
This seems to be in contrast to what I'm actually seeing…
I have opened kubernetes/kubernetes#41787 for clarification of the above statement from the docs.
I just got feedback on the kubernetes issue: it looks like, by default, completed or failed pods are only garbage collected once there are more than 12,500 terminated pods. Obviously that is not very helpful in this case, so an automatic cleanup by the builder should be implemented.
Quoting here from the kube-controller-manager help on the --terminated-pod-gc-threshold <n> option:
Number of terminated pods that can exist before the terminated pod garbage collector starts deleting terminated pods. If <= 0, the terminated pod garbage collector is disabled. (default 12500)
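As a cluster-level stopgap one could lower that threshold; a minimal sketch (assuming a setup where the kube-controller-manager command line can be edited, e.g. a static manifest) would be:

# add to the existing kube-controller-manager invocation, keeping the other flags
kube-controller-manager --terminated-pod-gc-threshold=100

Note that this affects all terminated pods in the cluster, not just the builder's, so a builder-side cleanup would still be the cleaner fix.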
Any progress on this? It sounds like a waste of resources and space for everyone.
Same here; it may be linked to an issue I opened last week.
$ kubectl get --namespace deis --show-all pods | grep build-
slugbuild-teslabit-web-production-d2fcd4c0-7e507178 0/1 Completed 0 1d
I'm using this tiny git pre-push hook for deletion: https://gist.github.com/pfeodrippe/116c8b570ee2ffcdce8aa15bbae5a22b.
It deletes the last slugbuild pod created for the app when you git push.
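For reference, a minimal sketch of such a hook (not the linked gist verbatim; it assumes the deis app name is known, kubectl points at the right cluster, and the newest matching pod is the one to remove):

#!/bin/sh
# .git/hooks/pre-push -- delete the most recent slugbuild pod for the app
# APP is a placeholder; replace it with your deis app name.
APP=example
POD=$(kubectl get --namespace deis --show-all pods --sort-by=.metadata.creationTimestamp | awk -v p="slugbuild-$APP-" 'index($1, p) == 1 {name = $1} END {print name}')
[ -n "$POD" ] && kubectl delete --namespace deis pod "$POD"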
+1 This bit me after a couple of weeks of deploying applications to my deis cluster.
This issue was moved to teamhephy/builder#17