linkerd2
Feature: Add lifecycle hooks to pods from jobs automatically
What problem are you trying to solve?
When using linkerd to inject everything inside a cluster, pods spawned from jobs fall into a NotReady state as the main container inside the pod has completed its task but the proxy runs forever.
Additionally, it is impossible to use defaultAllowPolicy: "cluster-authenticated" without injecting jobs, because uninjected jobs will not be able to communicate with the workloads they need inside the mesh.
Slack Threads:
- policy - https://linkerd.slack.com/archives/C89RTCWJF/p1646315631241249
- job/await discussion - https://linkerd.slack.com/archives/C89RTCWJF/p1646235837693339
How should the problem be solved?
When a pod is spawned which belongs to a job / cronjob, the pod should have a lifecycleHook automatically injected to run curl -X POST http://localhost:4191/shutdown or equivalent, to ensure the container running the work terminates the proxy.
Additionally, it could be beneficial to have an annotation that could configure the lifecycleHook, for example:
annotations:
  config.linkerd.io/lifecycle-hook-enabled: "true"
  config.linkerd.io/lifecycle-hook-binary: "wget" # could also be curl or others
Any alternatives you've considered?
Configuring a bunch of policies cluster-wide to enable jobs to work whilst 99% of other traffic is authed and goes through the mesh. Ideally, getting fresh clusters onboarded would be quick and painless for many users.
Additionally, I've considered adding the hook myself to my objects, but some of them are spawned via third-party charts which don't provide a clean interface to add these hooks. I would have to resort to kustomize to add the lifecycle hook for each job within the cluster that needs to communicate with things on the mesh.
How would users interact with this feature?
They could configure it via annotations that are read by the injection webhook and vary the output slightly (curl vs wget vs other). They would be able to enable/disable the hook injection as well as the injection of the proxy.
Would you like to work on this feature?
No response
If I understand correctly, lifecycle hooks can't actually do this. From https://kubernetes.io/docs/concepts/containers/container-lifecycle-hooks/#hook-handler-execution:
PreStop
This hook is called immediately before a container is terminated due to an API request or management event such as a liveness/startup probe failure, preemption, resource contention and others.
That is, lifecycle hooks don't apply when a container exits gracefully. They only apply when the kubelet decides to terminate a container; and if the kubelet is deciding to terminate the Job's pod, the proxy will shut down gracefully anyway.
I think the only real approach to solving this problem would be to write a controller that deletes jobs when the linkerd proxy is the only running container.
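A minimal sketch of the check such a controller might run, assuming it receives pod objects in the JSON shape returned by the Kubernetes API and that the sidecar container is named linkerd-proxy. The function name only_proxy_running is hypothetical, and a real controller would also need to watch pods and decide what to delete or shut down:

```python
PROXY_CONTAINER = "linkerd-proxy"  # assumed name of the injected sidecar


def only_proxy_running(pod: dict) -> bool:
    """Return True when the proxy is running and every other container has terminated.

    `pod` is assumed to be a pod object as a plain dict, with the usual
    status.containerStatuses[].state.{running,terminated,waiting} layout.
    """
    statuses = pod.get("status", {}).get("containerStatuses", [])
    proxy_running = False
    other_containers = 0
    for status in statuses:
        state = status.get("state", {})
        if status.get("name") == PROXY_CONTAINER:
            proxy_running = "running" in state
        else:
            other_containers += 1
            if "terminated" not in state:
                # Some workload container is still running or waiting.
                return False
    # Require at least one workload container so an empty status list is ignored.
    return proxy_running and other_containers > 0
```

Pods for which this returns True are the ones stuck in NotReady in the sense described in the issue: the work is done and only the proxy keeps the pod alive.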
So I did look at this: https://itnext.io/three-ways-to-use-linkerd-with-kubernetes-jobs-c12ccc6d4c7c
I thought it was kind of neat to do this, but the cleanest and easiest way would be to have a controller as you say. If I have numerous jobs to configure in numerous places, it just becomes tedious to manage. Having a controller constantly check and auto-cleanup would be useful.
Another option would be to make the linkerd proxy container (either the binary directly or another process therein) aware of the state of the pod by polling the Kubernetes API.
Instead of having a central controller polling the state of the pods, each pod would poll its own state and terminate its own proxy.
The advantage would be that there is no controller installation needed. There could also be less resource usage when there are no jobs running. I think it should also perform better when there are significantly fewer job pods than other pods, which is probably more common.
I am not sure though if the default service account has permissions for that or if linkerd can possibly inject those, but I believe it could.
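The per-pod polling idea could be sketched roughly as follows. This is only an illustration under stated assumptions: fetch_statuses is a hypothetical callback that returns the pod's own containerStatuses list (how it would be authorized against the Kubernetes API is exactly the open question above), the sidecar is assumed to be named linkerd-proxy, and the shutdown URL is the one quoted in the issue:

```python
import time
import urllib.request

SHUTDOWN_URL = "http://localhost:4191/shutdown"  # endpoint quoted in the issue


def work_finished(statuses: list, proxy_name: str = "linkerd-proxy") -> bool:
    """True when at least one non-proxy container exists and all of them have terminated."""
    others = [s for s in statuses if s.get("name") != proxy_name]
    return bool(others) and all("terminated" in s.get("state", {}) for s in others)


def build_shutdown_request(url: str = SHUTDOWN_URL) -> urllib.request.Request:
    """Build the equivalent of `curl -X POST <url>` with an empty body."""
    return urllib.request.Request(url, data=b"", method="POST")


def watch_and_shutdown(fetch_statuses, send=urllib.request.urlopen,
                       interval: float = 5.0, sleep=time.sleep) -> None:
    """Poll the pod's own container statuses, then ask the proxy to shut down.

    `fetch_statuses` is a hypothetical callable supplied by the caller;
    `send` and `sleep` are injectable so the loop can be exercised offline.
    """
    while not work_finished(fetch_statuses()):
        sleep(interval)
    send(build_shutdown_request())
```

Keeping the Kubernetes API access behind a callback leaves the permissions question separate from the polling logic itself.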