Feature: Add lifecycle hooks to pods from jobs automatically

Open jack1902 opened this issue 2 years ago • 9 comments

What problem are you trying to solve?

When using linkerd to inject everything inside a cluster, pods spawned from jobs fall into a NotReady state: the main container inside the pod completes its task, but the proxy keeps running forever.

Additionally, it is impossible to use defaultAllowPolicy: "cluster-authenticated" without injecting jobs, because uninjected jobs cannot communicate with the services they need inside the mesh.

Slack Threads:

  • policy - https://linkerd.slack.com/archives/C89RTCWJF/p1646315631241249
  • job/await discussion - https://linkerd.slack.com/archives/C89RTCWJF/p1646235837693339

How should the problem be solved?

When a pod belonging to a job / cronjob is spawned, it should have a lifecycleHook injected automatically that runs curl -X POST http://localhost:4191/shutdown (or an equivalent), so that the container running the work terminates the proxy.
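For illustration, the shutdown call itself can be sketched in Python. This is a minimal sketch that wraps the job's command in a small launcher rather than using an injected hook; the URL is the one quoted above, while the function names `request_proxy_shutdown` and `run_then_shutdown` are made up for this example.

```python
import subprocess
import urllib.request
from typing import Sequence

# Shutdown endpoint as quoted in this issue.
SHUTDOWN_URL = "http://localhost:4191/shutdown"


def request_proxy_shutdown(url: str = SHUTDOWN_URL, timeout: float = 5.0) -> bool:
    """POST to the proxy's shutdown endpoint; return True on a 2xx response."""
    req = urllib.request.Request(url, data=b"", method="POST")
    # The target is local, so ignore any HTTP proxy environment variables.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(req, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:
        # Connection refused, timeouts and HTTP error statuses all land here
        # (urllib's URLError and HTTPError are OSError subclasses).
        return False


def run_then_shutdown(argv: Sequence[str], url: str = SHUTDOWN_URL) -> int:
    """Run the job's command, then ask the proxy to exit whatever the outcome."""
    try:
        exit_code = subprocess.run(list(argv)).returncode
    finally:
        request_proxy_shutdown(url)
    return exit_code
```

A job image could then call something like `run_then_shutdown(["python", "job.py"])` as its entrypoint and exit with the returned code, so the job's success or failure is still reported correctly.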

Additionally, it could be beneficial to have an annotation that could configure the lifecycleHook, for example:

annotations:
  config.linkerd.io/lifecycle-hook-enabled: "true"
  config.linkerd.io/lifecycle-hook-binary: "wget" # could also be curl or others

Any alternatives you've considered?

Configuring a bunch of cluster-wide policies to enable jobs to work whilst 99% of other traffic is authed and goes through the mesh. Ideally, getting fresh clusters onboarded would be quick and painless for as many users as possible.

Additionally, I've considered adding the hook myself to my objects, but some of them are spawned via third-party charts which don't provide a clean interface for adding these hooks. I would have to resort to kustomize to add the lifecycle hook to each job within the cluster that needs to communicate with things on the mesh.

How would users interact with this feature?

They could configure it via annotations read by the injection webhook. The annotations would vary the output slightly (curl vs wget vs other) and would let users enable/disable the hook injection as well as the injection of the proxy.

Would you like to work on this feature?

No response

jack1902 avatar Mar 04 '22 11:03 jack1902

If I understand correctly, lifecycle hooks can't actually do this. From https://kubernetes.io/docs/concepts/containers/container-lifecycle-hooks/#hook-handler-execution:

PreStop

This hook is called immediately before a container is terminated due to an API request or management event such as a liveness/startup probe failure, preemption, resource contention and others.

That is, lifecycle hooks don't apply when a container exits gracefully. They only apply when the kubelet decides to terminate a container; and if the kubelet is deciding to terminate the Job, the proxy will shut down gracefully.

I think the only real approach to solving this problem would be to write a controller that deletes jobs when the linkerd proxy is the only running container.
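A rough sketch of the check such a controller would need, written in Python against a Pod object as a plain dict (the JSON shape of `status.containerStatuses` and `metadata.ownerReferences`). The container name `linkerd-proxy` and the function names are assumptions made for this example; listing pods and issuing the delete call are left out.

```python
PROXY_CONTAINER = "linkerd-proxy"  # assumed name of the injected sidecar


def owned_by_job(pod: dict) -> bool:
    """True if any owner reference on the pod has kind Job."""
    owners = pod.get("metadata", {}).get("ownerReferences") or []
    return any(owner.get("kind") == "Job" for owner in owners)


def only_proxy_running(pod: dict, proxy_name: str = PROXY_CONTAINER) -> bool:
    """True if the proxy is running and every other container has terminated."""
    statuses = pod.get("status", {}).get("containerStatuses") or []
    proxy_running = False
    other_containers = 0
    for cs in statuses:
        state = cs.get("state") or {}
        if cs.get("name") == proxy_name:
            proxy_running = state.get("running") is not None
        else:
            other_containers += 1
            if state.get("terminated") is None:
                return False  # an application container is running or waiting
    return proxy_running and other_containers > 0


def should_clean_up(pod: dict) -> bool:
    """Decision a hypothetical controller would make for each pod it sees."""
    return owned_by_job(pod) and only_proxy_running(pod)
```

Keeping the decision a pure function of the Pod object makes it easy to test without a cluster; a real controller would also have to decide whether to delete the Job or only stop the proxy.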

olix0r avatar Mar 08 '22 18:03 olix0r

So I did look at this: https://itnext.io/three-ways-to-use-linkerd-with-kubernetes-jobs-c12ccc6d4c7c

I thought it was kind of neat, but the cleanest and easiest way would be to have a controller like you say: if I have numerous jobs to configure in numerous places, it just becomes tedious to manage. Having a controller that constantly checks and auto-cleans up would be useful.

jack1902 avatar Mar 10 '22 09:03 jack1902

Another option would be to make the linkerd proxy container (either the binary directly or another process therein) aware of the state of the pod by polling the kubernetes API.

Instead of having a central controller polling the state of the pods each pod would poll its own state and terminate its own proxy.
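The per-pod variant could look roughly like the Python sketch below. It assumes two hypothetical callables are supplied by the caller: `fetch_statuses`, which returns the pod's own `containerStatuses` list (for example from the kubernetes API), and `shutdown_proxy`, which stops the local proxy. The container name `linkerd-proxy` is likewise an assumption.

```python
import time
from typing import Callable, List, Optional

PROXY_CONTAINER = "linkerd-proxy"  # assumed name of the injected sidecar


def workload_finished(statuses: List[dict], proxy_name: str = PROXY_CONTAINER) -> bool:
    """True if there is at least one non-proxy container and all have terminated."""
    others = [cs for cs in statuses if cs.get("name") != proxy_name]
    return bool(others) and all(
        (cs.get("state") or {}).get("terminated") is not None for cs in others
    )


def poll_until_finished(
    fetch_statuses: Callable[[], List[dict]],
    shutdown_proxy: Callable[[], None],
    interval_seconds: float = 5.0,
    max_polls: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll the pod's own state; shut the proxy down once the workload is done.

    Returns True if shutdown was triggered, False if max_polls ran out first.
    """
    polls = 0
    while max_polls is None or polls < max_polls:
        if workload_finished(fetch_statuses()):
            shutdown_proxy()
            return True
        polls += 1
        sleep(interval_seconds)
    return False
```

Passing `sleep` and the two callables in as parameters keeps the loop testable without a cluster; the polling interval trades API server load against how long a finished pod lingers.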

The advantage would be that no controller installation is needed. There could also be less resource usage when no jobs are running. I think it should also perform better when there are significantly fewer job pods than other pods, which is probably the more common case.

I am not sure though if the default service account has permissions for that or if linkerd can possibly inject those, but I believe it could.

mladedav avatar Apr 30 '22 08:04 mladedav