consul-k8s
consul-k8s copied to clipboard
Adding support for lifecycle hooks and health probe for sidecars
Changes proposed in this PR:
-
Adding support to start main app containers after sidecar is started. Done by adding a postStart lifecycle hook to the sidecar container. (via boolean annotation: consul.hashicorp.com/sidecar-hold-app-until-proxy-starts) With reference from: https://github.com/istio/istio/issues/11130#issuecomment-645424556
-
Adding support to stop sidecar gracefully after main app container stops. Done by adding a preStop lifecycle hook to the sidecar container. (via boolean annotation consul.hashicorp.com/sidecar-proxy-graceful-shutdown) With reference from: https://github.com/istio/istio/issues/7136
-
Support to add health probes. Done by adding Liveness check to the sidecar container. (via boolean annotation consul.hashicorp.com/sidecar-configure-probes)
One impact due to this:
On adding sidecar-hold-app-until-proxy-starts
, the sidecar becomes the default container when using kubectl exec
These features are added as opt in via annotations
How I've tested this PR:
- Added the annotations to a k8s deployment.
- Post adding could see postStart, preStop and liveness gets added to the deployment manifest
- Tested holding app container startup by running continuous calls to sidecar on startup from app. Working as expected. Without the annotation the call to sidecar fails as the app container starts up first.
- Tested graceful shutdown by adding a preStop to the app container. Works as expected. Without the annotation the envoy sidecar gets terminated immediately while the app is still completing its preStop and all calls to the sidecar start failing. This is running in our environment perfectly from quite some time. https://github.com/hashicorp/consul-k8s/issues/536#issuecomment-1211722794
- On adding annotation for liveness it comes up correctly in the pod manifest. This is useful for mesh deployments that do not yet use transparent proxies
- The annotation expects only boolean value. Any other value would skip adding the annotation functionality.
How I expect reviewers to test this PR:
Checklist:
- [ ] Tests added
- [ ] CHANGELOG entry added
HashiCorp engineers only, community PRs should not add a changelog entry. Entries should use present tense (e.g. Add support for...)
Hi @narendrapatel have you been able to test and see that this works on Consul K8s?
Hi @narendrapatel have you been able to test and see that this works on Consul K8s?
@david-yu yes I have tested this functionality in my local K8s cluster.
Hi @narendrapatel thanks for the PR, I initially thought this was going to be a PR to address health probes for sidecars but I am as able to see the whole picture here a bit better now that the whole PR is put up. It looks like the use case is to address a few different proxy lifecycle management use cases.
The Consul K8s team is intending to support a few different proxy lifecycle management use cases after our 1.14 release, however we can't merge this PR as is since we will be making significant architecture changes in how the proxy is deployed and configured as part of that release. I'll leave this PR open for the time being so others can use this for their own purposes as well.
@david-yu thanks for the sharing.
I don't it was mentioned in the 1.14 release doc. Maybe I am overlooking something?
https://github.com/hashicorp/consul/releases/tag/v1.14.0
I've updated the PR to be compatible with consul version v1.14 and consul control plane 1.0.0. But with the consul dataplane now using distroless base container, we'll need to add couple of binaries like sh
, wget
, sleep
, netstat
, grep
and wc
. The docker file for consul dataplane is available here. Also dropped use of annotation consul.hashicorp.com/sidecar-configure-probes
from the PR since with consul control plane 1.0.0 readiness and liveness checks are added by default. So the PR now only covers graceful start up and shut down for the sidecar.
Change required for consul version < 1.14 and consul k8s control plane < 1.0.0 is here
Hi, Is this ever going to get merged? This functionality is required by many consumers, due to the non-graceful shutdown of services in the mesh. Thanks
Any news on that? We could highly benefit from such functionality.
Closing as this is now resolved by https://github.com/hashicorp/consul-k8s/pull/2233 and will be released in upcoming patch releases for 1.0.x, 1.1.x and 1.2.x.
Thanks everyone, this was a critical fix for us.
@david-yu it seems #2233 doesn't address the startup issue fixed by this PR (Envoy proxy sidecar goes up only after the app container). Without a fix Consul Connect is pretty much unusable for us, we wished a fix would make its way to the just-released 1.2 version. What's the reason to not fix both lifecycle issues at once? Since this PR is now closed is there any other place to track the startup issue progress?
Hi @psypuff We really wanted to get all proxy lifecycle features in but needed to stagger them out due to the amount of work that was scheduled for Consul 1.16.0 and Consul K8s 1.2. Just as a clarification, the fix for pod shut down will come in patch releases of Consul K8s 1.1.x and 1.0.x that are releasing today, but for Consul K8s 1.2.x we will deliver them in 1.2.1.
The official issue tracking application startup scenarios is https://github.com/hashicorp/consul-k8s/issues/1397. I'll also link this closed PR there as well for folks to reference if they need a workaround.