consul-k8s
Consul Connect: Application is unable to access other services until sidecar container is fully initialized
Question
I am using consul connect auto-injection.
If the sidecar container running Envoy is not yet fully configured but the main application already issues a request over the network, that request fails and my application will not start.
I am using this workaround in my entrypoint.sh:
until [ "$(curl --fail --silent --output /dev/stderr --write-out "%{http_code}" localhost:19000/ready)" -eq 200 ]; do
  echo "Waiting for proxy..."
  sleep 1
done
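A slightly more defensive variant of the loop above bounds the number of retries so the entrypoint cannot hang forever if the proxy never comes up. This is only a sketch; the function and variable names (`wait_for`, `MAX_TRIES`, `SLEEP_SECS`) are made up for illustration and are not part of any Consul tooling:

```shell
#!/bin/sh
# wait_for CMD...: retry CMD until it succeeds or MAX_TRIES is exhausted.
wait_for() {
  max_tries=${MAX_TRIES:-30}
  i=0
  until "$@"; do
    i=$((i + 1))
    if [ "$i" -ge "$max_tries" ]; then
      echo "gave up after $i attempts" >&2
      return 1
    fi
    sleep "${SLEEP_SECS:-1}"
  done
}

# Usage with the Envoy admin readiness endpoint from the workaround above:
#   wait_for curl --fail --silent --output /dev/null localhost:19000/ready
```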
With Istio it is possible to automatically add lifecycle hooks to the pod that ensure the main container only starts once the Istio sidecar container has successfully started.
This configuration option is called holdApplicationUntilProxyStarts
- see: https://istio.io/latest/docs/reference/config/istio.mesh.v1alpha1/
It can also be enabled with an annotation on the pod:
annotations:
  proxy.istio.io/config: '{ "holdApplicationUntilProxyStarts": true }'
I have done some research but could not find whether an equivalent configuration option exists for Consul Connect.
Is there something like this available? Any help would be greatly appreciated, THANKS!
Environment details
consul version: 1.12.0
Hi @philslab-ninja - I do not believe we have any hooks at the webhook level that would let you inject probes, but it looks like you could get away with adding a readinessProbe to your application that uses the same URL you mentioned in the issue?
It might look something like this:
readinessProbe:
  httpGet:
    path: /ready
    port: 19000
  failureThreshold: 1
  periodSeconds: 2  # or some reasonable number
Would that work for now? We have had some requests in the past to add the ability to inject lifecycle hooks through the injector but have not gotten to building it yet.
Hi @kschoche,
I think the Envoy admin port listens on localhost by default (and I would prefer to keep it that way), so I don't think the above readinessProbe would work.
Something like holdApplicationUntilProxyStarts would be really helpful.
Some of our apps see very high traffic and sometimes receive hundreds of requests while the main app container is started and healthy but Envoy has not yet started, causing many connection refused errors. It's difficult to move to production with this.
A solution to this problem would be really helpful.
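Since exec probes run inside the container (which shares the pod's network namespace with the Envoy sidecar), one possible workaround is a startupProbe that curls the admin endpoint on localhost. This is a sketch, not an official recommendation: it assumes curl is present in the application image, the thresholds are illustrative, and note that it only keeps the pod out of Service endpoints until Envoy reports ready; it does not delay the application process itself.

```yaml
startupProbe:
  exec:
    command:
      - sh
      - -c
      - curl --fail --silent --output /dev/null localhost:19000/ready
  failureThreshold: 30
  periodSeconds: 1
```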
We implemented a network check in our "Dynamic Entrypoint".
We have a ConfigMap with a bash script and mount it as /app/runk8s in every pod.
In the container definition, we override the entrypoint/command to /app/runk8s.
When the pod starts, /app/runk8s is executed, which does the network check (and other useful things with envconsul).
The bash script:
#!/bin/bash
# Wait until the Consul agent reports a leader, i.e. the network is up
retryCnt=15
waitTime=3
while true; do
  curl -s -f -o /dev/null $CONSUL_ENDPOINT:$CONSUL_PORT/v1/status/leader
  if [ $? = 0 ] || [ $retryCnt -lt 1 ]; then break; fi
  ((retryCnt--))
  sleep $waitTime
done
if [ "$CHECK_FLAGSHIP" = "true" ]; then
  curl http://$CONSUL_ENDPOINT:$CONSUL_PORT/v1/kv/configuration/flagship/$DIM_DC -s -f -o /dev/null || {
    # Flagship config missing: shut down the Envoy sidecar and exit cleanly
    curl -X POST -s -o /dev/null "localhost:19000/quitquitquit"
    exit 0
  }
fi
echo "[DEBUG] Entrypoint: Starting 'runk8s' script version 1.3.2"
echo "[DEBUG] Entrypoint: Starting arch detection for envconsul binary"
unamestr=`uname`
if [[ "$unamestr" == 'Linux' ]]; then
envconsul='envconsul'
elif [[ "$unamestr" == 'Darwin' ]]; then
envconsul='envconsul-mac'
fi
echo "[DEBUG] Entrypoint: Starting configure-envconsul.js"
ROOT_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
#echo "$ROOT_DIR"
node $ROOT_DIR/configure-envconsul.js || {
echo "[ERROR] Entrypoint: Failed configure-envconsul.js";
curl -X PUT -s -o /dev/null "http://127.0.0.1:8500/v1/agent/leave";
curl -X POST -s -o /dev/null "http://127.0.0.1:19000/quitquitquit";
exit 1;
}
echo "[DEBUG] Entrypoint: Starting envconsul"
trap 'kill -TERM $PID' TERM INT
$ROOT_DIR/$envconsul -once -config "$PWD/envconsul-config.json" "$@" &
PID=$!
wait $PID
trap - TERM INT
wait $PID
EXIT_STATUS=$?
echo "[DEBUG] Entrypoint: Sending agent leave to consul-env sidecar"
curl -X PUT -s -o /dev/null http://127.0.0.1:8500/v1/agent/leave
echo "[DEBUG] Entrypoint: Sending quitquitquit to envoy-proxy sidecar"
curl -X POST -s -o /dev/null http://127.0.0.1:19000/quitquitquit
echo "[DEBUG] Entrypoint: going to exit with exit code ${EXIT_STATUS}"
exit $EXIT_STATUS
Here, CONSUL_ENDPOINT is the host IP (the node where the pod is running).
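The host IP can be injected into the pod via the Kubernetes downward API. A minimal sketch of the container spec, matching the variable names used in the script above (the port value assumes the Consul agent's default HTTP port, 8500):

```yaml
env:
  - name: CONSUL_ENDPOINT
    valueFrom:
      fieldRef:
        fieldPath: status.hostIP
  - name: CONSUL_PORT
    value: "8500"
```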
Hi @alt-dima -
I just took a look at this again to see if there was an easier way to use the Envoy admin API to set a readinessProbe for the service directly. Unfortunately it's a bit complicated to set up while preserving any readinessProbes set by the user, since Kubernetes only supports a single readinessProbe per container. However, we do intend to support this in an upcoming release by setting our own readinessProbe that points to Envoy's active listener port, and I believe that should cover this use case.
Stay tuned! ~Kyle
Hi @kschoche, thanks for the update! This is a much-needed feature. Will it be included in a release any time soon?
There is a workaround implemented in this now-closed PR: https://github.com/hashicorp/consul-k8s/pull/1482/files. We are still looking to support the application startup scenario of the proxy lifecycle for Consul on Kubernetes in the future.
If I've got this right, this is now supported in consul-dataplane#239, but consul-k8s is still missing the functionality, as StartupGracePeriodSeconds is set to 0 by default and thus never takes effect. I've opened consul-k8s#3878, which should allow changing that value and enabling the functionality.
I'll go ahead and close this issue now that https://github.com/hashicorp/consul-k8s/pull/3878 is merged. Thank you @ilpianista . We will likely release this functionality in late May along with our scheduled patch releases.