consul-k8s
Consul Connect: Application is unable to access other services until sidecar container is fully initialized
Question
I am using consul connect auto-injection.
If the sidecar container running Envoy is not yet fully configured but the main application already issues a request over the network, that request fails and my application will not start.
I am using this workaround in my entrypoint.sh:
until [ "$(curl --fail --silent --output /dev/stderr --write-out "%{http_code}" localhost:19000/ready)" -eq 200 ]; do
  echo "Waiting for proxy..."
  sleep 1
done
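A slightly more defensive variant of the loop above bounds the number of retries so the entrypoint cannot hang forever if the proxy never comes up. This is only a sketch; the function and variable names (`wait_for`, `MAX_TRIES`, `SLEEP_SECS`) are made up for illustration and are not part of any Consul tooling:

```shell
#!/bin/sh
# wait_for CMD...: retry CMD until it succeeds or MAX_TRIES is exhausted.
wait_for() {
  max_tries=${MAX_TRIES:-30}
  i=0
  until "$@"; do
    i=$((i + 1))
    if [ "$i" -ge "$max_tries" ]; then
      echo "gave up after $i attempts" >&2
      return 1
    fi
    sleep "${SLEEP_SECS:-1}"
  done
}

# Usage with the Envoy admin readiness endpoint from the workaround above:
#   wait_for curl --fail --silent --output /dev/null localhost:19000/ready
```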
With Istio it is possible to automatically add lifecycle hooks to the pod that ensure the main container only starts once the Istio sidecar container has successfully started.
This configuration option is called holdApplicationUntilProxyStarts
- see: https://istio.io/latest/docs/reference/config/istio.mesh.v1alpha1/
It can also be enabled with an annotation on the pod:
annotations:
  proxy.istio.io/config: '{ "holdApplicationUntilProxyStarts": true }'
I have done some research but could not find whether an equivalent configuration option exists for Consul Connect.
Is there something like this available? Any help would be greatly appreciated, THANKS!
Environment details
consul version: 1.12.0
Hi @philslab-ninja - I do not believe we have any hooks at the webhook level that would let you inject probes, but it looks like you could get away with adding a readinessProbe to your application that uses the same URL you mentioned in the issue?
It might look something like this:
readinessProbe:
  httpGet:
    path: /ready
    port: 19000
  failureThreshold: 1
  periodSeconds: 2  # or some reasonable number
Would that work for now? We have had some requests in the past to add the ability to inject lifecycle hooks through the injector but have not gotten to building it yet.
Hi @kschoche,
I think the Envoy admin port listens on localhost by default (and I would prefer to keep it that way), so I don't think the above readinessProbe would work.
Something like holdApplicationUntilProxyStarts would be really helpful.
Some of our apps see very high traffic and sometimes receive hundreds of requests while the main app container is started and healthy but Envoy has not yet started, causing many connection refused errors. It's difficult to move to production with this.
A solution to this problem would be really helpful.
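Since exec probes run inside the container (which shares the pod's network namespace with the Envoy sidecar), one possible workaround is a startupProbe that curls the admin endpoint on localhost. This is a sketch, not an official recommendation: it assumes curl is present in the application image, the thresholds are illustrative, and note that it only keeps the pod out of Service endpoints until Envoy reports ready; it does not delay the application process itself.

```yaml
startupProbe:
  exec:
    command:
      - sh
      - -c
      - curl --fail --silent --output /dev/null localhost:19000/ready
  failureThreshold: 30
  periodSeconds: 1
```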
We implemented a network check in our "Dynamic Entrypoint".
We have a ConfigMap with a bash script and mount it as /app/runk8s in every pod.
In the container definition, we override the entrypoint/command to /app/runk8s.
When the pod starts, /app/runk8s is executed, which does the network check (and other useful things with envconsul).
The bash script:
#!/bin/bash
# Wait until the Consul agent reports a leader, i.e. the network is up
retryCnt=15
waitTime=3
while true; do
  curl -s -f -o /dev/null $CONSUL_ENDPOINT:$CONSUL_PORT/v1/status/leader
  if [ $? = 0 ] || [ $retryCnt -lt 1 ]; then break; fi
  ((retryCnt--))
  sleep $waitTime
done
if [ "$CHECK_FLAGSHIP" = "true" ]; then
  curl http://$CONSUL_ENDPOINT:$CONSUL_PORT/v1/kv/configuration/flagship/$DIM_DC -s -f -o /dev/null || {
    # Flagship config missing: shut down the Envoy sidecar and exit cleanly
    curl -X POST -s -o /dev/null "localhost:19000/quitquitquit"
    exit 0
  }
fi
echo "[DEBUG] Entrypoint: Starting 'runk8s' script version 1.3.2"
echo "[DEBUG] Entrypoint: Starting arch detection for envconsul binary"
unamestr=`uname`
if [[ "$unamestr" == 'Linux' ]]; then
envconsul='envconsul'
elif [[ "$unamestr" == 'Darwin' ]]; then
envconsul='envconsul-mac'
fi
echo "[DEBUG] Entrypoint: Starting configure-envconsul.js"
ROOT_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
#echo "$ROOT_DIR"
node $ROOT_DIR/configure-envconsul.js || {
echo "[ERROR] Entrypoint: Failed configure-envconsul.js";
curl -X PUT -s -o /dev/null "http://127.0.0.1:8500/v1/agent/leave";
curl -X POST -s -o /dev/null "http://127.0.0.1:19000/quitquitquit";
exit 1;
}
echo "[DEBUG] Entrypoint: Starting envconsul"
trap 'kill -TERM $PID' TERM INT
$ROOT_DIR/$envconsul -once -config "$PWD/envconsul-config.json" "$@" &
PID=$!
wait $PID
trap - TERM INT
wait $PID
EXIT_STATUS=$?
echo "[DEBUG] Entrypoint: Sending agent leave to consul-env sidecar"
curl -X PUT -s -o /dev/null http://127.0.0.1:8500/v1/agent/leave
echo "[DEBUG] Entrypoint: Sending quitquitquit to envoy-proxy sidecar"
curl -X POST -s -o /dev/null http://127.0.0.1:19000/quitquitquit
echo "[DEBUG] Entrypoint: going to exit with exit code ${EXIT_STATUS}"
exit $EXIT_STATUS
Here, CONSUL_ENDPOINT is the host IP (the node where the pod is running).
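The host IP can be injected into the pod via the Kubernetes downward API. A minimal sketch of the container spec, matching the variable names used in the script above (the port value assumes the Consul agent's default HTTP port, 8500):

```yaml
env:
  - name: CONSUL_ENDPOINT
    valueFrom:
      fieldRef:
        fieldPath: status.hostIP
  - name: CONSUL_PORT
    value: "8500"
```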
Hi @alt-dima -
I just took a look at this again to see if there was an easier way to use the Envoy admin API to set a readinessProbe for the service directly. Unfortunately it's a bit complicated to set up while preserving any readinessProbes set by the user, since Kubernetes only supports a single readinessProbe per container. However, we do intend to support this in an upcoming release by setting our own readinessProbe that points to Envoy's active listener port, and I believe that should cover this use case.
Stay tuned! ~Kyle
Hi @kschoche, thanks for the update! This is a much-needed feature. Will it be included in a release any time soon?
There is a workaround implemented in this now-closed PR: https://github.com/hashicorp/consul-k8s/pull/1482/files. We are still looking to support the application startup scenario of the proxy lifecycle for Consul on Kubernetes in the future.
If I've got this right, this is now supported in consul-dataplane#239, but consul-k8s is still missing the functionality, as StartupGracePeriodSeconds is set to 0 by default and thus never takes effect. I've opened consul-k8s#3878, which should allow changing that value and enabling the functionality.
I'll go ahead and close this issue now that https://github.com/hashicorp/consul-k8s/pull/3878 is merged. Thank you @ilpianista . We will likely release this functionality in late May along with our scheduled patch releases.