cloud-sql-proxy

Add command for use in "PostStart" hook that delays until proxy has started

Open adriangb opened this issue 3 years ago • 8 comments

It is a well-known issue that there is no way to sequence container startups in Kubernetes (e.g. https://github.com/kubernetes/kubernetes/issues/65502). As far as I know, this is the best solution we currently have: https://github.com/istio/istio/issues/11130#issuecomment-645424556

This would be much better if the images provided a wait-until-proxy-is-up-and-running.sh script (especially since the default image lacks sleep, curl, etc.) so that users can easily force Kubernetes to wait until the proxy is up before it starts the next container. It could be as simple as the proxy creating a file when it finishes starting up.

adriangb avatar Feb 26 '22 03:02 adriangb

I'm not sure what value a script in the proxy image would have, as you'd probably want something in your application image.

Have you considered using the HTTP health-checks? I think you could use something like this as a wrapper in your application container:

# Poll the proxy's readiness endpoint until it answers with a success status.
until curl --output /dev/null --silent --head --fail http://127.0.0.1:8090/readiness; do
    printf '.'
    sleep .1 # 100 ms
done

# start your app here

kurtisvg avatar Feb 28 '22 16:02 kurtisvg

> I'm not sure what value a script in the proxy image would have

For the proposed solution, this has to run from the proxy's container. The idea is to use a lifecycle hook to delay startup of the application container until the proxy is ready (see https://github.com/istio/istio/issues/11130#issuecomment-645424556 for more details, this is exactly what Istio does so that the application container doesn't start until the service mesh sidecar is ready).

One could sidestep all of this and run a wait from the application container like you suggest, but I'd rather not couple my application to an implementation detail of one of its execution environments (in general, but also in my particular case: this is being run cross-cloud, so it's not even communicating with CloudSQL Proxy half of the time).

So basically what I'm asking for is a way to wait until the proxy is healthy from within the proxy container, ideally without curl so that one can use the distroless image.

adriangb avatar Feb 28 '22 17:02 adriangb

Ok, I think I understand now. I'm not really sure how good of an idea it is to depend on the synchronous order of containers though - feels a bit like an implementation detail.

It looks like this could probably be an HTTP endpoint, so maybe something like /waituntilready that doesn't return until the proxy is "ready"? There might be a bit of a race condition if the health check server isn't mounted when the HTTP request is sent. Otherwise it would need to be a different execution flow that just calls on the readiness endpoint and terminates when it's green.

kurtisvg avatar Feb 28 '22 19:02 kurtisvg
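
The blocking-endpoint idea in the comment above can be illustrated with a small sketch. This is only an illustration in Python, not the proxy's implementation: the `/waituntilready` path comes from the comment, while `ready`, `WaitHandler`, `serve`, and the 30-second limit are hypothetical names and values chosen for the example.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical stand-in for the proxy's internal "startup finished" state.
ready = threading.Event()


class WaitHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/waituntilready":
            self.send_error(404)
            return
        # Block only this request (each request has its own thread)
        # until the event is set or the assumed 30 s limit passes.
        if ready.wait(timeout=30):
            self.send_response(200)
        else:
            self.send_response(503)
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep the sketch quiet


def serve(port=0):
    """Start the server on a background thread and return it."""
    server = ThreadingHTTPServer(("127.0.0.1", port), WaitHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

The sketch does not remove the race mentioned in the comment: a client that connects before the server is listening still gets a connection error and would have to retry.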

I agree that it's an implementation detail. But it has to be dealt with somewhere. Since the thing introducing the implementation detail is the CloudSQL Proxy and the sidecar pattern (we wouldn't have to wait for something to start up if we were connecting to the DB directly) to me it seems most natural to handle that implementation detail via the proxy as well instead of the application. But that's just my opinion.

adriangb avatar Feb 28 '22 20:02 adriangb

Re-reading this I realized that by:

> I'm not really sure how good of an idea it is to depend on the synchronous order of containers though - feels a bit like an implementation detail.

You were talking about a different implementation detail than in my response.

I agree with you, that's an implementation detail of Kubernetes. But seeing that Istio uses it for exactly the same purpose, and that there is currently no better solution (at least that I know of) it seems like it's safe to rely on that until a more official pattern comes about.

adriangb avatar Mar 28 '22 00:03 adriangb

We're currently working to get v2 shipped. Once that's done, we'll revisit this and adjust the priority as needed.

enocom avatar Mar 28 '22 15:03 enocom

Another option would be to use https://github.com/karlkfi/kubexit to support dependency ordering.

enocom avatar Apr 07 '22 16:04 enocom

For the record, buster image does contain sleep and I ended up using sleep 5. All other options seem overly complex.

alamothe avatar Aug 04 '22 15:08 alamothe

We've recently taken a cue from Istio and added a /quitquitquit endpoint (see https://github.com/GoogleCloudPlatform/cloud-sql-proxy/pull/1624).

If we wanted to avoid a solution that required curl, I guess we'd add a wait command that tried to connect to http://localhost:15021/healthz/ready or similar matching Istio again.

enocom avatar Feb 01 '23 03:02 enocom
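
The logic of such a wait command can be sketched roughly as follows. This is a Python illustration of the polling loop only, not the proxy's code (a real command would live in the proxy binary); `wait_until_ready` and its `timeout` and `interval` parameters are hypothetical names for the example.

```python
import time
import urllib.error
import urllib.request


def wait_until_ready(url, timeout=30.0, interval=0.1):
    """Poll url until it returns HTTP 200; return False if timeout elapses first."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval + 1.0) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            # Connection refused or an error status: not ready yet.
            pass
        time.sleep(interval)
    return False
```

Using the readiness URL from the curl example earlier in the thread, a caller could run `wait_until_ready("http://127.0.0.1:8090/readiness")` and exit non-zero when it returns `False`, so that a postStart hook fails visibly instead of hanging forever.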

This would be really helpful for running tools that are CloudSQL-proxy unaware (for example, mysqldump) in Kubernetes with a CloudSQL proxy sidecar. In this case, I want to block mysqldump (or whatever other tool) from running until the proxy has gotten far enough along in its startup sequence that an attempt to use it won't fail due to lack of initialization.

For the case of mysqldump specifically, the official MySQL image has both bash and curl, so a curl-based solution would be OK. (FWIW, the official postgres image lacks curl, though.)

Is there even an existing HTTP endpoint exposed by the CloudSQL proxy today that would be usable for this purpose? The README section on the Localhost Admin Server doesn't mention one.

benweint avatar Jul 20 '23 14:07 benweint

Do you think our HTTP healthchecks would fit the bill? We have startup, liveness, and readiness. In particular the startup probe might be the right one to try.

enocom avatar Jul 20 '23 15:07 enocom

> Do you think our HTTP healthchecks would fit the bill? We have startup, liveness, and readiness. In particular the startup probe might be the right one to try.

Ah, I had missed the docs for them! Yes, it does seem like these would probably work (at least if curl or similar is available in the application image).

benweint avatar Jul 20 '23 15:07 benweint

> if curl or similar is available in the application image

This wouldn't be ideal. Unless the proxy already needs curl, including it would add to the attack surface of the image and, if a container were compromised, could give an attacker a lot more capability than they have today.

Istio images contain a pilot-agent binary which has specific commands including a wait command. Internally this just polls the readiness endpoint of the proxy (source) up until a (configurable) timeout, and on timeout it calls

This seems to work quite nicely…

michaelbannister avatar Jul 25 '23 08:07 michaelbannister

That's a nice idea and pretty cheap to implement as well.

enocom avatar Jul 25 '23 18:07 enocom

:wave: As a heavy cloudsql-proxy user (100s of proxied workloads in a high-churn environment) I thought I'd share my piece. We have a lifecycle.postStart hook which runs a wait-for-cloudsql-running.sh script which we bake into our cloudsql-proxy image. It's super crude, and just has a simple loop waiting for the port to be listening. It works, but it saddens me how our Kubernetes estate is littered with these sorts of scripts to try and crudely implement ordering.

We also use Istio, which as mentioned has the pilot-agent wait, which is much cleaner in our eyes (and would mean we don't need to bake our own image with scripts in it). So a similar pattern would be nice.

It's worth remembering that with Kubernetes 1.28 sidecar containers are being introduced, which "solves" this problem officially.

Stono avatar Oct 24 '23 10:10 Stono
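
A loop that waits for a port to be listening, as described in the comment above, might look like the following sketch. It is written in Python for illustration and is not the script from that comment; `wait_for_port` and its parameters are hypothetical.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False
```

An open port only shows that a listener exists, which is why this approach is cruder than polling a health check endpoint.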

Thanks @stono -- even with the support for sidecars coming, I'd guess that a wait command will still be useful in other situations. We have a bunch of stuff higher on the priority list, but we'll try to squeeze this in soon.

enocom avatar Oct 24 '23 18:10 enocom

For reference, this is a good overview of the post start hook approach: https://medium.com/@marko.luksa/delaying-application-start-until-sidecar-is-ready-2ec2d21a7b74

enocom avatar Oct 27 '23 04:10 enocom

I have a related problem: I'd like to use Docker Compose's health checks to bring up the proxy before starting dependent services, but I can't seem to reach the health check endpoints from within the container image.

It would be awesome if there was a health check defined in the cloud-sql-proxy Dockerfile itself (see https://docs.docker.com/engine/reference/builder/#healthcheck).

GergelyKalmar avatar Oct 27 '23 19:10 GergelyKalmar

@GergelyKalmar We'd be happy to investigate and support that (assuming no major technical issues). Would you mind opening a separate feature request for that work?

enocom avatar Oct 30 '23 16:10 enocom

I was able to implement the postStart hook as suggested above by using the alpine image with this script:

from kubernetes.client import models as k8s  # assumed import behind the `k8s` alias

CLOUD_SQL_HTTP_PORT = 9090  # assumed value; use the port your proxy's HTTP server listens on

sidecar = k8s.V1Container(
    name="cloud-sql-proxy",
    image="gcr.io/cloud-sql-connectors/cloud-sql-proxy:2.7.2-alpine",
    # ....
    lifecycle=k8s.V1Lifecycle(
        post_start=k8s.V1LifecycleHandler(
            # Poll the proxy's /startup endpoint with wget until it returns 200 OK.
            _exec=k8s.V1ExecAction(command=[
                "sh",
                "-c",
                f"""until [ "$(wget --server-response 'http://0.0.0.0:{CLOUD_SQL_HTTP_PORT}/startup' -O - 2>&1 | grep -c 'HTTP/1.1 200 OK')" -eq 1 ]; do
                  echo "Waiting for Cloud SQL Proxy to be ready";
                  sleep 10;
                done"""
            ])
        )
    )
)

ohaibbq avatar Nov 16 '23 17:11 ohaibbq