cloud-sql-proxy
Add command for use in "PostStart" hook that delays until proxy has started
It is a well known issue there is no way to sequence container startups in Kubernetes (e.g. https://github.com/kubernetes/kubernetes/issues/65502). As far as I know, this is the best solution we currently have: https://github.com/istio/istio/issues/11130#issuecomment-645424556
This would be made a lot better if the images provided some wait-until-proxy-is-up-and-running.sh script (especially since the default image lacks sleep, curl, etc.) so that users can easily force Kubernetes to wait until the proxy is up before it starts the next container. It could be as simple as the proxy creating a file when it finishes starting up.
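The marker-file idea could be as small as this (a sketch only — the script name and the `/tmp/cloud-sql-proxy-ready` path are hypothetical; nothing like this ships in the image today):

```sh
#!/bin/sh
# Hypothetical wait-until-proxy-is-up-and-running.sh: block until the
# proxy has created a marker file signalling that startup finished.
wait_for_ready_file() {
  ready_file="${1:-/tmp/cloud-sql-proxy-ready}"  # hypothetical path
  until [ -f "$ready_file" ]; do
    sleep 1
  done
}
```

Run from a postStart hook, this would hold the hook (and therefore the next container's start) until the proxy wrote the file.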
I'm not sure what value a script in the proxy image would have, as you'd probably want something in your application image.
Have you considered using the HTTP health-checks? I think you could use something like this as a wrapper in your application container:
```sh
# Poll the readiness endpoint until it responds, then start the app.
until curl --output /dev/null --silent --head --fail http://127.0.0.1:8090/readiness; do
  printf '.'
  sleep 0.1 # 100 ms
done
# start your app here
```
> I'm not sure what value a script in the proxy image would have
For the proposed solution, this has to run from the proxy's container. The idea is to use a lifecycle hook to delay startup of the application container until the proxy is ready (see https://github.com/istio/istio/issues/11130#issuecomment-645424556 for more details, this is exactly what Istio does so that the application container doesn't start until the service mesh sidecar is ready).
One could sidestep all of this and run a wait from the application container like you suggest, but I'd rather not couple my application to an implementation detail of one of its execution environments (in general, but also in my particular case this runs cross-cloud, so half the time it isn't communicating with the CloudSQL Proxy at all).
So basically what I'm asking for is a way to wait until the proxy is healthy from within the proxy container, ideally without curl so that one can use the distroless image.
Ok, I think I understand now. I'm not really sure how good of an idea it is to depend on the synchronous ordering of containers, though - it feels a bit like an implementation detail.
It looks like this could probably be an HTTP endpoint, so maybe something like /waituntilready that doesn't return until the proxy is "ready"? There might be a bit of a race condition if the health check server isn't mounted when the HTTP request is sent. Otherwise it would need to be a different execution flow that just calls on the readiness endpoint and terminates when it's green.
I agree that it's an implementation detail. But it has to be dealt with somewhere. Since the thing introducing the implementation detail is the CloudSQL Proxy and the sidecar pattern (we wouldn't have to wait for something to start up if we were connecting to the DB directly) to me it seems most natural to handle that implementation detail via the proxy as well instead of the application. But that's just my opinion.
Re-reading this I realized that by:
> I'm not really sure how good of an idea it is to depend on the synchronous ordering of containers, though - it feels a bit like an implementation detail.
You were talking about a different implementation detail than in my response.
I agree with you that that's an implementation detail of Kubernetes. But seeing that Istio uses it for exactly the same purpose, and that there is currently no better solution (at least that I know of), it seems safe to rely on it until a more official pattern comes about.
We're currently working to get v2 shipped. Once that's done, we'll revisit this and adjust the priority as needed.
Another option would be to use https://github.com/karlkfi/kubexit to support dependency ordering.
For the record, the Buster image does contain sleep, and I ended up using `sleep 5`. All other options seem overly complex.
We've recently taken a cue from Istio and added a /quitquitquit endpoint (see https://github.com/GoogleCloudPlatform/cloud-sql-proxy/pull/1624).
If we wanted to avoid a solution that required curl, I guess we'd add a wait command that tried to connect to http://localhost:15021/healthz/ready or similar matching Istio again.
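Sketched as a shell wrapper, the semantics might look like this (assumptions: the proxy's health-check HTTP server is enabled, the port and timeout below are placeholders, and curl is available — so this is the curl-based variant, not the built-in command proposed above):

```sh
#!/bin/sh
# Sketch of pilot-agent-wait-style semantics: poll a readiness URL
# until it returns success, giving up after a configurable timeout.
wait_for_proxy() {
  url="${1:-http://localhost:9090/readiness}"  # placeholder URL/port
  timeout="${2:-30}"                           # seconds
  elapsed=0
  until curl --silent --fail --output /dev/null "$url"; do
    if [ "$elapsed" -ge "$timeout" ]; then
      echo "proxy not ready after ${timeout}s" >&2
      return 1
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
}
```

A built-in `wait` command would do the same polling internally, which is what makes it usable from distroless images with no curl.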
This would be really helpful for running tools that are CloudSQL-proxy unaware (for example, mysqldump) in Kubernetes with a CloudSQL proxy sidecar. In this case, I want to block mysqldump (or whatever other tool) from running until the proxy has gotten far enough along in its startup sequence that an attempt to use it won't fail due to lack of initialization.
For the case of mysqldump specifically, the official MySQL image has both bash and curl, so a curl-based solution would be OK. (FWIW, the official postgres image lacks curl, though.)
Is there even an existing HTTP endpoint exposed by the CloudSQL proxy today that would be usable for this purpose? The README section on the Localhost Admin Server doesn't mention one.
Do you think our HTTP healthchecks would fit the bill? We have startup, liveness, and readiness. In particular the startup probe might be the right one to try.
> Do you think our HTTP healthchecks would fit the bill? We have startup, liveness, and readiness. In particular the startup probe might be the right one to try.
Ah, I had missed the docs for them! Yes, it does seem like these would probably work (at least if curl or similar is available in the application image).
> if curl or similar is available in the application image

This wouldn't be ideal; unless the proxy already needs it, it adds to the attack surface of the image, and if a container were compromised it could give an attacker a lot more capability than they currently have.
Istio images contain a pilot-agent binary which has specific commands, including a `wait` command – internally this just polls the readiness endpoint of the proxy (source) up until a (configurable) timeout, and on timeout it calls
This seems to work quite nicely…
That's a nice idea and pretty cheap to implement as well.
:wave: As a heavy cloudsql-proxy user (hundreds of proxied workloads in a high-churn environment), I thought I'd say my piece.
We have a `lifecycle.postStart` hook which runs a wait-for-cloudsql-running.sh script which we bake into our cloudsql-proxy image. It's super crude — just a simple loop waiting for the port to be listening. It works, but it saddens me how our Kubernetes estate is littered with these sorts of scripts trying to crudely implement ordering.
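A crude port-listening loop along those lines might look like this (a sketch; 5432 is just a placeholder for the proxy's listen port, and the `/dev/tcp` trick requires bash rather than plain sh):

```sh
#!/bin/bash
# Sketch: block until something is accepting TCP connections on the
# given local port. The default port is a placeholder.
wait_for_port() {
  port="${1:-5432}"
  until (exec 3<>"/dev/tcp/127.0.0.1/${port}") 2>/dev/null; do
    sleep 1
  done
}
```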
We also use Istio, which as mentioned has `pilot-agent wait`, which is much cleaner in our eyes (and would mean we don't need to bake our own image with scripts in it). So a similar pattern would be nice.
It's worth remembering that with Kubernetes 1.28, sidecar containers are being introduced, which "solves" this problem officially.
Thanks @stono -- even with the support for sidecars coming, I'd guess that a wait command will still be useful in other situations. We have a bunch of stuff higher on the priority list, but we'll try to squeeze this in soon.
For reference, this is a good overview of the post start hook approach: https://medium.com/@marko.luksa/delaying-application-start-until-sidecar-is-ready-2ec2d21a7b74
I have a related problem: I'd like to use Docker Compose's health checks to bring up the proxy before starting dependent services, but I can't seem to ping the endpoints from within the container image.
It would be awesome if there was a health check defined in the cloud-sql-proxy Dockerfile itself (see https://docs.docker.com/engine/reference/builder/#healthcheck).
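For what it's worth, if the proxy grew a wait-style subcommand like the one discussed above, the Dockerfile healthcheck could be a thin wrapper around it. This is only a sketch — the subcommand name and the intervals are hypothetical, just to illustrate the shape:

```dockerfile
# Hypothetical: assumes a built-in subcommand (here called "wait") that
# polls the proxy's own startup endpoint and exits 0 once it is ready.
HEALTHCHECK --interval=5s --timeout=3s --start-period=10s --retries=3 \
  CMD ["/cloud-sql-proxy", "wait"]
```

Compose could then gate dependent services on it with `depends_on` plus `condition: service_healthy`.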
@GergelyKalmar We'd be happy to investigate and support that (assuming no major technical issues). Would you mind opening a separate feature request for that work?
I was able to implement the postStart hook as suggested above by using the Alpine image with this script:
```python
sidecar = k8s.V1Container(
    name="cloud-sql-proxy",
    image="gcr.io/cloud-sql-connectors/cloud-sql-proxy:2.7.2-alpine",
    # ....
    lifecycle=k8s.V1Lifecycle(
        post_start=k8s.V1LifecycleHandler(
            _exec=k8s.V1ExecAction(command=[
                "sh",
                "-c",
                f"""until [ "$(wget --server-response 'http://0.0.0.0:{CLOUD_SQL_HTTP_PORT}/startup' -O - 2>&1 | grep -c 'HTTP/1.1 200 OK')" -eq 1 ]; do
                    echo "Waiting for Cloud SQL Proxy to be ready";
                    sleep 10;
                done""",
            ])
        )
    )
)
```