cloud-sql-proxy
cloud-sql-proxy copied to clipboard
Wait for network before starting up
Question
We seem to be having an issue with the SQL proxy failing to initialize on the first try due to a timing issue with it and the istio sidecar deployed by ASM. Is there a way to delay the SQL proxy from trying to make a connection to cloud SQL so that the istio sidecar can establish itself? Our backup plan is to disable the port for Istio egress for cloud SQL.
When you say failing to initialize, do you mean the initial call to collect instance metadata fails, or are you talking about the first connection attempt failing?
Assuming you're seeing the initial call to collect instance metadata failing, I don't see a great way to delay initialization. I can think of a few things to try though (in order of preference):
- If you start the proxy with
-fuseusing a Unix socket (e.g.,cloud_sql_proxy -fuse -dir /tmp/cloudsql), the proxy won't do any initial setup until an application attempts to connect to it. - You could try compiling a custom version of the proxy with a sleep just before this line: https://github.com/GoogleCloudPlatform/cloudsql-proxy/blob/22080999a649eb110513523b889ded2596313970/cmd/cloud_sql_proxy/cloud_sql_proxy.go#L526 Granted that's a hack, but if that works for you, we could discuss a feature to wait on some external event before proceeding with initialization.
"When you say failing to initialize, do you mean the initial call to collect instance metadata fails, or are you talking about the first connection attempt failing?" I'm not sure about the specific details. Whatever the initial egress call from the proxy container is is failing. At this point the container is toast and has to re-initialize. We were considering #2 as another option, but would prefer not to have to manage our own container. Let me try your other suggestion. Thanks.
Just a note, since these are both google authorized features, I assume this will come up again with other clients. Maybe a discussion with the ASM team to come up with a solid approach to handle this condition is in order?
Yes. In any case, I suspect there's a general need to sometimes delay the proxy at startup.
Sounds good. Thanks again.
Here's additional details:
*url.Error Get "http://169.254.169.254/computeMetadata/v1/instance/service-accounts/default/scopes": dial tcp 169.254.169.254:80: connect: connection refused | Get "http://169.254.169.254/computeMetadata/v1/instance/service-accounts/default/scopes": dial tcp 169.254.169.254:80: connect: connection refused
Trying to get SA details for the account used by the proxy. Obviously, adding a port override for istio on 80 would not be a good idea! :)
So just to clarify the issue, you're deploying the proxy as a sidecar along with an Istio sidecar. Before the Istio sidecar starts up, the proxy will fail to run because it can't send network calls. The proxy repeatedly crashes, the network comes up, and then the proxy comes up successfully. Is that what you're seeing?
Exactly. We were able to get around the issue with IP filtering on the istio proxy, disabling the metadata service call and the subsequent call to the cloud SQL API. The problem is the SQL API does not appear to have a static IP, so there is no way to provide istio with a filtering range. Essentially, these 2 sidecars do not play well together. I'm not sure if there is a way to check for a mesh proxy and ping it's readiness probe before starting any activity with the SQL proxy? It might make more sense than adding a delay. BTW, I tried using the fuse flag to see if that would squash the initial SQL API call and it hosed the container. I couldn't even get to the logs to see what was going on.

It's really mostly an annoyance cause the proxy recovers fairly quickly. You can see the restarts here.
OK. That helps. I agree -- there's probably a way to check for network connectivity before continuing initialization.
Looks like this is common problem: https://github.com/istio/istio/issues/11130.
Possible fix for Istio users: https://github.com/istio/istio/issues/11130#issuecomment-707642638.
In short (available in Istio 1.7 and later):
annotations:
proxy.istio.io/config: '{ "holdApplicationUntilProxyStarts": true }'
I guess the question now is, what version do we have? :) That looks promising though.
This does feel like an issue with Istio, but if we have to, we can find a way to work around it. Would you mind reporting back if that annotation solves the issue for you?
Most definently. I appreciate your assistance.
@bmears-aaxis any updates?
Related to #348.
Going to close this given it's a Kubernetes problem and hasn't got much attention.