cloud-sql-proxy Intermittent "connection aborted - error reading from instance" errors with auth proxy as a sidecar on Cloud Run

Open Pascal-Delange opened this issue 5 months ago • 3 comments

Bug Description

I have a Cloud Run service running with a Cloud SQL Auth Proxy sidecar to connect to a set of Cloud SQL instances (currently 5 of them). Several instances of the service can coexist at any given time. Sometimes, and with increasing frequency (it used to be once a month or so, and recently it has been several times a week), all the connections to Cloud SQL in one instance of the service fail with the following error logs:

'[project_id:europe-west1:instance_2] connection aborted - error reading from instance: read tcp 169.254.8.1:60699->{instance_ip}:3307: read: connection reset by peer
'[project_id:europe-west1:instance_2] IO Error on Read or Write: read tcp 169.254.8.1:60699->{instance_ip}:3307: read: connection reset by peer

It always happens on all connected instances at the same time, for one given instance of the proxy. As far as we have been able to observe, the issue does not correlate with high load on the Cloud Run service or on the databases it connects to.

Example code (or command)

Intermittent error that does not seem related to any particular lines of code (see below for proxy options).

Stacktrace

'[project_id:europe-west1:instance_2] connection aborted - error reading from instance: read tcp 169.254.8.1:60699->{instance_ip}:3307: read: connection reset by peer
'[project_id:europe-west1:instance_2] IO Error on Read or Write: read tcp 169.254.8.1:60699->{instance_ip}:3307: read: connection reset by peer
'[project_id:europe-west1:instance_4] connection aborted - error reading from instance: read tcp 169.254.8.1:58109->{instance_ip}:3307: read: connection reset by peer
'[project_id:europe-west1:instance_4] IO Error on Read or Write: read tcp 169.254.8.1:58109->{instance_ip}:3307: read: connection reset by peer
'[project_id:europe-west1:instance_3] connection aborted - error reading from instance: read tcp 169.254.8.1:33878->{instance_ip}:3307: read: connection reset by peer
'[project_id:europe-west1:instance_3] IO Error on Read or Write: read tcp 169.254.8.1:33878->{instance_ip}:3307: read: connection reset by peer
'[project_id:europe-west1:instance_1] connection aborted - error reading from instance: read tcp 169.254.8.1:44952->{instance_ip}:3307: read: connection reset by peer
'[project_id:europe-west1:instance_1] IO Error on Read or Write: read tcp 169.254.8.1:44952->{instance_ip}:3307: read: connection reset by peer
'[project_id:europe-west1:instance_4] connection aborted - error reading from instance: read tcp 169.254.8.1:29766->{instance_ip}:3307: read: connection reset by peer
'[project_id:europe-west1:instance_3] connection aborted - error reading from instance: read tcp 169.254.8.1:60901->{instance_ip}:3307: read: connection reset by peer
'[project_id:europe-west1:instance_1] connection aborted - error reading from instance: read tcp 169.254.8.1:53263->{instance_ip}:3307: read: connection reset by peer
'[project_id:europe-west1:instance_4] IO Error on Read or Write: read tcp 169.254.8.1:29766->{instance_ip}:3307: read: connection reset by peer
'[project_id:europe-west1:instance_3] IO Error on Read or Write: read tcp 169.254.8.1:60901->{instance_ip}:3307: read: connection reset by peer
'[project_id:europe-west1:instance_1] IO Error on Read or Write: read tcp 169.254.8.1:53263->{instance_ip}:3307: read: connection reset by peer
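
To check the observation that every connected instance is hit in the same burst, the log lines above can be grouped by instance. This is a minimal sketch in Python, assuming the plain-text message format shown in this issue (with --structured-logs the message presumably has to be pulled out of each JSON entry first). The function name summarize_resets and the regular expression are illustrative and are not part of the proxy.

```python
import re
from collections import defaultdict

# Pattern for the message format shown in this issue.
# This is an assumption based on the lines above, not a documented log schema.
LINE_RE = re.compile(
    r"\[(?P<instance>[^\]]+)\]\s+"
    r"(?P<kind>connection aborted - error reading from instance|IO Error on Read or Write): "
    r"read tcp (?P<local>\S+)->(?P<remote>\S+): read: (?P<cause>.+)$"
)


def summarize_resets(lines):
    """Count distinct reset connections per instance.

    Each connection is identified by its local address (ip:port), so the two
    messages logged for the same connection are counted once.
    """
    by_instance = defaultdict(set)
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            by_instance[match["instance"]].add(match["local"])
    return {instance: len(conns) for instance, conns in by_instance.items()}


if __name__ == "__main__":
    # A few lines copied from the stack trace above.
    sample = [
        "'[project_id:europe-west1:instance_2] connection aborted - error reading from instance: read tcp 169.254.8.1:60699->{instance_ip}:3307: read: connection reset by peer",
        "'[project_id:europe-west1:instance_2] IO Error on Read or Write: read tcp 169.254.8.1:60699->{instance_ip}:3307: read: connection reset by peer",
        "'[project_id:europe-west1:instance_4] connection aborted - error reading from instance: read tcp 169.254.8.1:58109->{instance_ip}:3307: read: connection reset by peer",
    ]
    for instance, count in sorted(summarize_resets(sample).items()):
        print(f"{instance}: {count} connection(s) reset")
```

Running the same grouping over the logs of each proxy instance separately would show whether a burst is confined to one sidecar, as described above.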

Steps to reproduce?

I have no way to trigger the bug deliberately; it just happens sometimes. The frequency seems to have been increasing recently.

Environment

OS type and version: Docker container on Cloud Run
Until now the sidecar container had 500m vCPU allocated (half a vCPU). I changed it to 1 full vCPU today and am waiting to see if the issue occurs again.
Cloud SQL Proxy version: gcr.io/cloud-sql-connectors/cloud-sql-proxy:2.16.0
Proxy invocation command:

args = [
  "--unix-socket=/cloudsql",
  "--structured-logs",
  "--health-check",
  "--http-address=0.0.0.0",
  "--max-sigterm-delay=10s", // wait 10sec max before closing all connections when the container receives SIGTERM. Should be longer than the condition applied in the client code, if any.
  "--debug-logs",
  "--lazy-refresh",
]

"--lazy-refresh" was added recently to see if it fixes the issue, to no avail.

Additional Details

No response

May 28 '25 11:05 Pascal-Delange
