cloud-sql-proxy-operator

Randomly failing to update existing deployments, failing to create the csql container

Status: Open. pocesar opened this issue 1 year ago.

Expected Behavior

The operator should handle rolling updates without breaking, and it should be self-sufficient in (re)creating the csql container whenever the selector matches or a previous attempt to create the container failed.

Actual Behavior

The update fails, and the operator never recovers on its own.

Steps to Reproduce the Problem

  1. Create a Deployment with a needs-proxy: "1" label.
  2. Create an AuthProxyWorkload manifest that matches kind: Deployment and selector.matchLabels."needs-proxy" = "1".
  3. The first kubectl apply usually works, but subsequent updates to the Deployment fail about half of the time. The following error precedes each failure, and the operator never recovers by itself; the entire pod has to be deleted:
{
  "textPayload": "2024/09/25 20:04:39 http: TLS handshake error from 192.168.1.3:43058: EOF",
  "resource": {
    "type": "k8s_container",
    "labels": {
      "namespace_name": "cloud-sql-proxy-operator-system",
      "container_name": "manager",
      "pod_name": "cloud-sql-proxy-operator-controller-manager-..."
    }
  },
  "timestamp": "2024-09-25T20:04:39.822373239Z",
  "severity": "ERROR",
  "labels": {
    "k8s-pod/pod-template-hash": "6946569c9b",
    "k8s-pod/control-plane": "controller-manager"
  },
  "logName": "projects/.../logs/stderr",
  "receiveTimestamp": "2024-09-25T20:04:42.868056671Z"
}
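For reference, a minimal manifest pair for steps 1 and 2 might look like the sketch below. It assumes the v1 AuthProxyWorkload schema (apiVersion cloudsql.cloud.google.com/v1, spec.workloadSelector, spec.instances); every name, the image, the connection string, and the port are hypothetical placeholders, not values from the original report. Because JSON allows no comments, note that the needs-proxy label is placed on both the Deployment and its pod template so the example does not depend on which object the selector is evaluated against.

```json
{
  "apiVersion": "v1",
  "kind": "List",
  "items": [
    {
      "apiVersion": "apps/v1",
      "kind": "Deployment",
      "metadata": {
        "name": "example-app",
        "labels": { "needs-proxy": "1" }
      },
      "spec": {
        "replicas": 1,
        "selector": { "matchLabels": { "app": "example-app" } },
        "template": {
          "metadata": { "labels": { "app": "example-app", "needs-proxy": "1" } },
          "spec": {
            "containers": [{ "name": "app", "image": "example/app:latest" }]
          }
        }
      }
    },
    {
      "apiVersion": "cloudsql.cloud.google.com/v1",
      "kind": "AuthProxyWorkload",
      "metadata": { "name": "example-proxy" },
      "spec": {
        "workloadSelector": {
          "kind": "Deployment",
          "selector": { "matchLabels": { "needs-proxy": "1" } }
        },
        "instances": [
          { "connectionString": "my-project:my-region:my-instance", "port": 5432 }
        ]
      }
    }
  ]
}
```

Applying this file with kubectl apply -f, then changing something in the pod template (for example the image tag) and applying again, exercises the update path described in step 3.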

The rollout then proceeds to create the actual Deployment container anyway. Because no SQL proxy is listening on localhost, the newly created pod ends up in an endless crash loop, since the application requires the database connection.

Specifications

  • Version: 1.5.1
  • Platform: GKE

Side note: the logs from this operator are very hard to read on GCP. Everything is written to stderr with ERROR severity, and the unstructured payloads are very confusing.

pocesar, Sep 25 '24 20:09