The maintenance of Google GKE caused the k8s resolver of the service to report an error, leading to a 10-minute service disruption.

Open • shaolei opened this issue 5 months ago • 1 comment

Describe the bug

A Google GKE maintenance event caused the service's k8s resolver to report errors, which led to a 10-minute service disruption.

To Reproduce

Steps to reproduce the behavior, if applicable:

  1. The GKE logs
2025-06-11 08:31:53.072  rgs-01  k8s.io  7a k8s.core.v1.configmaps.delete  iten/configmaps/cluster-autoscaler-status  system:cluster-autoscaler  audit.log  method: "io.k8s.core.v1.configmaps.delete", principal.email: "system:cluster-autoscaler"
2025-06-11 08:31:53.235  rgs-01  k8s.io  7a k8s.coordination.v1.leases.update  namespaces/kube-system/leases/opa-updater  system:opa-updater  audit.log  method: "io.k8s.coordination.v1.leases.update", principal.email: "system:opa-updater"
2025-06-11 08:31:53.334  rgs-01  k8s.io  7a k8s.coordination.v1.leases.update  namespaces/kube-system/leases/vpa-admission-controller  system:vpa-admission-controller  audit.log  method: "io.k8s.coordination.v1.leases.update", principal.email: "system:vpa-admission-controller"
2025-06-11 08:31:53.393  rgs-01  k8s.io  7a k8s.coordination.v1.leases.update  namespaces/kube-system/leases/psp-admission-controller  system:addon-manager  audit.log  method: "io.k8s.coordination.v1.leases.update", principal.email: "system:addon-manager"
2025-06-11 08:31:53.577  rgs-01  k8s.io  7a k8s.coordination.v1.leases.update  namespaces/rgs-01/pool-3-db46f37-wi8t  system:node:rgs-01-pool-3-db46f37-wi8t  audit.log  method: "io.k8s.coordination.v1.leases.update", principal.email: "system:node:rgs-01-pool-3-db46f37-wi8t"
2025-06-11 08:31:53.695  rgs-01  k8s.io  7a k8s.coordination.v1.leases.update  namespaces/kube-system/leases/cloud-controller-manager  system:cloud-controller-manager  audit.log  method: "io.k8s.coordination.v1.leases.update", principal.email: "system:cloud-controller-manager"
2025-06-11 08:31:53.843  rgs-01  k8s.io  7a k8s.coordination.v1.leases.update  namespaces/kube-system/leases/resource-exportation-controller-m2  system:cloud-controller-manager  audit.log  method: "io.k8s.coordination.v1.leases.update", principal.email: "system:cloud-controller-manager"
2025-06-11 08:31:53.741  rgs-01  k8s.io  7a k8s.core.v1.configmaps.update  kube-system/configmaps/cluster-kubestore  system:kubestore-collector  audit.log  method: "io.k8s.core.v1.configmaps.update", principal.email: "system:kubestore-collector"
2025-06-11 08:31:53.786  rgs-01  k8s.io  7a k8s.coordination.v1.leases.update  namespaces/kube-system/leases/cluster-kubestore  system:kubestore-collector  audit.log  method: "io.k8s.coordination.v1.leases.update", principal.email: "system:kubestore-collector"
2025-06-11 08:31:54.006  rgs-01  k8s.io  7a k8s.coordination.v1.leases.update  namespaces/kube-system/leases/gkebackup-agent-lock  system:gkebackup-agent  audit.log  method: "io.k8s.coordination.v1.leases.update", principal.email: "system:gkebackup-agent"
2025-06-11 08:35:29.884  rgs-01  Updating  redirect service (resourceVersion 885979338)
2025-06-11 08:37:24.086  rgs-01  Updating  redirect service (resourceVersion 885979338)
2025-06-11 08:40:03.085  rgs-01  k8s.io  7a k8s.coordination.v1.leases.create  namespaces/rgs-01/psp-admission-controller  system:apiserver  audit.log  method: "io.k8s.coordination.v1.leases.create", principal.email: "system:apiserver"
2025-06-11 08:40:03.185  rgs-01  k8s.io  7a k8s.coordination.v1.leases.create  namespaces/rgs-01/psp-admission-controller  system:apiserver  audit.log  method: "io.k8s.coordination.v1.leases.create", principal.email: "system:apiserver"
2025-06-11 08:40:03.185  rgs-01  k8s.io  7a k8s.coordination.v1.leases.create  namespaces/rgs-01/psp-admission-controller  system:apiserver  audit.log  method: "io.k8s.coordination.v1.leases.create", principal.email: "system:apiserver"
2025-06-11 08:40:03.185  rgs-01  k8s.io  7a k8s.coordination.v1.leases.create  namespaces/rgs-01/psp-admission-controller  system:apiserver  audit.log  method: "io.k8s.coordination.v1.leases.create", principal.email: "system:apiserver"
2025-06-11 08:40:03.185  rgs-01  k8s.io  7a k8s.coordination.v1.leases.create  namespaces/rgs-01/psp-admission-controller  system:apiserver  audit.log  method: "io.k8s.coordination.v1.leases.create", principal.email: "system:apiserver"
2025-06-11 08:40:03.185  rgs-01  k8s.io  7a k8s.coordination.v1.leases.create  namespaces/rgs-01/psp-admission-controller  system:apiserver  audit.log  method: "io.k8s.coordination.v1.leases.create", principal.email: "system:apiserver"
2025-06-11 08:40:03.185  rgs-01  k8s.io  7a k8s.coordination.v1.leases.create  namespaces/rgs-01/psp-admission-controller  system:apiserver  audit.log  method: "io.k8s.coordination.v1.leases.create", principal.email: "system:apiserver"
2025-06-11 08:40:03.185  rgs-01  k8s.io  7a k8s.coordination.v1.leases.create  namespaces/rgs-01/psp-admission-controller  system:apiserver  audit.log  method: "io.k8s.coordination.v1.leases.create", principal.email: "system:apiserver"
2025-06-11 08:40:03.185  rgs-01  k8s.io  7a k8s.coordination.v1.leases.create  namespaces/rgs-01/psp-admission-controller  system:apiserver  audit.log  method: "io.k8s.coordination.v1.leases.create", principal.email: "system:apiserver"
2025-06-11 08:40:03.185  rgs-01  k8s.io  7a k8s.coordination.v1.leases.create  namespaces/rgs-01/psp-admission-controller  system:apiserver  audit.log  method: "io.k8s.coordination.v1.leases.create", principal.email: "system:apiserver"
2025-06-11 08:40:03.185  rgs-01  k8s.io  7a k8s.coordination.v1.leases.create  namespaces/rgs-01/psp-admission-controller  system:apiserver  audit.log  method: "io.k8s.coordination.v1.leases.create", principal.email: "system:apiserver"
2025-06-11 08:40:03.185  rgs-01  k8s.io  7a k8s.coordination.v1.leases.create  namespaces/rgs-01/psp-admission-controller  system:apiserver  audit.log  method: "io.k8s.coordination.v1.leases.create", principal.email: "system:apiserver"
2025-06-11 08:40:03.185  rgs-01  k8s.io  7a k8s.coordination.v1.leases.create  namespaces/rgs-01/psp-admission-controller  system:apiserver  audit.log  method: "io.k8s.coordination.v1.leases.create", principal.email: "system:apiserver"
2025-06-11 08:40:03.185  rgs-01  k8s.io  7a k8s.coordination.v1.leases.create  namespaces/rgs-01/psp-admission-controller  system:apiserver  audit.log  method: "io.k8s.coordination.v1.leases.create", principal.email: "system:apiserver"
2025-06-11 08:40:03.185  rgs-01  k8s.io  7a k8s.coordination.v1.leases.create  namespaces/rgs-01/psp-admission-controller  system:apiserver  audit.log  method: "io.k8s.coordination.v1.leases.create", principal.email: "system:apiserver"
2025-06-11 08:40:03.185  rgs-01  k8s.io  7a k8s.coordination.v1.leases.create  namespaces/rgs-01/psp-admission-controller  system:apiserver  audit.log  method: "io.k8s.coordination.v1.leases.create", principal.email: "system:apiserver"
2025-06-11 08:40:03.185  rgs-01  k8s.io  7a k8s.coordination.v1.leases.create  namespaces/rgs-01/psp-admission-controller  system:apiserver  audit.log  method: "io.k8s.coordination.v1.leases.create", principal.email: "system:apiserver"
2025-06-11 08:40:03.185  rgs-01  k8s.io  7a k8s.coordination.v1.leases.create  namespaces/rgs-01/psp-admission-controller  system:apiserver  audit.log  method: "io.k8s.coordination.v1.leases.create", principal.email: "system:apiserver"
2025-06-11 08:40:03.185  rgs-01  k8s.io  7a k8s.coordination.v1.leases.create  namespaces/rgs-01/psp-admission-controller  system:apiserver  audit.log  method: "io.k8s.coordination.v1.leases.create", principal.email: "system:apiserver"
2025-06-11 08:40:03.185  rgs-01  k8s.io  7a k8s.coordination.v1.leases.create  namespaces/rgs-01/psp-admission-controller  system:apiserver  audit.log  method: "io.k8s.coordination.v1.leases.create", principal.email: "system:apiserver"
2025-06-11 08:40:03.185  rgs-01  k8s.io  7a k8s.coordination.v1.leases.create  namespaces/rgs-01/psp-admission-controller  system:apiserver  audit.log  method: "io.k8s.coordination.v1.leases.create", principal.email: "system:apiserver"
2025-06-11 08:40:03.185  rgs-01  k8s.io  7a k8s.coordination.v1.leases.create  namespaces/rgs-01/psp-admission-controller  system:apiserver  audit.log  method: "io.k8s.coordination.v1.leases.create", principal.email: "system:apiserver"
2025-06-11 08:40:03.185  rgs-01  k8s.io  7a k8s.coordination.v1.leases.create  namespaces/rgs-01/psp-admission-controller  system:apiserver  audit.log  method: "io.k8s.coordination.v1.leases.create", principal.email: "system:apiserver"
2025-06-11 08:40:03.185  rgs-01  k8s.io  7a k8s.coordination.v1.leases.create  namespaces/rgs-01/psp-admission-controller  system:apiserver  audit.log  method: "io.k8s.coordination.v1.leases.create", principal.email: "system:apiserver"
2025-06-11 08:40:03.185  rgs-01  k8s.io  7a k8s.coordination.v1.leases.create  namespaces/rgs-01/psp-admission-controller  system:apiserver  audit.log  method: "io.k8s.coordination.v1.leases.create", principal.email: "system:apiserver"
2025-06-11 08:40:03.185  rgs-01  k8s.io  7a k8s.coordination.v1.leases.create  namespaces/rgs-01/psp-admission-controller  system:apiserver  audit.log  method: "io.k8s.coordination.v1.leases.create", principal.email: "system:apiserver"
2025-06-11 08:40:03.185  rgs-01  k8s.io  7a k8s.coordination.v1.leases.create  namespaces/rgs-01/psp-admission-controller  system:apiserver  audit.log  method: "io.k8s.coordination.v1.leases.create", principal.email: "system:apiserver"
2025-06-11 08:40:03.185  rgs-01  k8s.io  7a k8s.coordination.v1.leases.create  namespaces/rgs-01/psp-admission-controller  system:apiserver  audit.log  method: "io.k8s.coordination.v1.leases.create", principal.email: "system:apiserver"
2025-06-11 08:40:03.185  rgs-01  k8s.io  7a k8s.coordination.v1.leases.create  namespaces/rgs-01/psp-admission-controller  system:apiserver  audit.log  method: "io.k8s.coordination.v1.leases.create", principal.email: "system:apiserver"
2025-06-11 08:40:03.185  rgs-01  k8s.io  7a k8s.coordination.v1.leases.create  namespaces/rgs-01/psp-admission-controller  system:apiserver  audit.log  method: "io.k8s.coordination.v1.leases.create", principal.email: "system:apiserver"
  2. The error is

    "pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:229: Failed to watch *v1.Endpoints: failed to list *v1.Endpoints: Get "https://10.241.0.1:443/api/v1/namespaces/rgs/endpoints?fieldSelector=metadata.name%3Dauthrouter&resourceVersion=824891315": dial tcp 10.241.0.1:443: i/o timeout"
    

Environments (please complete the following information):

  • OS: [e.g. Linux]
  • go-zero version [e.g. 1.7.2]

More description

The error above interrupted calls between our gRPC services. Could go-zero be optimized to keep service calls working in this situation, for example by falling back to the locally cached endpoints after the watch fails?

shaolei, Jun 12 '25 03:06

I think you might need to reach out to GKE support engineers.

kevwan, Jun 12 '25 15:06