consul-k8s

Upgrade for AKS Cluster: Can't drain because Too Many Requests

Open DaleyKD opened this issue 2 years ago • 1 comments

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request. Searching for pre-existing feature requests helps us consolidate datapoints for identical requirements into a single place, thank you!
  • Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request.
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment.

Overview of the Issue

I'm currently trying to upgrade my AKS cluster from 1.23.8 to 1.24.0. Two of my nodes can't be drained. Note that the other nodes, which also run pods covered by PDBs, upgraded successfully. Should maxUnavailable rarely, if ever, be less than 1, and should it default to 1?

#1278 says that the spec.maxUnavailable is (n/2) - 1, and I've seen somewhere else in these issues that the main recommended number of replicas for consul-connect-injector is 2, which means spec.maxUnavailable will be 0.
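To make the arithmetic concrete, here is a minimal shell sketch of the formula as it is quoted from #1278. It assumes integer division and assumes that negative results are clamped to 0; neither detail is confirmed against the chart source here, and `max_unavailable` is a hypothetical helper name.

```shell
# Hypothetical helper: evaluates (n/2) - 1 with integer division,
# as the formula is quoted in this issue. Clamping at 0 is an assumption.
max_unavailable() {
  n="$1"
  result=$(( n / 2 - 1 ))
  if [ "$result" -lt 0 ]; then
    result=0
  fi
  echo "$result"
}

# Show the budget for a few replica counts.
for replicas in 2 3 4; do
  echo "replicas=$replicas maxUnavailable=$(max_unavailable "$replicas")"
done
```

With 2 replicas the result is 0, so under this formula the budget never permits a voluntary eviction of an injector pod.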

I am almost completely new to K8S and especially PDBs, so I don't know what I'm talking about.

kyle@Azure:~$ kubectl get events --sort-by='{.lastTimestamp}'
2m50s       Warning   Drain                     node/aks-default-18345084-vmss000000   Eviction blocked by Too many Requests (usually a pdb): [consul-connect-injector-8df4c6-d2flg]
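The event only names the pod, not the budget that blocks it. A small sketch for narrowing that down: filter the tabular output of `kubectl get pdb -A` for budgets whose ALLOWED DISRUPTIONS column is 0. The helper name `blocking_pdbs` is hypothetical, and the column position (fifth whitespace-separated field on data rows) is an assumption based on the output format shown in this issue.

```shell
# Hypothetical helper: reads `kubectl get pdb -A` output on stdin and prints
# namespace/name for every budget whose ALLOWED DISRUPTIONS column is 0.
# Assumes the six-column layout shown in this issue; NR > 1 skips the header.
blocking_pdbs() {
  awk 'NR > 1 && $5 == 0 { print $1 "/" $2 }'
}

# Against a live cluster:
#   kubectl get pdb -A | blocking_pdbs
```

Any budget this prints will reject evictions for the pods it selects until its allowed disruptions rise above 0.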

Reproduction Steps

Used Terraform to install this:

resource "helm_release" "consul" {
  name       = "consul"
  repository = "https://helm.releases.hashicorp.com"
  chart      = "consul"
  version    = "0.46.1"
  namespace  = kubernetes_namespace.consul.metadata.0.name
  values = [
    data.hcs_agent_helm_config.hcs.config # Get the consul config from our HCS cluster
  ]

  set {
    name  = "controller.enabled"
    value = "true"
  }

  set {
    name  = "connectInject.transparentProxy.defaultEnabled"
    value = "false"
  }
}

Expected behavior

Ideally, a problem-free AKS cluster upgrade.

Environment details

  • consul-k8s version: 0.46.1/1.12.3
  • HCS consul version: 1.11.6
  • Kubernetes version: v1.23.8 --> 1.24.0
  • Cloud Provider: AKS
  • Networking CNI plugin: kubenet
kyle@Azure:~$ helm get values consul -n consulns
USER-SUPPLIED VALUES:
client:
  enabled: true
  exposeGossipPorts: true
  join:
  - {uuid}.private.consul.{uuid}.az.hashicorp.cloud
connectInject:
  enabled: true
  transparentProxy:
    defaultEnabled: false
controller:
  enabled: true
externalServers:
  enabled: true
  hosts:
  - {uuid}.private.consul.{uuid}.az.hashicorp.cloud
  httpsPort: 443
  k8sAuthMethodHost: https://my-cluster-bdccbc13.hcp.centralus.azmk8s.io:443
  useSystemRoots: true
global:
  acls:
    bootstrapToken:
      secretKey: token
      secretName: my-hcs-cluster-bootstrap-token
    manageSystemACLs: true
  datacenter: dc1
  enabled: false
  gossipEncryption:
    secretKey: gossipEncryptionKey
    secretName: my-hcs-cluster-hcs
  name: consul
  tls:
    caCert:
      secretKey: caCert
      secretName: my-hcs-cluster-hcs
    enableAutoEncrypt: true
    enabled: true

Additional Context

kyle@Azure:~$ kubectl get pdb -A
NAMESPACE     NAME                                    MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
consulns      consul-connect-injector                 N/A             0                 0                     19h
ingress       nginxingress-ingress-nginx-controller   1               N/A               1                     19h
kube-system   coredns-pdb                             1               N/A               1                     19h
kube-system   konnectivity-agent                      1               N/A               1                     19h
kube-system   metrics-server-pdb                      1               N/A               1                     19h
kyle@Azure:~$ kubectl get pdb/consul-connect-injector -n consulns -o yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  annotations:
    meta.helm.sh/release-name: consul
    meta.helm.sh/release-namespace: consulns
  creationTimestamp: "2022-08-25T20:29:53Z"
  generation: 1
  labels:
    app: consul
    app.kubernetes.io/managed-by: Helm
    chart: consul-helm
    component: connect-injector
    heritage: Helm
    release: consul
  name: consul-connect-injector
  namespace: consulns
  resourceVersion: "509006"
  uid: 617192e1-9e3c-498e-9987-bea59a05e11b
spec:
  maxUnavailable: 0
  selector:
    matchLabels:
      app: consul
      component: connect-injector
      release: consul
status:
  conditions:
  - lastTransitionTime: "2022-08-26T15:32:54Z"
    message: ""
    observedGeneration: 1
    reason: InsufficientPods
    status: "False"
    type: DisruptionAllowed
  currentHealthy: 2
  desiredHealthy: 2
  disruptionsAllowed: 0
  expectedPods: 2
  observedGeneration: 1
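The status fields above account for the blocked drain. A rough sketch of how they relate, assuming the usual PDB rule for an integer `maxUnavailable`: `desiredHealthy` is `expectedPods - maxUnavailable`, and `disruptionsAllowed` is `currentHealthy - desiredHealthy`, both floored at 0. The helper name `disruptions_allowed` is hypothetical, and percentage values are not handled.

```shell
# Hypothetical helper: derives disruptionsAllowed from PDB status fields,
# assuming an integer maxUnavailable (percentages are not handled here).
# Arguments: expectedPods currentHealthy maxUnavailable
disruptions_allowed() {
  expected_pods="$1"
  current_healthy="$2"
  max_unavailable="$3"

  desired_healthy=$(( expected_pods - max_unavailable ))
  if [ "$desired_healthy" -lt 0 ]; then
    desired_healthy=0
  fi

  allowed=$(( current_healthy - desired_healthy ))
  if [ "$allowed" -lt 0 ]; then
    allowed=0
  fi
  echo "$allowed"
}

disruptions_allowed 2 2 0   # values from the status above: prints 0
disruptions_allowed 2 2 1   # same pods with maxUnavailable: 1, prints 1
```

With `maxUnavailable: 0`, every expected pod must stay healthy, so no eviction is ever allowed and the node drain waits indefinitely.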

DaleyKD · Aug 26 '22 15:08

I'm still having this problem. I even tried setting connectInject.disruptionBudget.maxUnavailable to 1, but that value doesn't appear in the PDB spec.

I have to delete the PDB to upgrade my AKS cluster.
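One way to confirm whether the Helm override reached the live object is to compare the requested value with what the PDB reports. This is only a sketch: `check_pdb_override` is a hypothetical helper, and the namespace and PDB name in the commented command are taken from the output earlier in this issue.

```shell
# Hypothetical helper: compares the maxUnavailable requested via Helm
# with the value read back from the live PodDisruptionBudget.
# Arguments: wanted actual
check_pdb_override() {
  wanted="$1"
  actual="$2"
  if [ "$wanted" = "$actual" ]; then
    echo "override applied: maxUnavailable=$actual"
  else
    echo "override NOT applied: wanted $wanted, PDB has $actual"
  fi
}

# Against a live cluster (names as they appear in this issue):
#   actual=$(kubectl get pdb consul-connect-injector -n consulns \
#     -o jsonpath='{.spec.maxUnavailable}')
#   check_pdb_override 1 "$actual"
```

If the values differ after a Helm upgrade, the chart is computing the budget itself rather than using the supplied value, which matches the behavior described here.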

DaleyKD · Sep 06 '22 18:09

Hi @DaleyKD, this is likely addressed by https://github.com/hashicorp/consul-k8s/pull/1530/files. I will close this issue; the fix should ship in 0.49.0, which we expect to release later this week or early next week.

david-yu · Sep 27 '22 05:09

@david-yu,

Would you consider reopening this?

I'm currently trying to upgrade AKS from 1.25.6 to 1.26.6.

Before upgrading AKS, I upgraded consul-k8s step by step from 0.49.0 to 0.49.8, then to 1.0.10, then to 1.2.3. I am currently running 1.2.3, which is Consul 1.16.3.

It seems that nothing about the disruptionBudget for connect-inject has changed.

kyle@Azure:~$ helm get values consul -n consul
USER-SUPPLIED VALUES:
connectInject:
  transparentProxy:
    defaultEnabled: false
dns:
  enabled: false
global:
  acls:
    manageSystemACLs: true
  datacenter: stratusdevdc1
  gossipEncryption:
    autoGenerate: true
  name: consul
  tls:
    enableAutoEncrypt: true
    enabled: true
server:
  disruptionBudget:
    enabled: false
  replicas: 1
kyle@Azure:~$ kubectl get pdb -A
NAMESPACE     NAME                                    MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
consul        consul-connect-injector                 N/A             0                 0                     13h
ingress       nginxingress-ingress-nginx-controller   1               N/A               1                     420d
kube-system   coredns-pdb                             1               N/A               1                     420d
kube-system   konnectivity-agent                      1               N/A               1                     420d
kube-system   metrics-server-pdb                      1               N/A               1                     420d
kyle@Azure:~$ kubectl get pdb/consul-connect-injector -n consul -o yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  annotations:
    meta.helm.sh/release-name: consul
    meta.helm.sh/release-namespace: consul
  creationTimestamp: "2023-11-09T01:32:04Z"
  generation: 1
  labels:
    app: consul
    app.kubernetes.io/managed-by: Helm
    chart: consul-helm
    component: connect-injector
    heritage: Helm
    release: consul
  name: consul-connect-injector
  namespace: consul
  resourceVersion: "257320864"
  uid: 81f1e341-e87c-44f9-9faa-49375b0299e9
spec:
  maxUnavailable: 0
  selector:
    matchLabels:
      app: consul
      component: connect-injector
      release: consul
status:
  conditions:
  - lastTransitionTime: "2023-11-09T02:05:35Z"
    message: ""
    observedGeneration: 1
    reason: InsufficientPods
    status: "False"
    type: DisruptionAllowed
  currentHealthy: 1
  desiredHealthy: 1
  disruptionsAllowed: 0
  expectedPods: 1
  observedGeneration: 1

I have a hard time believing that, if I'm doing this correctly, I'm the only one who can never upgrade AKS. I suspect I'm missing something obvious.

DaleyKD · Nov 09 '23 15:11