consul-k8s
Upgrade for AKS Cluster: Can't drain because Too Many Requests
Community Note
- Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request. Searching for pre-existing feature requests helps us consolidate datapoints for identical requirements into a single place, thank you!
- Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request.
- If you are interested in working on this issue or have submitted a pull request, please leave a comment.
Overview of the Issue
I'm currently trying to upgrade my AKS cluster from 1.23.8 to 1.24.0. Two of my nodes aren't able to drain. Please note: the other nodes, whose pods also have PDBs, were able to upgrade successfully. Could it be that maxUnavailable should rarely be less than 1, and should default to 1?
#1278 says that spec.maxUnavailable is (n/2) - 1, and I've seen somewhere else in these issues that the main recommended number of replicas for consul-connect-injector is 2, which means spec.maxUnavailable will be 0.
I am almost completely new to K8S and especially PDBs, so I don't know what I'm talking about.
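To make the arithmetic concrete, here is a minimal sketch of the (n/2) - 1 default as described in #1278, using integer division. The function name pdb_max_unavailable is hypothetical, and flooring negative results at 0 is an assumption (it is not taken from the chart source).

```shell
#!/bin/sh
# Sketch of the default described in #1278: maxUnavailable = (n/2) - 1,
# with integer division. Flooring at 0 is an assumption, not chart code.
pdb_max_unavailable() {
  n=$1
  v=$(( n / 2 - 1 ))
  if [ "$v" -lt 0 ]; then v=0; fi
  echo "$v"
}

# Print the resulting budget for a few replica counts.
for replicas in 1 2 3 4 5; do
  echo "replicas=$replicas maxUnavailable=$(pdb_max_unavailable "$replicas")"
done
```

Under this formula, any replica count below 4 yields a budget of 0, which would block every voluntary eviction of an injector pod.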
kyle@Azure:~$ kubectl get events --sort-by='{.lastTimestamp}'
2m50s Warning Drain node/aks-default-18345084-vmss000000 Eviction blocked by Too many Requests (usually a pdb): [consul-connect-injector-8df4c6-d2flg]
Reproduction Steps
Used Terraform to install this:
resource "helm_release" "consul" {
  name       = "consul"
  repository = "https://helm.releases.hashicorp.com"
  chart      = "consul"
  version    = "0.46.1"
  namespace  = kubernetes_namespace.consul.metadata.0.name

  values = [
    data.hcs_agent_helm_config.hcs.config # Get the consul config from our HCS cluster
  ]

  set {
    name  = "controller.enabled"
    value = "true"
  }

  set {
    name  = "connectInject.transparentProxy.defaultEnabled"
    value = "false"
  }
}
Expected behavior
Ideally, a problem-free AKS cluster upgrade.
Environment details
- consul-k8s version: 0.46.1/1.12.3 - HCS consul version: 1.11.6
- Kubernetes version: v1.23.8 --> 1.24.0
- Cloud Provider: AKS
- Networking CNI plugin: kubenet
kyle@Azure:~$ helm get values consul -n consulns
USER-SUPPLIED VALUES:
client:
  enabled: true
  exposeGossipPorts: true
  join:
  - {uuid}.private.consul.{uuid}.az.hashicorp.cloud
connectInject:
  enabled: true
  transparentProxy:
    defaultEnabled: false
controller:
  enabled: true
externalServers:
  enabled: true
  hosts:
  - {uuid}.private.consul.{uuid}.az.hashicorp.cloud
  httpsPort: 443
  k8sAuthMethodHost: https://my-cluster-bdccbc13.hcp.centralus.azmk8s.io:443
  useSystemRoots: true
global:
  acls:
    bootstrapToken:
      secretKey: token
      secretName: my-hcs-cluster-bootstrap-token
    manageSystemACLs: true
  datacenter: dc1
  enabled: false
  gossipEncryption:
    secretKey: gossipEncryptionKey
    secretName: my-hcs-cluster-hcs
  name: consul
  tls:
    caCert:
      secretKey: caCert
      secretName: my-hcs-cluster-hcs
    enableAutoEncrypt: true
    enabled: true
Additional Context
kyle@Azure:~$ kubectl get pdb -A
NAMESPACE     NAME                                    MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
consulns      consul-connect-injector                 N/A             0                 0                     19h
ingress       nginxingress-ingress-nginx-controller   1               N/A               1                     19h
kube-system   coredns-pdb                             1               N/A               1                     19h
kube-system   konnectivity-agent                      1               N/A               1                     19h
kube-system   metrics-server-pdb                      1               N/A               1                     19h
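
Before a drain, it can help to spot PDBs that allow zero disruptions. The following is a rough sketch that filters the text output of kubectl get pdb -A; the helper name blocking_pdbs is hypothetical, and it assumes ALLOWED DISRUPTIONS is the fifth whitespace-separated field of each data row, as in the listing above.

```shell
#!/bin/sh
# List PDBs whose ALLOWED DISRUPTIONS value is 0, reading
# `kubectl get pdb -A` text output on stdin.
# Assumes data rows have the form: NAMESPACE NAME MIN MAX ALLOWED AGE.
blocking_pdbs() {
  awk 'NR > 1 && $5 == 0 { print $1 "/" $2 }'
}

# Sample rows copied from the listing above. On a live cluster, use:
#   kubectl get pdb -A | blocking_pdbs
blocking_pdbs <<'EOF'
NAMESPACE NAME MIN AVAILABLE MAX UNAVAILABLE ALLOWED DISRUPTIONS AGE
consulns consul-connect-injector N/A 0 0 19h
kube-system coredns-pdb 1 N/A 1 19h
EOF
```

With the sample rows, only consulns/consul-connect-injector is reported, matching the pod named in the drain event.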
kyle@Azure:~$ kubectl get pdb/consul-connect-injector -n consulns -o yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  annotations:
    meta.helm.sh/release-name: consul
    meta.helm.sh/release-namespace: consulns
  creationTimestamp: "2022-08-25T20:29:53Z"
  generation: 1
  labels:
    app: consul
    app.kubernetes.io/managed-by: Helm
    chart: consul-helm
    component: connect-injector
    heritage: Helm
    release: consul
  name: consul-connect-injector
  namespace: consulns
  resourceVersion: "509006"
  uid: 617192e1-9e3c-498e-9987-bea59a05e11b
spec:
  maxUnavailable: 0
  selector:
    matchLabels:
      app: consul
      component: connect-injector
      release: consul
status:
  conditions:
  - lastTransitionTime: "2022-08-26T15:32:54Z"
    message: ""
    observedGeneration: 1
    reason: InsufficientPods
    status: "False"
    type: DisruptionAllowed
  currentHealthy: 2
  desiredHealthy: 2
  disruptionsAllowed: 0
  expectedPods: 2
  observedGeneration: 1
I'm having this problem again/still. I even tried setting connectInject.disruptionBudget.maxUnavailable to 1, but that doesn't appear to be set in the PDB spec.
I have to delete the PDB to upgrade my AKS cluster.
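
For anyone hitting the same thing, the workaround above might look roughly like the sketch below. The run and remove_injector_pdb helpers are hypothetical names, and DRY_RUN is an assumed convention that defaults to printing the commands instead of executing them. The PDB name and namespace are taken from the output earlier in this issue.

```shell
#!/bin/sh
# Workaround sketch: remove the blocking PDB before the node drain.
# DRY_RUN defaults to 1, so commands are only printed until you opt in.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

remove_injector_pdb() {
  ns=$1
  # Show the budget the live PDB currently carries, then delete the PDB.
  run kubectl get pdb consul-connect-injector -n "$ns" -o "jsonpath={.spec.maxUnavailable}"
  run kubectl delete pdb consul-connect-injector -n "$ns"
}

remove_injector_pdb consulns
```

Set DRY_RUN=0 to actually execute the commands. Since the PDB is managed by the Helm release, a later helm upgrade will most likely recreate it, so this is a stopgap and not a fix.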
Hi @DaleyKD, this is likely addressed by https://github.com/hashicorp/consul-k8s/pull/1530/files. I will close this issue; the fix should ship in 0.49.0, which should be released later this week or early next week.
@david-yu, would you consider reopening this?
I'm currently trying to upgrade AKS from 1.25.6 to 1.26.6.
Before upgrading, I upgraded consul-k8s all the way from 0.49.0 to 0.49.8, then to 1.0.10, then to 1.2.3. I am currently running 1.2.3, which is Consul 1.16.3.
It seems that nothing about the disruptionBudget changed for connect inject.
kyle@Azure:~$ helm get values consul -n consul
USER-SUPPLIED VALUES:
connectInject:
  transparentProxy:
    defaultEnabled: false
dns:
  enabled: false
global:
  acls:
    manageSystemACLs: true
  datacenter: stratusdevdc1
  gossipEncryption:
    autoGenerate: true
  name: consul
  tls:
    enableAutoEncrypt: true
    enabled: true
server:
  disruptionBudget:
    enabled: false
  replicas: 1
kyle@Azure:~$ kubectl get pdb -A
NAMESPACE     NAME                                    MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
consul        consul-connect-injector                 N/A             0                 0                     13h
ingress       nginxingress-ingress-nginx-controller   1               N/A               1                     420d
kube-system   coredns-pdb                             1               N/A               1                     420d
kube-system   konnectivity-agent                      1               N/A               1                     420d
kube-system   metrics-server-pdb                      1               N/A               1                     420d
kyle@Azure:~$ kubectl get pdb/consul-connect-injector -n consul -o yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  annotations:
    meta.helm.sh/release-name: consul
    meta.helm.sh/release-namespace: consul
  creationTimestamp: "2023-11-09T01:32:04Z"
  generation: 1
  labels:
    app: consul
    app.kubernetes.io/managed-by: Helm
    chart: consul-helm
    component: connect-injector
    heritage: Helm
    release: consul
  name: consul-connect-injector
  namespace: consul
  resourceVersion: "257320864"
  uid: 81f1e341-e87c-44f9-9faa-49375b0299e9
spec:
  maxUnavailable: 0
  selector:
    matchLabels:
      app: consul
      component: connect-injector
      release: consul
status:
  conditions:
  - lastTransitionTime: "2023-11-09T02:05:35Z"
    message: ""
    observedGeneration: 1
    reason: InsufficientPods
    status: "False"
    type: DisruptionAllowed
  currentHealthy: 1
  desiredHealthy: 1
  disruptionsAllowed: 0
  expectedPods: 1
  observedGeneration: 1
I have a hard time believing that, if I'm doing it correctly, I'm the only one who can't ever upgrade AKS. I suspect I'm missing something obvious.