aws-load-balancer-controller icon indicating copy to clipboard operation
aws-load-balancer-controller copied to clipboard

Helm update to version 1.7.x resulting in pods Readiness probe failed: HTTP probe failed with statuscode: 404

Open myevit opened this issue 1 year ago • 8 comments

Helm update to version 1.7.x resulting in Pod: aws-load-balancer-controller Readiness probe failed: HTTP probe failed with statuscode: 404

helm versions 1.4.1 through 1.6.2 working as expected

1.7.x pod logs:

{"level":"info","ts":1708126775.380832,"msg":"version","GitVersion":"v2.4.6","GitCommit":"a92e689dfe464f5b24784f398947e0fef31dc470","BuildDate":"2023-01-12T06:29:16+0000"}
{"level":"info","ts":1708126775.410416,"logger":"controller-runtime.metrics","msg":"metrics server is starting to listen","addr":":8080"}
{"level":"info","ts":1708126775.4145405,"logger":"setup","msg":"adding health check for controller"}
{"level":"info","ts":1708126775.414611,"logger":"controller-runtime.webhook","msg":"registering webhook","path":"/mutate-v1-pod"}
{"level":"info","ts":1708126775.4146783,"logger":"controller-runtime.webhook","msg":"registering webhook","path":"/mutate-elbv2-k8s-aws-v1beta1-targetgroupbinding"}
{"level":"info","ts":1708126775.414728,"logger":"controller-runtime.webhook","msg":"registering webhook","path":"/validate-elbv2-k8s-aws-v1beta1-targetgroupbinding"}
{"level":"info","ts":1708126775.4147651,"logger":"controller-runtime.webhook","msg":"registering webhook","path":"/validate-networking-v1-ingress"}
{"level":"info","ts":1708126775.4148157,"logger":"setup","msg":"starting podInfo repo"}
{"level":"info","ts":1708126777.415071,"msg":"starting metrics server","path":"/metrics"}
I0216 23:39:37.415030       1 leaderelection.go:243] attempting to acquire leader lease kube-system/aws-load-balancer-controller-leader...
{"level":"info","ts":1708126777.415177,"logger":"controller-runtime.webhook.webhooks","msg":"starting webhook server"}
{"level":"info","ts":1708126777.4158258,"logger":"controller-runtime.certwatcher","msg":"Updated current TLS certificate"}
{"level":"info","ts":1708126777.415922,"logger":"controller-runtime.webhook","msg":"serving webhook server","host":"","port":9443}
{"level":"info","ts":1708126777.4159951,"logger":"controller-runtime.certwatcher","msg":"Starting certificate watcher"}

myevit avatar Feb 16 '24 23:02 myevit

+1 started seeing this today

greg-anetac avatar Feb 17 '24 21:02 greg-anetac

This is known issue. Can you confirm that you are using latest version for controller also. If you are updating chart version, you need also to use v2.7.0 controller version

wweiwei-li avatar Feb 21 '24 23:02 wweiwei-li

Helm chart reporting App Version v 2.7.1 Also, in the helm chart 1.7.1 it set to public.ecr.aws/eks/aws-load-balancer-controller:v2.4.6 whatever version is in that container the one is running in chart 1.7.1...

image:
  pullPolicy: IfNotPresent
  repository: public.ecr.aws/eks/aws-load-balancer-controller
  tag: v2.4.6

myevit avatar Feb 22 '24 22:02 myevit

https://github.com/aws/eks-charts/blob/master/stable/aws-load-balancer-controller/values.yaml. Here is the most recent helm chart, which links to the right controller version. Can you tell us from where did you see the v2.4.6.

wweiwei-li avatar Feb 28 '24 23:02 wweiwei-li

https://github.com/aws/eks-charts/blob/master/stable/aws-load-balancer-controller/values.yaml. Here is the most recent helm chart, which links to the right controller version. Can you tell us from where did you see the v2.4.6.

Manual image tagging in the published helm chart resolving the issue.

I am using Lens to update helm chart deployments. For some reason Lens is thinking that Image tag is user supplied value and keeping it unchanged.

additionalLabels: {}
affinity: {}
autoscaling:
  enabled: false
  maxReplicas: 5
  minReplicas: 1
  targetCPUUtilizationPercentage: 80
awsApiEndpoints: null
awsApiThrottle: null
awsMaxRetries: null
backendSecurityGroup: null
cluster:
  dnsDomain: cluster.local
clusterName: production
clusterSecretsPermissions:
  allowAllSecrets: false
configureDefaultAffinity: true
controllerConfig:
  featureGates: {}
createIngressClassResource: true
defaultSSLPolicy: null
defaultTags: {}
defaultTargetType: instance
deploymentAnnotations: {}
disableIngressClassAnnotation: null
disableIngressGroupNameAnnotation: null
disableRestrictedSecurityGroupRules: null
dnsPolicy: null
enableBackendSecurityGroup: null
enableCertManager: false
enableEndpointSlices: null
enablePodReadinessGateInject: null
enableServiceMutatorWebhook: true
enableShield: null
enableWaf: null
enableWafv2: null
env: null
externalManagedTags: []
extraVolumeMounts: null
extraVolumes: null
fullnameOverride: ""
hostNetwork: false
image:
  pullPolicy: IfNotPresent
  repository: public.ecr.aws/eks/aws-load-balancer-controller
  tag: v2.4.6
imagePullSecrets: []
ingressClass: alb
ingressClassConfig:
  default: false
ingressClassParams:
  create: true
  name: null
  spec: {}
ingressMaxConcurrentReconciles: null
keepTLSSecret: true
livenessProbe:
  failureThreshold: 2
  httpGet:
    path: /healthz
    port: 61779
    scheme: HTTP
  initialDelaySeconds: 30
  timeoutSeconds: 10
logLevel: null
metricsBindAddr: ""
nameOverride: ""
nodeSelector: {}
objectSelector:
  matchExpressions: null
  matchLabels: null
podAnnotations: {}
podDisruptionBudget: {}
podLabels: {}
podSecurityContext:
  fsGroup: 65534
priorityClassName: system-cluster-critical
rbac:
  create: true
readinessProbe:
  failureThreshold: 2
  httpGet:
    path: /readyz
    port: 61779
    scheme: HTTP
  initialDelaySeconds: 10
  successThreshold: 1
  timeoutSeconds: 10
region: ca-central-1
replicaCount: 2
resources: {}
revisionHistoryLimit: 10
securityContext:
  allowPrivilegeEscalation: false
  readOnlyRootFilesystem: true
  runAsNonRoot: true
serviceAccount:
  annotations: {}
  automountServiceAccountToken: true
  create: false
  imagePullSecrets: null
  name: aws-load-balancer-controller
serviceAnnotations: {}
serviceMaxConcurrentReconciles: null
serviceMonitor:
  additionalLabels: {}
  enabled: false
  interval: 1m
  namespace: null
serviceTargetENISGTags: null
syncPeriod: null
targetgroupbindingMaxConcurrentReconciles: null
targetgroupbindingMaxExponentialBackoffDelay: null
terminationGracePeriodSeconds: 10
tolerations: []
topologySpreadConstraints: {}
updateStrategy: {}
vpcId: ""
watchNamespace: null
webhookBindPort: null
webhookNamespaceSelectors: null
webhookTLS:
  caCert: null
  cert: null
  key: null

myevit avatar Feb 29 '24 03:02 myevit

For this time, I changed the readiness probe health check back to "healthz", and its worked properly.

@davidkl97 Please check whether it is an issue or if any workaround needs to be done here.

ravikyada avatar Feb 29 '24 06:02 ravikyada

Hey, @ravikyada which image version you are using? please note that readiness probe change requires newer images as suggested here due to the the fact that readyz endpoint registered in newer version(image version 2.7.0)

davidkl97 avatar Feb 29 '24 06:02 davidkl97

ok thank you @davidkl97 for the help!!!

ravikyada avatar Feb 29 '24 06:02 ravikyada