Flux Bootstrap Failing on Private EKS Cluster
Describe the bug
Flux bootstrap fails to reconcile, and every controller is stuck in a termination loop: each manager container fails its liveness probe and is restarted with exit code 2. I have checked `flux logs`, `flux events`, and `kubectl logs`.
`kubectl describe pods -n flux-system`:

```
ubuntu@ip-10-1-159-28:~$ kubectl describe pods -n flux-system
Name: helm-controller-7c8b698656-k4f4f
Namespace: flux-system
Priority: 2000000000
Priority Class Name: system-cluster-critical
Service Account: helm-controller
Node: ip-10-1-147-215.us-east-2.compute.internal/10.1.147.215
Start Time: Mon, 09 Oct 2023 09:36:57 +0000
Labels: app=helm-controller
pod-template-hash=7c8b698656
Annotations: prometheus.io/port: 8080
prometheus.io/scrape: true
Status: Running
IP: 10.1.159.4
IPs:
IP: 10.1.159.4
Controlled By: ReplicaSet/helm-controller-7c8b698656
Containers:
manager:
Container ID: containerd://b6de55f6dbf193e86bfc502062651766e71e050940eb54c5f4d7e3f6632eba5a
Image: ghcr.io/fluxcd/helm-controller:v0.36.1
Image ID: ghcr.io/fluxcd/helm-controller@sha256:0378fd84ed0ef430414e0ac5bd79cdc03899ba787c22561e474650498d231ca6
Ports: 8080/TCP, 9440/TCP
Host Ports: 0/TCP, 0/TCP
SeccompProfile: RuntimeDefault
Args:
--events-addr=http://notification-controller.flux-system.svc.cluster.local./
--watch-all-namespaces=true
--log-level=info
--log-encoding=json
--enable-leader-election
State: Running
Started: Mon, 09 Oct 2023 09:37:28 +0000
Last State: Terminated
Reason: Error
Exit Code: 2
Started: Mon, 09 Oct 2023 09:36:59 +0000
Finished: Mon, 09 Oct 2023 09:37:27 +0000
Ready: False
Restart Count: 1
Limits:
cpu: 1
memory: 1Gi
Requests:
cpu: 100m
memory: 64Mi
Liveness: http-get http://:healthz/healthz delay=0s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get http://:healthz/readyz delay=0s timeout=1s period=10s #success=1 #failure=3
Environment:
RUNTIME_NAMESPACE: flux-system (v1:metadata.namespace)
Mounts:
/tmp from temp (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-jsv7n (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
temp:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
kube-api-access-jsv7n:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: kubernetes.io/os=linux
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 50s default-scheduler Successfully assigned flux-system/helm-controller-7c8b698656-k4f4f to ip-10-1-147-215.us-east-2.compute.internal
Normal Pulled 20s (x2 over 49s) kubelet Container image "ghcr.io/fluxcd/helm-controller:v0.36.1" already present on machine
Normal Created 20s (x2 over 48s) kubelet Created container manager
Normal Killing 20s kubelet Container manager failed liveness probe, will be restarted
Normal Started 19s (x2 over 48s) kubelet Started container manager
Warning Unhealthy 10s (x10 over 48s) kubelet Readiness probe failed: Get "http://10.1.159.4:9440/readyz": dial tcp 10.1.159.4:9440: connect: connection refused
Warning Unhealthy 10s (x4 over 40s) kubelet Liveness probe failed: Get "http://10.1.159.4:9440/healthz": dial tcp 10.1.159.4:9440: connect: connection refused
Name: kustomize-controller-858996fc8d-ls654
Namespace: flux-system
Priority: 2000000000
Priority Class Name: system-cluster-critical
Service Account: kustomize-controller
Node: ip-10-1-147-215.us-east-2.compute.internal/10.1.147.215
Start Time: Mon, 09 Oct 2023 09:36:57 +0000
Labels: app=kustomize-controller
pod-template-hash=858996fc8d
Annotations: prometheus.io/port: 8080
prometheus.io/scrape: true
Status: Running
IP: 10.1.147.195
IPs:
IP: 10.1.147.195
Controlled By: ReplicaSet/kustomize-controller-858996fc8d
Containers:
manager:
Container ID: containerd://42385ea048ac8ee9368cdd0bf340269cd561cc4d0b274fc866cdb951d6152131
Image: ghcr.io/fluxcd/kustomize-controller:v1.1.0
Image ID: ghcr.io/fluxcd/kustomize-controller@sha256:1f7380a0c7871a7149ca67fb1ba20865566f8f381d14cec2e5ac6af40d96ca55
Ports: 8080/TCP, 9440/TCP
Host Ports: 0/TCP, 0/TCP
SeccompProfile: RuntimeDefault
Args:
--events-addr=http://notification-controller.flux-system.svc.cluster.local./
--watch-all-namespaces=true
--log-level=info
--log-encoding=json
--enable-leader-election
State: Running
Started: Mon, 09 Oct 2023 09:36:59 +0000
Ready: False
Restart Count: 0
Limits:
cpu: 1
memory: 1Gi
Requests:
cpu: 100m
memory: 64Mi
Liveness: http-get http://:healthz/healthz delay=0s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get http://:healthz/readyz delay=0s timeout=1s period=10s #success=1 #failure=3
Environment:
RUNTIME_NAMESPACE: flux-system (v1:metadata.namespace)
Mounts:
/tmp from temp (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-qdzkz (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
temp:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
kube-api-access-qdzkz:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: kubernetes.io/os=linux
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 50s default-scheduler Successfully assigned flux-system/kustomize-controller-858996fc8d-ls654 to ip-10-1-147-215.us-east-2.compute.internal
Normal Pulled 49s kubelet Container image "ghcr.io/fluxcd/kustomize-controller:v1.1.0" already present on machine
Normal Created 49s kubelet Created container manager
Normal Started 48s kubelet Started container manager
Warning Unhealthy 20s (x7 over 48s) kubelet Readiness probe failed: Get "http://10.1.147.195:9440/readyz": dial tcp 10.1.147.195:9440: connect: connection refused
Warning Unhealthy 20s (x3 over 40s) kubelet Liveness probe failed: Get "http://10.1.147.195:9440/healthz": dial tcp 10.1.147.195:9440: connect: connection refused
Normal Killing 20s kubelet Container manager failed liveness probe, will be restarted
Warning Unhealthy 9s kubelet Readiness probe failed: Get "http://10.1.147.195:9440/readyz": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
Name: notification-controller-ddf44665d-sbl96
Namespace: flux-system
Priority: 0
Service Account: notification-controller
Node: ip-10-1-147-215.us-east-2.compute.internal/10.1.147.215
Start Time: Mon, 09 Oct 2023 09:36:57 +0000
Labels: app=notification-controller
pod-template-hash=ddf44665d
Annotations: prometheus.io/port: 8080
prometheus.io/scrape: true
Status: Running
IP: 10.1.144.245
IPs:
IP: 10.1.144.245
Controlled By: ReplicaSet/notification-controller-ddf44665d
Containers:
manager:
Container ID: containerd://e3d519db6b0c9f1f64cb4e843c626a82bb28363943ccb5a69a32a634b08daa2f
Image: ghcr.io/fluxcd/notification-controller:v1.1.0
Image ID: ghcr.io/fluxcd/notification-controller@sha256:21bd40a9856d0faba9d769b7bc9b6153edf26a8e6ef2d1b7c3730c35e1942213
Ports: 9090/TCP, 9292/TCP, 8080/TCP, 9440/TCP
Host Ports: 0/TCP, 0/TCP, 0/TCP, 0/TCP
SeccompProfile: RuntimeDefault
Args:
--watch-all-namespaces=true
--log-level=info
--log-encoding=json
--enable-leader-election
State: Running
Started: Mon, 09 Oct 2023 09:37:28 +0000
Last State: Terminated
Reason: Error
Exit Code: 2
Started: Mon, 09 Oct 2023 09:36:59 +0000
Finished: Mon, 09 Oct 2023 09:37:27 +0000
Ready: False
Restart Count: 1
Limits:
cpu: 1
memory: 1Gi
Requests:
cpu: 100m
memory: 64Mi
Liveness: http-get http://:healthz/healthz delay=0s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get http://:healthz/readyz delay=0s timeout=1s period=10s #success=1 #failure=3
Environment:
RUNTIME_NAMESPACE: flux-system (v1:metadata.namespace)
Mounts:
/tmp from temp (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-qb5r2 (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
temp:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
kube-api-access-qb5r2:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: kubernetes.io/os=linux
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 50s default-scheduler Successfully assigned flux-system/notification-controller-ddf44665d-sbl96 to ip-10-1-147-215.us-east-2.compute.internal
Normal Pulled 20s (x2 over 49s) kubelet Container image "ghcr.io/fluxcd/notification-controller:v1.1.0" already present on machine
Normal Created 20s (x2 over 49s) kubelet Created container manager
Normal Killing 20s kubelet Container manager failed liveness probe, will be restarted
Normal Started 19s (x2 over 48s) kubelet Started container manager
Warning Unhealthy 10s (x9 over 48s) kubelet Readiness probe failed: Get "http://10.1.144.245:9440/readyz": dial tcp 10.1.144.245:9440: connect: connection refused
Warning Unhealthy 10s (x4 over 40s) kubelet Liveness probe failed: Get "http://10.1.144.245:9440/healthz": dial tcp 10.1.144.245:9440: connect: connection refused
Name: source-controller-594c848975-5wdst
Namespace: flux-system
Priority: 2000000000
Priority Class Name: system-cluster-critical
Service Account: source-controller
Node: ip-10-1-147-215.us-east-2.compute.internal/10.1.147.215
Start Time: Mon, 09 Oct 2023 09:36:57 +0000
Labels: app=source-controller
pod-template-hash=594c848975
Annotations: prometheus.io/port: 8080
prometheus.io/scrape: true
Status: Running
IP: 10.1.144.144
IPs:
IP: 10.1.144.144
Controlled By: ReplicaSet/source-controller-594c848975
Containers:
manager:
Container ID: containerd://37cc169a6cf4eb355565ee36436d1bcaeb54d6bdbc7044861db7f73bcdc2248b
Image: ghcr.io/fluxcd/source-controller:v1.1.1
Image ID: ghcr.io/fluxcd/source-controller@sha256:a9b4ffe2c145efd9cb71c3d41824eda17dc41dc9e9e8bc3b51bfc86b2243c6a4
Ports: 9090/TCP, 8080/TCP, 9440/TCP
Host Ports: 0/TCP, 0/TCP, 0/TCP
SeccompProfile: RuntimeDefault
Args:
--events-addr=http://notification-controller.flux-system.svc.cluster.local./
--watch-all-namespaces=true
--log-level=info
--log-encoding=json
--enable-leader-election
--storage-path=/data
--storage-adv-addr=source-controller.$(RUNTIME_NAMESPACE).svc.cluster.local.
State: Running
Started: Mon, 09 Oct 2023 09:37:28 +0000
Last State: Terminated
Reason: Error
Exit Code: 2
Started: Mon, 09 Oct 2023 09:36:59 +0000
Finished: Mon, 09 Oct 2023 09:37:27 +0000
Ready: False
Restart Count: 1
Limits:
cpu: 1
memory: 1Gi
Requests:
cpu: 50m
memory: 64Mi
Liveness: http-get http://:healthz/healthz delay=0s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get http://:http/ delay=0s timeout=1s period=10s #success=1 #failure=3
Environment:
RUNTIME_NAMESPACE: flux-system (v1:metadata.namespace)
TUF_ROOT: /tmp/.sigstore
Mounts:
/data from data (rw)
/tmp from tmp (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-btqw5 (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
data:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
tmp:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
kube-api-access-btqw5:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: kubernetes.io/os=linux
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 50s default-scheduler Successfully assigned flux-system/source-controller-594c848975-5wdst to ip-10-1-147-215.us-east-2.compute.internal
Normal Pulled 20s (x2 over 49s) kubelet Container image "ghcr.io/fluxcd/source-controller:v1.1.1" already present on machine
Normal Created 20s (x2 over 49s) kubelet Created container manager
Normal Killing 20s kubelet Container manager failed liveness probe, will be restarted
Normal Started 19s (x2 over 48s) kubelet Started container manager
Warning Unhealthy 10s (x10 over 48s) kubelet Readiness probe failed: Get "http://10.1.144.144:9090/": dial tcp 10.1.144.144:9090: connect: connection refused
Warning Unhealthy 10s (x4 over 40s) kubelet Liveness probe failed: Get "http://10.1.144.144:9440/healthz": dial tcp 10.1.144.144:9440: connect: connection refused
```
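The pattern is identical across all four controllers: the manager process never starts listening on its health port (9440), so the probes are refused and the kubelet kills the container, which last exited with code 2. A few generic diagnostic commands (not part of the original report; deployment names taken from the output above) that should capture the crash output itself:

```
# Logs from the previous (terminated) container instance of each controller:
kubectl logs -n flux-system deploy/source-controller --previous
kubectl logs -n flux-system deploy/helm-controller --previous

# Namespace-wide event timeline, sorted oldest first:
kubectl get events -n flux-system --sort-by=.lastTimestamp
```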
Steps to reproduce
Bootstrap Flux on a private EKS cluster (an example invocation is sketched below).
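A minimal sketch of such a bootstrap, assuming Bitbucket Server as the Git provider (consistent with the provider listed below); the token, hostname, project, repository, branch, and path values are placeholders, not taken from this report:

```
# Hypothetical values; substitute real Bitbucket Server details.
export BITBUCKET_TOKEN=<redacted>
flux bootstrap bitbucket-server \
  --token-auth \
  --hostname=bitbucket.example.com \
  --owner=my-project \
  --repository=fleet-infra \
  --branch=main \
  --path=clusters/my-cluster
```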
Expected behavior
Flux bootstrap succeeds and all controllers in the flux-system namespace become Ready (a quick verification is sketched below).
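A standard way to verify that state once bootstrap completes (generic commands, not from the original report):

```
# Confirm the installation and wait for every controller pod to become Ready:
flux check
kubectl -n flux-system get pods
kubectl -n flux-system wait pods --all --for=condition=Ready --timeout=5m
```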
Screenshots and recordings
No response
OS / Distro
AWS EKS (t3.medium, Amazon Linux 2, amd64)
Flux version
v2.1.1
Flux check
```
ubuntu@ip-10-1-159-28:~$ flux check
► checking prerequisites
✔ Kubernetes 1.27.4-eks-2d98532 >=1.25.0-0
► checking controllers
```
Git provider
Bitbucket
Container Registry provider
Amazon ECR
Additional context
No response
Code of Conduct
- [X] I agree to follow this project's Code of Conduct