Adding a proxy to the source-controller causes the controller not to start
Describe the bug
I have an EKS cluster that can only access the internet through an HTTP proxy. When I add a HelmRepository, the source-controller cannot fetch its index by default:
```console
~ kubectl get helmrepository -n flux-system
NAME      URL                                AGE     READY   STATUS
aws-eks   https://aws.github.io/eks-charts   6h42m   False   failed to fetch Helm repository index: failed to cache index to temporary file: Get "https://aws.github.io/eks-charts/index.yaml": dial tcp 185.199.108.153:443: i/o timeout
```
I tried to follow the Bootstrap cheatsheet, which has instructions to patch the controllers to add a proxy:
```yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - gotk-components.yaml
  - gotk-sync.yaml
patches:
  - patch: |
      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: all
      spec:
        template:
          spec:
            containers:
              - name: manager
                env:
                  - name: "HTTPS_PROXY"
                    value: "http://my-proxy-host:9595"
                  - name: "NO_PROXY"
                    value: "localhost,127.0.0.1,10.0.0.0/8,.internal,.cluster.local.,.cluster.local,.svc"
    target:
      kind: Deployment
      labelSelector: app.kubernetes.io/part-of=flux
      name: "source-controller"
```
This seems to be getting applied as expected when I bootstrap it:
```console
kubectl describe deploy -n flux-system source-controller
Name:                   source-controller
Namespace:              flux-system
...
    Environment:
      HTTPS_PROXY:        http://my-proxy-host:9595
      NO_PROXY:           localhost,127.0.0.1,10.0.0.0/8,.internal,.cluster.local.,.cluster.local,.svc
      RUNTIME_NAMESPACE:   (v1:metadata.namespace)
      TUF_ROOT:           /tmp/.sigstore
...
```
However, when I try to bootstrap flux with this patch, the source-controller pods no longer start; the liveness and readiness probes fail with connection refused errors:
```console
☁ ~ k describe pod -n flux-system source-controller-5c9f7f6d6f-24fs9
Name: source-controller-5c9f7f6d6f-24fs9
Namespace: flux-system
Priority: 0
Service Account: source-controller
Node: ip-10-5-97-151.ec2.internal/10.5.97.151
Start Time: Sat, 08 Oct 2022 06:38:25 -0600
Labels: app=source-controller
pod-template-hash=5c9f7f6d6f
Annotations: container.seccomp.security.alpha.kubernetes.io/manager: runtime/default
kubernetes.io/psp: eks.privileged
prometheus.io/port: 8080
prometheus.io/scrape: true
Status: Running
IP: 10.5.97.146
IPs:
IP: 10.5.97.146
Controlled By: ReplicaSet/source-controller-5c9f7f6d6f
Containers:
manager:
Container ID: containerd://336f9b12c5c05ec9e6e9f787a9f86fb85b0109b5c68c647ad9b551cdcc1ab786
Image: ghcr.io/fluxcd/source-controller:v0.30.0
Image ID: ghcr.io/fluxcd/source-controller@sha256:afd1ceb08de3e9072a3d260604b04a985ff0798031b016519912d6ede28d2533
Ports: 9090/TCP, 8080/TCP, 9440/TCP
Host Ports: 0/TCP, 0/TCP, 0/TCP
Args:
--events-addr=http://notification-controller.flux-system.svc.cluster.local./
--watch-all-namespaces=true
--log-level=info
--log-encoding=json
--enable-leader-election
--storage-path=/data
--storage-adv-addr=source-controller.$(RUNTIME_NAMESPACE).svc.cluster.local.
State: Running
Started: Sat, 08 Oct 2022 06:39:25 -0600
Last State: Terminated
Reason: Error
Exit Code: 2
Started: Sat, 08 Oct 2022 06:38:55 -0600
Finished: Sat, 08 Oct 2022 06:39:25 -0600
Ready: False
Restart Count: 2
Limits:
cpu: 1
memory: 1Gi
Requests:
cpu: 50m
memory: 64Mi
Liveness: http-get http://:healthz/healthz delay=0s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get http://:http/ delay=0s timeout=1s period=10s #success=1 #failure=3
Environment:
HTTPS_PROXY: http://my-proxy-host:9595
NO_PROXY: localhost,127.0.0.1,10.0.0.0/8,.internal,.cluster.local.,.cluster.local,.svc
RUNTIME_NAMESPACE: flux-system (v1:metadata.namespace)
TUF_ROOT: /tmp/.sigstore
Mounts:
/data from data (rw)
/tmp from tmp (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-2jtx7 (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
data:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
tmp:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
kube-api-access-2jtx7:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: kubernetes.io/os=linux
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 64s default-scheduler Successfully assigned flux-system/source-controller-5c9f7f6d6f-24fs9 to ip-10-5-97-151.ec2.internal
Normal Started 34s (x2 over 63s) kubelet Started container manager
Normal Pulled 4s (x3 over 64s) kubelet Container image "ghcr.io/fluxcd/source-controller:v0.30.0" already present on machine
Normal Created 4s (x3 over 64s) kubelet Created container manager
Warning Unhealthy 4s (x9 over 62s) kubelet Readiness probe failed: Get "http://10.5.97.146:9090/": dial tcp 10.5.97.146:9090: connect: connection refused
Warning Unhealthy 4s (x6 over 54s) kubelet Liveness probe failed: Get "http://10.5.97.146:9440/healthz": dial tcp 10.5.97.146:9440: connect: connection refused
Normal Killing 4s (x2 over 34s) kubelet Container manager failed liveness probe, will be restarted
```
The pods also don't output any logs.
Steps to reproduce
- Bootstrap Flux with an EKS cluster with no internet access
- Patch the source-controller per the instructions to patch the controllers to add a proxy
- Pods don't start anymore
Expected behavior
Controller pods should start and be able to communicate with the internet through the proxy.
Screenshots and recordings
No response
OS / Distro
client=macOS 12.5.1, EKS node groups=bottlerocket
Flux version
flux: v0.35.0
Flux check
```console
► checking prerequisites
✔ Kubernetes 1.21.14-eks-6d3986b >=1.20.6-0
► checking controllers
✔ helm-controller: deployment ready
► ghcr.io/fluxcd/helm-controller:v0.25.0
✔ kustomize-controller: deployment ready
► ghcr.io/fluxcd/kustomize-controller:v0.29.0
✔ notification-controller: deployment ready
► ghcr.io/fluxcd/notification-controller:v0.27.0
✔ source-controller: deployment ready
► ghcr.io/fluxcd/source-controller:v0.30.0
► checking crds
✔ alerts.notification.toolkit.fluxcd.io/v1beta1
✔ buckets.source.toolkit.fluxcd.io/v1beta2
✔ gitrepositories.source.toolkit.fluxcd.io/v1beta2
✔ helmcharts.source.toolkit.fluxcd.io/v1beta2
✔ helmreleases.helm.toolkit.fluxcd.io/v2beta1
✔ helmrepositories.source.toolkit.fluxcd.io/v1beta2
✔ kustomizations.kustomize.toolkit.fluxcd.io/v1beta2
✔ ocirepositories.source.toolkit.fluxcd.io/v1beta2
✔ providers.notification.toolkit.fluxcd.io/v1beta1
✔ receivers.notification.toolkit.fluxcd.io/v1beta1
✔ all checks passed
```
Git provider
Bitbucket, although I'm using regular Git for bootstrapping
Container Registry provider
No response
Additional context
No response
Code of Conduct
- [X] I agree to follow this project's Code of Conduct
Configuring the proxy environment variables in lowercase seems to have solved the issue:
```yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - gotk-components.yaml
  - gotk-sync.yaml
patches:
  - patch: |
      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: all
      spec:
        template:
          spec:
            containers:
              - name: manager
                env:
                  - name: "http_proxy"
                    value: "http://my-proxy-host:9595"
                  - name: "https_proxy"
                    value: "http://my-proxy-host:9595"
                  - name: "no_proxy"
                    value: "localhost,127.0.0.1,10.0.0.0/8,172.20.0.0/16,.cluster.local.,.cluster.local,.svc,.flux-system"
    target:
      kind: Deployment
      labelSelector: app.kubernetes.io/part-of=flux
      name: "source-controller"
```
@romogo17 thanks for reporting this issue.
IIRC most (if not all) of our proxy implementation relies on upstream Go, which accounts for both uppercase and lowercase variants:
```go
return &Config{
	HTTPProxy:  getEnvAny("HTTP_PROXY", "http_proxy"),
	HTTPSProxy: getEnvAny("HTTPS_PROXY", "https_proxy"),
	NoProxy:    getEnvAny("NO_PROXY", "no_proxy"),
	CGI:        os.Getenv("REQUEST_METHOD") != "",
}
```
https://github.com/golang/net/blob/8021a29435afef042814c3ad3b702ff04b240bc7/http/httpproxy/proxy.go#L91-L93
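As a minimal sketch of that resolution logic (a standalone program, not flux code; the proxy host and URLs are taken from the examples above), golang.org/x/net/http/httpproxy can be exercised directly:

```go
package main

import (
	"fmt"
	"net/url"
	"os"

	"golang.org/x/net/http/httpproxy"
)

func main() {
	// Mirror the first patch in this issue: HTTPS_PROXY and NO_PROXY, no HTTP_PROXY.
	os.Setenv("HTTPS_PROXY", "http://my-proxy-host:9595")
	os.Setenv("NO_PROXY", "localhost,127.0.0.1,10.0.0.0/8,.internal,.cluster.local.,.cluster.local,.svc")

	proxy := httpproxy.FromEnvironment().ProxyFunc()

	for _, raw := range []string{
		"https://aws.github.io/eks-charts/index.yaml",                     // external HTTPS: resolved via HTTPS_PROXY
		"http://notification-controller.flux-system.svc.cluster.local./", // plain HTTP: no HTTP_PROXY is set, so no proxy
	} {
		u, _ := url.Parse(raw)
		p, err := proxy(u)
		fmt.Printf("%s -> proxy=%v err=%v\n", raw, p, err)
	}
}
```

Note that with only HTTPS_PROXY set, plain-HTTP traffic (such as the events address the controller is started with) is never proxied at all, so beyond upper/lower case, which variables are set matters per URL scheme.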
I noticed that your first patch did not include HTTP_PROXY, only your second one. Can you please confirm that setting the three env vars (HTTP_PROXY, HTTPS_PROXY and NO_PROXY) in both upper and lower case still yields different results?
Hi @pjbgf, yes, I think that was it: my first patch didn't include HTTP_PROXY. I just tried it with only the upper case variants (but including both HTTP_PROXY and HTTPS_PROXY) and that worked.
So:
- With the 3 env vars in upper case (HTTP_PROXY, HTTPS_PROXY and NO_PROXY): no issues, all good.
- With the 3 env vars in lower case (http_proxy, https_proxy and no_proxy): the flux controllers start, but it seems like Helm still has issues pulling the repos (see https://github.com/helm/helm/issues/10065, though I'm guessing that's outside the scope of flux).
- With only 2 env vars (HTTPS_PROXY and NO_PROXY): issues with the controllers starting, not sure why.
I also added the EKS service subnet (172.20.0.0/16 in the snippets above) to the NO_PROXY, but that should've already been covered by .cluster.local.
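One caveat worth spelling out: domain suffixes in NO_PROXY only match host names, so .cluster.local does not exempt requests made directly to a service or DNS IP; those need a literal IP or CIDR entry. A minimal sketch with the same httpproxy package (the CIDR and IPs below are illustrative):

```go
package main

import (
	"fmt"
	"net/url"

	"golang.org/x/net/http/httpproxy"
)

func main() {
	cfg := &httpproxy.Config{
		HTTPSProxy: "http://my-proxy-host:9595",
		// A domain suffix plus the (hypothetical) EKS service CIDR.
		NoProxy: ".cluster.local,172.20.0.0/16",
	}
	proxy := cfg.ProxyFunc()

	for _, raw := range []string{
		"https://kubernetes.default.svc.cluster.local/", // exempted by the .cluster.local suffix
		"https://172.20.0.1/",                           // exempted by the 172.20.0.0/16 CIDR, not the suffix
		"https://aws.github.io/",                        // matches nothing: goes through the proxy
	} {
		u, _ := url.Parse(raw)
		p, _ := proxy(u)
		fmt.Printf("%-50s -> proxy=%v\n", raw, p)
	}
}
```

This is consistent with the later comments in this thread: adding the DNS/service IPs explicitly is what made the difference.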
In general, I think adding the HTTP_PROXY to the Bootstrap cheatsheet would be useful. Happy to submit the PR. Would I need to update anything other than the MD file here?
Hello, I had the same issue, and what did the trick for me was to add the service (and pod) subnets.
Quick update: adding the DNS IP address (10.43.0.1 on RKE2) is sufficient.
I also struggle with this. I set all proxy variables (http_proxy, HTTP_PROXY, https_proxy, HTTPS_PROXY, no_proxy, NO_PROXY) for the flux deployments, trying the solutions suggested here, but nothing seems to solve the problems I observe. I added cluster.local and .cluster.local to my no-proxy vars because my controllers cannot reach each other via FQDN, but it does not help. I suspect that the wget implementation used under the hood is causing this: it does not read the no-proxy variables but rather needs the flag -Y off to omit using a proxy; testing this manually worked. Are there any other ideas on how to solve this issue?
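For what it's worth, the flux controllers are Go binaries, so they follow Go's proxy resolution (shown earlier in this thread) rather than wget's. A small, hypothetical proxycheck program can show which proxy the Go standard library would pick for a URL under the current environment:

```go
package main

import (
	"fmt"
	"net/http"
	"os"
)

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: proxycheck <url>")
		os.Exit(1)
	}
	req, err := http.NewRequest(http.MethodGet, os.Args[1], nil)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// ProxyFromEnvironment applies the same HTTP_PROXY/HTTPS_PROXY/NO_PROXY
	// rules the controllers use; a nil result means a direct connection.
	proxyURL, err := http.ProxyFromEnvironment(req)
	fmt.Printf("%s -> proxy=%v err=%v\n", os.Args[1], proxyURL, err)
}
```

Running it with the same env vars as the controllers, against the controllers' FQDNs, would confirm whether the no-proxy entries actually match.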
I can confirm that I face the same problem. I tried both upper case and lower case env vars; the health checks still fail for every flux pod.
We have the same problem, with the env vars set in both upper and lower case.
Same problem here (env vars set in upper and in lower case). Does anybody have a fix for this?
As I said in my previous comment, I added the DNS IP to the NO_PROXY env variable (10.43.0.1 on RKE2) and it works now.
@sylvainOL, you're right. For people interested, here is my configuration that is currently working.
kustomization.yaml:
```yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - gotk-components.yaml
  - gotk-sync.yaml
patches:
  - patch: |
      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: all
      spec:
        template:
          spec:
            containers:
              - name: manager
                securityContext:
                  runAsUser: 65534
                  seccompProfile:
                    $patch: delete
                env:
                  - name: "HTTPS_PROXY"
                    value: "http://proxy.example.com:3128"
                  - name: "HTTP_PROXY"
                    value: "http://proxy.example.com:3128"
                  - name: "NO_PROXY"
                    # 172.30.0.1 is my DNS IP
                    value: ".cluster.local.,.cluster.local,.svc,10.24.62.0/24,172.30.0.1,172.30.0.0/24"
    target:
      kind: Deployment
      labelSelector: app.kubernetes.io/part-of=flux
  - patch: |-
      - op: remove
        path: /metadata/labels/pod-security.kubernetes.io~1warn
      - op: remove
        path: /metadata/labels/pod-security.kubernetes.io~1warn-version
    target:
      kind: Namespace
      labelSelector: app.kubernetes.io/part-of=flux
```