cilium-cli
DaemonSet Unavailable on AWS deployment of OpenShift.
I'm getting the following issue when trying to install Cilium on an AWS deployment of OpenShift 4.6. I can install Cilium through OperatorHub without problems; however, when I run the install command on the command line, "cilium install --cluster-name=x", I consistently run into this error. It seems the cilium-agent containers cannot be deployed. I can't find any reference to this issue in the docs.
:hourglass: Waiting for Cilium to be installed...
/¯¯\
/¯¯\__/¯¯\ Cilium: 6 errors
\__/¯¯\__/ Operator: OK
/¯¯\__/¯¯\ Hubble: disabled
\__/¯¯\__/ ClusterMesh: disabled
\__/
DaemonSet cilium Desired: 5, Unavailable: 5/5
Deployment cilium-operator Desired: 1, Ready: 1/1, Available: 1/1
Containers: cilium Running: 5
cilium-operator Running: 1
Image versions cilium quay.io/cilium/cilium:v1.10.2: 5
cilium-operator quay.io/cilium/operator-generic:v1.10.2: 1
Errors: cilium cilium 5 pods of DaemonSet cilium are not ready
cilium cilium-49x5q unable to retrieve cilium status: unable to upgrade connection: container not found ("cilium-agent")
cilium cilium-fnr5t unable to retrieve cilium status: unable to upgrade connection: container not found ("cilium-agent")
cilium cilium-mg2kp unable to retrieve cilium status: unable to upgrade connection: container not found ("cilium-agent")
cilium cilium-s27px unable to retrieve cilium status: unable to upgrade connection: container not found ("cilium-agent")
cilium cilium-zntcq unable to retrieve cilium status: unable to upgrade connection: container not found ("cilium-agent")
:leftwards_arrow_with_hook: Rolling back installation...
Error: Unable to install Cilium: timeout while waiting for status to become successful: context deadline exceeded
@owainow Could you collect a Cilium sysdump for that cluster? It's hard to help otherwise, as cilium status doesn't report in-depth information.
Sure, let me attach it. I'm new to Cilium, so if I've left out any info, let me know. cilium-sysdump-20210809-164914.zip
There seems to be an issue with pulling the image for one of the operator pods:
"state": {
"waiting": {
"message": "Back-off pulling image \"quay.io/cilium/operator-generic:v1.10.2@sha256:a88b04cb5895610620da6e90d362af9e512d2baa51a0a0d77ab34186dfb20c68\"",
"reason": "ImagePullBackOff"
}
}
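A quick way to spot pods stuck in a state such as ImagePullBackOff without going through the sysdump is to filter the pod list by status. This is a minimal sketch, not part of the original report: the helper name cilium_not_running is hypothetical, and the kube-system namespace is an assumption that should be replaced with wherever Cilium was actually installed.

```shell
# Hypothetical helper: reads `oc get pods --no-headers` style output on stdin
# (columns: NAME READY STATUS RESTARTS AGE) and prints the name and status of
# every pod whose name starts with "cilium" and whose status is not "Running".
cilium_not_running() {
  awk '$1 ~ /^cilium/ && $3 != "Running" { print $1 "\t" $3 }'
}

# Only query the cluster when the oc client is available.
# Assumption: Cilium lives in kube-system; adjust the namespace as needed.
if command -v oc >/dev/null 2>&1; then
  oc get pods -n kube-system --no-headers | cilium_not_running
fi
```

Any pod this prints can then be inspected with oc describe pod to see the events behind the waiting state.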
There are also a couple of errors in the agents:
2021-08-09T14:42:44.922025191Z level=error msg="ListenAndServe failed for service health server, since the user might be running with kube-proxy. Please ensure that '--enable-health-check-nodeport' option is set to false if '--kube-proxy-replacement' is set to 'partial'" error="listen tcp :32313: bind: address already in use" serviceName=router-default serviceNamespace=openshift-ingress subsys=service-healthserver svcHealthCheckNodePort=32313
2021-08-09T14:42:44.922093099Z level=error msg="ListenAndServe failed for service health server" error="listen tcp :32313: bind: address already in use" serviceName=router-default serviceNamespace=openshift-ingress subsys=service-healthserver svcHealthCheckNodePort=32313
I don't expect those issues would cause the errors you are seeing, however. I didn't find anything else in the sysdump. Were the errors still visible in cilium status after you retrieved the Cilium sysdump?
Yes, after collecting the sysdump, running cilium status shows a DaemonSet error again. I have tried again on a different cluster, but the problem is consistent. I'm unsure why, because OCP is able to "Validate" the quay image.
[owain@localhost ~]$ cilium status
/¯¯\
/¯¯\__/¯¯\ Cilium: 1 errors
\__/¯¯\__/ Operator: disabled
/¯¯\__/¯¯\ Hubble: disabled
\__/¯¯\__/ ClusterMesh: disabled
\__/
Containers: cilium
cilium-operator
Errors: cilium cilium daemonsets.apps "cilium" not found
Hi, Any updates on this?
Can anyone help point us in a direction here? The issue seems to still exist.
@v1k0d3n when I deployed Cilium via OLM on OpenShift (bare metal), I had to manually add the Cilium service accounts to the privileged SCC, but the namespace was still flooded with events related to the policy issues (visible with oc get events). I never got it fully functioning on OpenShift, though, due to some issues with Hubble, which I posted in the cilium hubble repo.
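For anyone repeating the SCC step, it can be sketched with oc adm policy add-scc-to-user. The namespace (cilium) and the service account names (cilium, cilium-operator) below are assumptions, not taken from my cluster; list the real ones with oc get sa first. The hypothetical helper scc_commands only prints the commands so they can be reviewed before being run.

```shell
# Hypothetical helper: prints (does not execute) one
# `oc adm policy add-scc-to-user` command per service account.
# Usage: scc_commands <namespace> <service-account>...
scc_commands() {
  ns="$1"
  shift
  for sa in "$@"; do
    # -z names a service account in the namespace given by -n
    echo "oc adm policy add-scc-to-user privileged -z $sa -n $ns"
  done
}

# Assumed namespace and service account names; verify with `oc get sa -n cilium`.
scc_commands cilium cilium cilium-operator
```

Granting the privileged SCC is a broad permission, so printing the commands first keeps the change explicit.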
The Cilium CLI insists the DaemonSet does not exist and that other components are not configured, but it does and they are. Perhaps the CLI doesn't work with OLM installations. The Supported Environments section of the README doesn't specifically list OpenShift, so I'm left to assume it is unsupported.
Supported Environments
minikube
kind
EKS
self-managed
GKE
AKS
k3s
Rancher
$ cilium status
/¯¯\
/¯¯\__/¯¯\ Cilium: 1 errors
\__/¯¯\__/ Operator: disabled
/¯¯\__/¯¯\ Hubble: disabled
\__/¯¯\__/ ClusterMesh: disabled
\__/
Containers: cilium
cilium-operator
Cluster Pods: 0/446 managed by Cilium
Errors: cilium cilium daemonsets.apps "cilium" not found
$ oc get ds
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
cilium 12 12 12 12 12 <none> 6h54m
$ oc get pods
NAME READY STATUS RESTARTS AGE
cilium-2vsgk 1/1 Running 0 3h43m
cilium-75sl2 1/1 Running 0 3h43m
cilium-7g92r 1/1 Running 0 3h43m
cilium-b8zc5 1/1 Running 0 3h43m
cilium-dcvv4 1/1 Running 0 3h43m
cilium-gs7f6 1/1 Running 0 3h43m
cilium-kqqdc 1/1 Running 0 3h43m
cilium-kvq27 1/1 Running 0 3h43m
cilium-olm-56b8648b4f-v8mcj 1/1 Running 0 3h43m
cilium-operator-55c9dd779d-grxcc 1/1 Running 0 3h43m
cilium-operator-55c9dd779d-kgl68 1/1 Running 0 3h43m
cilium-ptk27 1/1 Running 0 3h43m
cilium-v2p4q 1/1 Running 0 3h43m
cilium-wpl26 1/1 Running 0 3h43m
cilium-znggn 1/1 Running 0 3h43m
hubble-relay-6584f5545c-99p9n 1/1 Running 0 3h43m
hubble-ui-95d74d44c-cqsqx 3/3 Running 0 3h43m
It's probably because Cilium isn't installed in the namespace the CLI checks by default (kube-system), so the CLI cannot find the DaemonSet. The namespace has to be provided to the CLI, e.g.:
cilium status --namespace=cilium
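If the namespace isn't known up front, it can be looked up from the DaemonSet list and passed on. This is only a sketch: the helper name find_cilium_ns is hypothetical, and it assumes the agent DaemonSet is named cilium, as in the oc get ds output above.

```shell
# Hypothetical helper: reads `oc get ds --all-namespaces --no-headers` style
# output on stdin (columns: NAMESPACE NAME DESIRED ...) and prints the
# namespace of the first DaemonSet named exactly "cilium".
find_cilium_ns() {
  awk '$2 == "cilium" { print $1; exit }'
}

# Only touch the cluster when both CLIs are installed.
if command -v oc >/dev/null 2>&1 && command -v cilium >/dev/null 2>&1; then
  ns="$(oc get ds --all-namespaces --no-headers | find_cilium_ns)"
  if [ -n "$ns" ]; then
    cilium status --namespace="$ns"
  else
    echo "no DaemonSet named cilium found in any namespace" >&2
  fi
fi
```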
Yes, this is the reason. We can close this issue now.