cilium-service-mesh-beta
Installation steps using helm
Is there an existing issue for this?
- [X] I have searched the existing issues
What happened?
- On DigitalOcean I tried to install using
cilium install --version -service-mesh:v1.11.0-beta.1 --config enable-envoy-config=true --kube-proxy-replacement=probe
but got errors like
controller endpoint-769-regeneration-recovery is failing since 37s (24x): regeneration recovery failed
- I even tried cilium uninstall followed by a plain cilium install --kube-proxy-replacement=probe, but that also gave the same error.
- Then I tried simply
helm install cilium cilium/cilium \
--version 1.11.0 \
--namespace kube-system
and this went fine.
Cilium Version
1.11.0
Kernel Version
NA
Kubernetes Version
1.21.5
Sysdump
Uploading cilium-sysdump-20220110-102642.zip…
Relevant log output
No response
Anything else?
No response
Code of Conduct
- [X] I agree to follow this project's Code of Conduct
Thanks for the report!
Could you try to upload the Cilium sysdump again? It seems you submitted the issue before uploading was finished.
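In case it helps, the sysdump can be regenerated with the Cilium CLI. A minimal sketch (the output filename is only an example, and the flag may differ slightly between CLI versions):
# Collect a fresh sysdump; this produces a zip archive in the current directory
cilium sysdump --output-filename cilium-sysdump-recollected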
Trial 2: EKS
Tried installing using Helm:
helm upgrade --install cilium cilium/cilium --version=1.11.0 \
--namespace kube-system --set eni.enabled=true \
--set ipam.mode=eni --set egressMasqueradeInterfaces=eth0 \
--set loadBalancer.algorithm=maglev --set hubble.enabled=true \
--set hubble.relay.enabled=true --set hubble.ui.enabled=false \
--set hubble.metrics.enabled="{dns,drop,tcp,flow,port-distribution,icmp,http}" \
--set kubeProxyReplacement="strict" \
--set k8sServiceHost=$API_SERVER_IP --set k8sServicePort=443 \
--set-string extraConfig.enable-envoy-config="true" \
--set image.repository=quay.io/cilium/cilium-service-mesh \
--set image.tag=v1.11.0-beta.1 \
--set image.useDigest=false \
--set operator.image.suffix=-service-mesh \
--set operator.image.useDigest=false \
--set operator.replicas=1 \
--set operator.image.tag=v1.11.0-beta.1
Got the errors below. The Helm chart seems to need an RBAC update, and there appear to be other BPF issues too.
level=error msg="Command execution failed" cmd="[tc filter replace dev cilium_host ingress prio 1 handle 1 bpf da obj 1979_next/bpf_host.o sec to-host]" error="exit status 1" subsys=datapath-loader
level=warning msg="libbpf: couldn't reuse pinned map at '/sys/fs/bpf/tc//globals/cilium_calls_hostns_01979': parameter mismatch" subsys=datapath-loader
level=warning msg="libbpf: map 'cilium_calls_hostns_01979': error reusing pinned map" subsys=datapath-loader
level=warning msg="libbpf: map 'cilium_calls_hostns_01979': failed to create: Invalid argument(-22)" subsys=datapath-loader
level=warning msg="libbpf: failed to load object '1979_next/bpf_host.o'" subsys=datapath-loader
level=warning msg="Unable to load program" subsys=datapath-loader
level=warning msg="JoinEP: Failed to load program for host endpoint (to-host)" containerID= datapathPolicyRevision=0 desiredPolicyRevision=1 endpointID=1979 error="Failed to load prog with tc: exit status 1" file-path=1979_next/bpf_host.o identity=1 ipv4= ipv6= k8sPodName=/ subsys=datapath-loader veth=cilium_host
level=error msg="Error while rewriting endpoint BPF program" containerID= datapathPolicyRevision=0 desiredPolicyRevision=1 endpointID=1979 error="Failed to load prog with tc: exit status 1" identity=1 ipv4= ipv6= k8sPodName=/ subsys=endpoint
level=warning msg="generating BPF for endpoint failed, keeping stale directory." containerID= datapathPolicyRevision=0 desiredPolicyRevision=1 endpointID=1979 file-path=1979_next_fail identity=1 ipv4= ipv6= k8sPodName=/ subsys=endpoint
level=warning msg="Regeneration of endpoint failed" bpfCompilation=0s bpfLoadProg=40.842791ms bpfWaitForELF="3.806µs" bpfWriteELF="697.761µs" containerID= datapathPolicyRevision=0 desiredPolicyRevision=1 endpointID=1979 error="Failed to load prog with tc: exit status 1" identity=1 ipv4= ipv6= k8sPodName=/ mapSync="2.285µs" policyCalculation="3.206µs" prepareBuild="623.979µs" proxyConfiguration="7.414µs" proxyPolicyCalculation="2.816µs" proxyWaitForAck=0s reason="retrying regeneration" subsys=endpoint total=43.733597ms waitingForCTClean=201ns waitingForLock=773ns
level=error msg="endpoint regeneration failed" containerID= datapathPolicyRevision=0 desiredPolicyRevision=1 endpointID=1979 error="Failed to load prog with tc: exit status 1" identity=1 ipv4= ipv6= k8sPodName=/ subsys=endpoint
level=warning msg="github.com/cilium/cilium/pkg/k8s/watchers/cilium_clusterwide_network_policy.go:93: failed to list *v2.CiliumClusterwideNetworkPolicy: ciliumclusterwidenetworkpolicies.cilium.io is forbidden: User \"system:serviceaccount:kube-system:cilium\" cannot list resource \"ciliumclusterwidenetworkpolicies\" in API group \"cilium.io\" at the cluster scope" subsys=klog
level=error msg=k8sError error="github.com/cilium/cilium/pkg/k8s/watchers/cilium_clusterwide_network_policy.go:93: Failed to watch *v2.CiliumClusterwideNetworkPolicy: failed to list *v2.CiliumClusterwideNetworkPolicy: ciliumclusterwidenetworkpolicies.cilium.io is forbidden: User \"system:serviceaccount:kube-system:cilium\" cannot list resource \"ciliumclusterwidenetworkpolicies\" in API group \"cilium.io\" at the cluster scope" subsys=k8s
level=warning msg="Unable to update CiliumNode custom resource" error="ciliumnodes.cilium.io \"ip-192-168-113-75.ec2.internal\" is forbidden: User \"system:serviceaccount:kube-system:cilium\" cannot update resource \"ciliumnodes/status\" in API group \"cilium.io\" at the cluster scope" subsys=ipam
level=warning msg="github.com/cilium/cilium/pkg/k8s/watchers/endpoint_slice.go:143: failed to list *v1.EndpointSlice: endpointslices.discovery.k8s.io is forbidden: User \"system:serviceaccount:kube-system:cilium\" cannot list resource \"endpointslices\" in API group \"discovery.k8s.io\" at the cluster scope" subsys=klog
level=error msg=k8sError error="github.com/cilium/cilium/pkg/k8s/watchers/endpoint_slice.go:143: Failed to watch *v1.EndpointSlice: failed to list *v1.EndpointSlice: endpointslices.discovery.k8s.io is forbidden: User \"system:serviceaccount:kube-system:cilium\" cannot list resource \"endpointslices\" in API group \"discovery.k8s.io\" at the cluster scope" subsys=k8s
level=error msg="Command execution failed" cmd="[tc filter replace dev cilium_host ingress prio 1 handle 1 bpf da obj 1979_next/bpf_host.o sec to-host]" error="exit status 1" subsys=datapath-loader
level=warning msg="libbpf: couldn't reuse pinned map at '/sys/fs/bpf/tc//globals/cilium_calls_hostns_01979': parameter mismatch" subsys=datapath-loader
level=warning msg="libbpf: map 'cilium_calls_hostns_01979': error reusing pinned map" subsys=datapath-loader
level=warning msg="libbpf: map 'cilium_calls_hostns_01979': failed to create: Invalid argument(-22)" subsys=datapath-loader
level=warning msg="libbpf: failed to load object '1979_next/bpf_host.o'" subsys=datapath-loader
level=warning msg="Unable to load program" subsys=datapath-loader
level=warning msg="JoinEP: Failed to load program for host endpoint (to-host)" containerID= datapathPolicyRevision=0 desiredPolicyRevision=1 endpointID=1979 error="Failed to load prog with tc: exit status 1" file-path=1979_next/bpf_host.o identity=1 ipv4= ipv6= k8sPodName=/ subsys=datapath-loader veth=cilium_host
level=error msg="Error while rewriting endpoint BPF program" containerID= datapathPolicyRevision=0 desiredPolicyRevision=1 endpointID=1979 error="Failed to load prog with tc: exit status 1" identity=1 ipv4= ipv6= k8sPodName=/ subsys=endpoint
level=warning msg="generating BPF for endpoint failed, keeping stale directory." containerID= datapathPolicyRevision=0 desiredPolicyRevision=1 endpointID=1979 file-path=1979_next_fail identity=1 ipv4= ipv6= k8sPodName=/ subsys=endpoint
level=warning msg="Regeneration of endpoint failed" bpfCompilation=0s bpfLoadProg=55.841988ms bpfWaitForELF="4.595µs" bpfWriteELF="745.463µs" containerID= datapathPolicyRevision=0 desiredPolicyRevision=1 endpointID=1979 error="Failed to load prog with tc: exit status 1" identity=1 ipv4= ipv6= k8sPodName=/ mapSync="2.409µs" policyCalculation="6.682µs" prepareBuild="598.265µs" proxyConfiguration="7.357µs" proxyPolicyCalculation="2.824µs" proxyWaitForAck=0s reason="retrying regeneration" subsys=endpoint total=60.28447ms waitingForCTClean=197ns waitingForLock=836ns
level=error msg="endpoint regeneration failed" containerID= datapathPolicyRevision=0 desiredPolicyRevision=1 endpointID=1979 error="Failed to load prog with tc: exit status 1" identity=1 ipv4= ipv6= k8sPodName=/ subsys=endpoint
level=error msg="Command execution failed" cmd="[tc filter replace dev cilium_host ingress prio 1 handle 1 bpf da obj 1979_next/bpf_host.o sec to-host]" error="exit status 1" subsys=datapath-loader
level=warning msg="libbpf: couldn't reuse pinned map at '/sys/fs/bpf/tc//globals/cilium_calls_hostns_01979': parameter mismatch" subsys=datapath-loader
level=warning msg="libbpf: map 'cilium_calls_hostns_01979': error reusing pinned map" subsys=datapath-loader
level=warning msg="libbpf: map 'cilium_calls_hostns_01979': failed to create: Invalid argument(-22)" subsys=datapath-loader
level=warning msg="libbpf: failed to load object '1979_next/bpf_host.o'" subsys=datapath-loader
level=warning msg="Unable to load program" subsys=datapath-loader
level=warning msg="JoinEP: Failed to load program for host endpoint (to-host)" containerID= datapathPolicyRevision=0 desiredPolicyRevision=1 endpointID=1979 error="Failed to load prog with tc: exit status 1" file-path=1979_next/bpf_host.o identity=1 ipv4= ipv6= k8sPodName=/ subsys=datapath-loader veth=cilium_host
level=error msg="Error while rewriting endpoint BPF program" containerID= datapathPolicyRevision=0 desiredPolicyRevision=1 endpointID=1979 error="Failed to load prog with tc: exit status 1" identity=1 ipv4= ipv6= k8sPodName=/ subsys=endpoint
level=warning msg="generating BPF for endpoint failed, keeping stale directory." containerID= datapathPolicyRevision=0 desiredPolicyRevision=1 endpointID=1979 file-path=1979_next_fail identity=1 ipv4= ipv6= k8sPodName=/ subsys=endpoint
level=warning msg="Regeneration of endpoint failed" bpfCompilation=0s bpfLoadProg=62.441743ms bpfWaitForELF="4.154µs" bpfWriteELF="840.205µs" containerID= datapathPolicyRevision=0 desiredPolicyRevision=1 endpointID=1979 error="Failed to load prog with tc: exit status 1" identity=1 ipv4= ipv6= k8sPodName=/ mapSync="2.732µs" policyCalculation="3.232µs" prepareBuild="795.148µs" proxyConfiguration="7.916µs" proxyPolicyCalculation="3.243µs" proxyWaitForAck=0s reason="retrying regeneration" subsys=endpoint total=66.318589ms waitingForCTClean=208ns waitingForLock="1.053µs"
level=error msg="endpoint regeneration failed" containerID= datapathPolicyRevision=0 desiredPolicyRevision=1 endpointID=1979 error="Failed to load prog with tc: exit status 1" identity=1 ipv4= ipv6= k8sPodName=/ subsys=endpoint
I received the same errors with both installation methods on a 1.21 EKS cluster.
Same issue here on a bare-metal cluster. @pchaigno did you get a sysdump? If not, I can share one privately with you.
@ghouscht I haven't received a sysdump yet. If you could share one, that would help, as it would allow us to confirm this is a complexity issue caused by the lack of kernel support for KPR. I'm pchaigno on Slack as well.
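For anyone else hitting this, a quick way to see what the agent's probe detected on a given node (a sketch assuming the default kube-system install, not a fix):
# Kernel version on the node where the agent runs
kubectl -n kube-system exec ds/cilium -- uname -r
# Which kube-proxy replacement mode the probe actually settled on
kubectl -n kube-system exec ds/cilium -- cilium status | grep KubeProxyReplacement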
Same for me on Azure:
cilium install \
--context xxxxxx \
--cluster-name xxxxxx \
--cluster-id 1 \
--azure-resource-group xxxxxx \
--azure-subscription-id xxxxx \
--azure-client-id xxxxx \
--azure-client-secret xxxxxx \
--azure-tenant-id xxxxxx \
--version -service-mesh:v1.11.0-beta.1 \
--config enable-envoy-config=true \
--kube-proxy-replacement=probe
results in
cilium-qj48r cilium-agent level=error msg="Command execution failed" cmd="[tc filter replace dev cilium_host ingress prio 1 handle 1 bpf da obj 3826_next/bpf_host.o sec to-host]" error="exit status 1" subsys=datapath-loader
cilium-qj48r cilium-agent level=warning msg="libbpf: couldn't reuse pinned map at '/sys/fs/bpf/tc//globals/cilium_calls_hostns_03826': parameter mismatch" subsys=datapath-loader
cilium-qj48r cilium-agent level=warning msg="libbpf: map 'cilium_calls_hostns_03826': error reusing pinned map" subsys=datapath-loader
cilium-qj48r cilium-agent level=warning msg="libbpf: map 'cilium_calls_hostns_03826': failed to create: Invalid argument(-22)" subsys=datapath-loader
cilium-qj48r cilium-agent level=warning msg="libbpf: failed to load object '3826_next/bpf_host.o'" subsys=datapath-loader
cilium-qj48r cilium-agent level=warning msg="Unable to load program" subsys=datapath-loader
cilium-qj48r cilium-agent level=warning msg="JoinEP: Failed to load program for host endpoint (to-host)" containerID= datapathPolicyRevision=0 desiredPolicyRevision=2 endpointID=3826 error="Failed to load prog with tc: exit status 1" file-path=3826_next/bpf_host.o identity=1 ipv4= ipv6= k8sPodName=/ subsys=datapath-loader veth=cilium_host
cilium-qj48r cilium-agent level=error msg="Error while rewriting endpoint BPF program" containerID= datapathPolicyRevision=0 desiredPolicyRevision=2 endpointID=3826 error="Failed to load prog with tc: exit status 1" identity=1 ipv4= ipv6= k8sPodName=/ subsys=endpoint
cilium-qj48r cilium-agent level=warning msg="generating BPF for endpoint failed, keeping stale directory." containerID= datapathPolicyRevision=0 desiredPolicyRevision=2 endpointID=3826 file-path=3826_next_fail identity=1 ipv4= ipv6= k8sPodName=/ subsys=endpoint
cilium-qj48r cilium-agent level=warning msg="Regeneration of endpoint failed" bpfCompilation=0s bpfLoadProg=35.925896ms bpfWaitForELF="3.8µs" bpfWriteELF="769.615µs" containerID= datapathPolicyRevision=0 desiredPolicyRevision=2 endpointID=3826 error="Failed to load prog with tc: exit status 1" identity=1 ipv4= ipv6= k8sPodName=/ mapSync="2.4µs" policyCalculation="3.8µs" prepareBuild="563.111µs" proxyConfiguration="9.301µs" proxyPolicyCalculation="4µs" proxyWaitForAck=0s reason="retrying regeneration" subsys=endpoint total=38.809751ms waitingForCTClean=300ns waitingForLock=900ns
cilium-qj48r cilium-agent level=error msg="endpoint regeneration failed" containerID= datapathPolicyRevision=0 desiredPolicyRevision=2 endpointID=3826 error="Failed to load prog with tc: exit status 1" identity=1 ipv4= ipv6= k8sPodName=/ subsys=endpoint
A previous installation with Cilium 1.11.1 went fine on the same cluster (AKS 1.21.7).
cc @jrajahalme
I assume this is because we reuse AKS clusters that have previously been in clustermesh mode. Clustermesh has since been disabled, but probably some settings still exist.
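If leftover state from a previous installation is the suspicion, one hedged way to check is to look at the pinned maps that the "parameter mismatch" messages complain about (the path and map name below are copied from the logs above; bpftool needs to be available on the node or in the agent pod):
# List what is still pinned from earlier installs
ls /sys/fs/bpf/tc/globals/
# Inspect the map the loader refuses to reuse and compare its parameters
bpftool map show pinned /sys/fs/bpf/tc/globals/cilium_calls_hostns_03826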
I had a similar issue. I got the following errors:
2022-06-25T00:32:12.685918484Z level=error msg="Command execution failed" cmd="[ip -force link set dev eth0 xdpgeneric obj /var/run/cilium/state/bpf_xdp.o sec from-netdev]" error="exit status 255" subsys=datapath-loader
2022-06-25T00:32:12.685965305Z level=warning msg="libbpf: couldn't reuse pinned map at '/sys/fs/bpf/xdp//globals/cilium_calls_xdp': parameter mismatch" subsys=datapath-loader
2022-06-25T00:32:12.685972334Z level=warning msg="libbpf: map 'cilium_calls_xdp': error reusing pinned map" subsys=datapath-loader
2022-06-25T00:32:12.685977404Z level=warning msg="libbpf: map 'cilium_calls_xdp': failed to create: Invalid argument(-22)" subsys=datapath-loader
2022-06-25T00:32:12.685981737Z level=warning msg="libbpf: failed to load object '/var/run/cilium/state/bpf_xdp.o'" subsys=datapath-loader
2022-06-25T00:32:12.694436571Z level=fatal msg="Failed to compile XDP program" error="Failed to load prog with ip: exit status 255" subsys=datapath-loader
2022-06-25T00:32:14.062967388Z level=info msg="regenerating all endpoints" reason="kube-apiserver identity updated" subsys=endpoint-manager
This happened when I downgraded Cilium from a newer version to the older v1.11.1, and only when XDP was enabled via bpf-lb-acceleration: testing-only.
I have two nodes. I reloaded one node and it recovered from the error. I tried to get a sysdump (I guess it's now called debuginfo?), but I can only get it from the recovered Cilium pod; for the crashing one, I cannot collect it since it keeps crashing. I uploaded the file here anyway.
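In case it is useful to others: reloading the node clears the pinned maps, which is probably why that node recovered. A hedged alternative, assuming the mismatch really is just the stale pinned XDP map left behind by the newer version (this removes Cilium datapath state on that node, so use with care; <node-name> is a placeholder):
# On the affected node: remove the stale pinned XDP map named in the logs
rm /sys/fs/bpf/xdp/globals/cilium_calls_xdp
# Restart the agent pod on that node so it re-creates the map with the expected parameters
kubectl -n kube-system delete pod -l k8s-app=cilium --field-selector spec.nodeName=<node-name>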