`cilium` in `k3d` doesn't start correctly
Is there an existing issue for this?
- [X] I have searched the existing issues
What happened?
- I started a k8s cluster using k3d and Docker Desktop on a Mac with Apple Silicon
The configuration file for the cluster looks like this:
```yaml
apiVersion: k3d.io/v1alpha5
kind: Simple
metadata:
  name: gwapi-cilium
servers: 1
ports:
  - port: 443:443
    nodeFilters:
      - loadbalancer
  - port: 80:80
    nodeFilters:
      - loadbalancer
options:
  k3s:
    extraArgs:
      - arg: "--disable=traefik"
        nodeFilters:
          - server:*
      - arg: "--flannel-backend=none"
        nodeFilters:
          - server:*
      - arg: "--disable-network-policy"
        nodeFilters:
          - server:*
```
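Assuming the config above is saved as `cilium-cluster.yaml` (the filename is illustrative, not from the report), the cluster would then be created from it with:

```shell
# Create the k3d cluster from the declarative config file.
# Guarded so the snippet degrades gracefully where k3d is not installed.
if command -v k3d >/dev/null 2>&1; then
  k3d cluster create --config cilium-cluster.yaml
else
  echo "k3d not installed; skipping cluster creation"
fi
```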
Then, I install cilium with this:
```shell
docker exec -it k3d-gwapi-cilium-server-0 mount bpffs /sys/fs/bpf -t bpf
docker exec -it k3d-gwapi-cilium-server-0 mount --make-shared /sys/fs/bpf
docker exec -it k3d-gwapi-cilium-server-0 mkdir -p /run/cilium/cgroupv2
docker exec -it k3d-gwapi-cilium-server-0 mount -t cgroup2 none /run/cilium/cgroupv2
docker exec -it k3d-gwapi-cilium-server-0 mount --make-shared /run/cilium/cgroupv2/
cilium install --version 1.15.0 --set=ipam.operator.clusterPoolIPv4PodCIDRList="10.42.0.0/16"
```
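As a side note (this step is not in the original report), the rollout can be watched with the standard cilium-cli status command, which blocks until the daemonset is ready or a timeout is reached:

```shell
# Wait for Cilium components to report readiness.
# Guarded so the snippet degrades gracefully where the cilium CLI is absent.
if command -v cilium >/dev/null 2>&1; then
  cilium status --wait
else
  echo "cilium CLI not installed"
fi
```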
This ends with the cilium daemonset not starting: the cilium-agent container is in error, with this in its log:
```
level=warning msg="Unable to ensure that BPF JIT compilation is enabled. This can be ignored when Cilium is running inside non-host network namespace (e.g. with kind or minikube)" error="could not open the sysctl file /host/proc/sys/net/core/bpf_jit_enable: open /host/proc/sys/net/core/bpf_jit_enable: no such file or directory" subsys=sysctl sysParamName=net.core.bpf_jit_enable sysParamValue=1
level=info msg="Setting sysctl" subsys=sysctl sysParamName=net.ipv4.conf.all.rp_filter sysParamValue=0
level=info msg="Setting sysctl" subsys=sysctl sysParamName=net.ipv4.fib_multipath_use_neigh sysParamValue=1
level=info msg="Setting sysctl" subsys=sysctl sysParamName=kernel.unprivileged_bpf_disabled sysParamValue=1
level=info msg="Setting sysctl" subsys=sysctl sysParamName=kernel.timer_migration sysParamValue=0
level=info msg="Re-pinning map with ':pending' suffix" bpfMapName=cilium_calls_overlay_2 bpfMapPath=/sys/fs/bpf/tc/globals/cilium_calls_overlay_2 subsys=bpf
level=info msg="Repinning without ':pending' suffix after failed migration" bpfMapName=cilium_calls_overlay_2 bpfMapPath=/sys/fs/bpf/tc/globals/cilium_calls_overlay_2 subsys=bpf
level=warning msg="Removed new pinned map after failed migration" bpfMapName=cilium_calls_overlay_2 bpfMapPath=/sys/fs/bpf/tc/globals/cilium_calls_overlay_2 subsys=bpf
level=fatal msg="Load overlay network failed" error="program cil_from_overlay: replacing clsact qdisc for interface cilium_vxlan: operation not supported" interface=cilium_vxlan subsys=datapath-loader
```
Cilium Version
```
cilium-cli: v0.15.21 compiled with go1.21.6 on darwin/arm64
cilium image (default): v1.14.6
cilium image (stable): v1.14.6
cilium image (running): 1.15.0
```
Kernel Version
```shell
docker exec -it k3d-gwapi-cilium-server-0 sh
/ # uname -a
Linux k3d-gwapi-cilium-server-0 6.5.11-linuxkit #1 SMP PREEMPT Mon Dec 4 11:30:00 UTC 2023 aarch64 GNU/Linux
```
Kubernetes Version
```
Client Version: version.Info{Major:"1", Minor:"27+", GitVersion:"v1.27.7-dispatcher", GitCommit:"8224bc5a1d7d973ca48129a9087069d252cf6b94", GitTreeState:"clean", BuildDate:"2023-10-20T18:06:11Z", GoVersion:"go1.20.10", Compiler:"gc", Platform:"darwin/arm64"}
Kustomize Version: v5.0.1
Server Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.4+k3s1", GitCommit:"36645e7311e9bdbbf2adb79ecd8bd68556bc86f6", GitTreeState:"clean", BuildDate:"2023-07-28T09:46:05Z", GoVersion:"go1.20.6", Compiler:"gc", Platform:"linux/arm64"}
```
Sysdump
```
🔍 Collecting sysdump with cilium-cli version: v0.15.21, args: [sysdump]
🔮 Detected Cilium installation in namespace "kube-system"
🔮 Detected Cilium operator in namespace "kube-system"
ℹ️ Failed to detect Cilium SPIRE installation - using Cilium namespace as Cilium SPIRE namespace: kube-system
🔍 Collecting Kubernetes nodes
🔮 Detected Cilium features: map[cidr-match-nodes:Disabled cni-chaining:Disabled:none enable-envoy-config:Disabled enable-gateway-api:Disabled enable-ipv4-egress-gateway:Disabled endpoint-routes:Disabled ingress-controller:Disabled ipv4:Enabled ipv6:Disabled mutual-auth-spiffe:Disabled wireguard-encapsulate:Disabled]
🔍 Collecting tracing data from Cilium pods
🔍 Collect Kubernetes nodes
🔍 Collecting Kubernetes events
🔍 Collect Kubernetes version
🔍 Collecting Kubernetes pods
🔍 Collecting Kubernetes namespaces
🔍 Collecting Kubernetes services
🔍 Collecting Kubernetes pods summary
🔍 Collecting Kubernetes endpoints
🔍 Collecting Kubernetes metrics
🔍 Collecting Kubernetes network policies
🔍 Collecting Kubernetes leases
🔍 Collecting Cilium cluster-wide network policies
🔍 Collecting Cilium network policies
🔍 Collecting Cilium egress NAT policies
🔍 Collecting Cilium Egress Gateway policies
🔍 Collecting Cilium local redirect policies
🔍 Collecting Cilium CIDR Groups
🔍 Collecting Cilium endpoints
🔍 Collecting Cilium endpoint slices
🔍 Collecting Cilium nodes
🔍 Collecting Cilium identities
🔍 Collecting Ingresses
🔍 Collecting Cilium Node Configs
🔍 Collecting IngressClasses
🔍 Collecting Cilium BGP Peering Policies
🔍 Collecting Cilium LoadBalancer IP Pools
🔍 Checking if cilium-etcd-secrets exists in kube-system namespace
🔍 Collecting Cilium Pod IP Pools
🔍 Collecting the Cilium configuration
🔍 Collecting the Cilium daemonset(s)
🔍 Collecting the Cilium Node Init daemonset
🔍 Collecting the Cilium Envoy configuration
🔍 Collecting the Cilium Envoy daemonset
🔍 Collecting the Hubble daemonset
⚠️ Daemonset "cilium-envoy" not found in namespace "kube-system" - this is expected if Envoy DaemonSet is not enabled
🔍 Collecting the Hubble Relay configuration
🔍 Collecting the Hubble Relay deployment
🔍 Collecting the Hubble UI deployment
⚠️ Daemonset "cilium-node-init" not found in namespace "kube-system" - this is expected if Node Init DaemonSet is not enabled
🔍 Collecting the Cilium operator deployment
🔍 Collecting the Cilium operator metrics
🔍 Collecting the clustermesh metrics
🔍 Collecting the 'clustermesh-apiserver' deployment
🔍 Collecting the CNI configuration files from Cilium pods
🔍 Collecting the CNI configmap
🔍 Collecting gops stats from Cilium pods
🔍 Collecting gops stats from Hubble pods
⚠️ Deployment "hubble-relay" not found in namespace "kube-system" - this is expected if Hubble is not enabled
🔍 Collecting gops stats from Hubble Relay pods
🔍 Collecting bugtool output from Cilium pods
⚠️ Deployment "hubble-ui" not found in namespace "kube-system" - this is expected if Hubble UI is not enabled
🔍 Collecting profiling data from Cilium pods
⚠️ Container "cilium-agent" for pod "cilium-qpdl9" in namespace "kube-system" is not running. Trying EphemeralContainer or separate Pod instead...
🔍 Collecting logs from Cilium pods
🔍 Collecting logs from Cilium Envoy pods
🔍 Collecting logs from Cilium Node Init pods
🔍 Collecting logs from Cilium operator pods
🔍 Collecting logs from 'clustermesh-apiserver' pods
🔍 Collecting logs from Hubble pods
🔍 Collecting logs from Hubble Relay pods
⚠️ Deployment "clustermesh-apiserver" not found in namespace "kube-system" - this is expected if 'clustermesh-apiserver' isn't enabled
🔍 Collecting logs from Hubble UI pods
🔍 Collecting platform-specific data
🔍 Collecting kvstore data
🔍 Collecting Cilium external workloads
🔍 Collecting Hubble flows from Cilium pods
Secret "cilium-etcd-secrets" not found in namespace "kube-system" - this is expected when using the CRD KVStore
🔍 Collecting bugtool output from Tetragon pods
I0211 21:00:28.397763 17273 request.go:697] Waited for 1.151837542s due to client-side throttling, not priority and fairness, request: GET:https://0.0.0.0:61330/api/v1/namespaces/kube-system/configmaps/cilium-envoy-config
🔍 Collecting Tetragon PodInfo custom resources
🔍 Collecting Tetragon tracing policies
🔍 Collecting Helm values from the release
⚠️ EphemeralContainer "sysdump-1707681627" on pod "" in namespace "" never reached Running status (falling back to separate Pod)
⚠️ Container "cilium-agent" on pod "cilium-qpdl9" in namespace "kube-system" is not running. Creating exec Pod.
⚠️ The following tasks failed, the sysdump may be incomplete:
⚠️ [13] Collecting Cilium egress NAT policies: failed to collect Cilium egress NAT policies: the server could not find the requested resource
⚠️ [14] Collecting Cilium Egress Gateway policies: failed to collect Cilium Egress Gateway policies: the server could not find the requested resource (get ciliumegressgatewaypolicies.cilium.io)
⚠️ [16] Collecting Cilium local redirect policies: failed to collect Cilium local redirect policies: the server could not find the requested resource (get ciliumlocalredirectpolicies.cilium.io)
⚠️ [18] Collecting Cilium endpoint slices: failed to collect Cilium endpoint slices: the server could not find the requested resource (get ciliumendpointslices.cilium.io)
⚠️ [24] Collecting Cilium BGP Peering Policies: failed to collect Cilium BGP Peering policies: the server could not find the requested resource (get ciliumbgppeeringpolicies.cilium.io)
⚠️ [31] Collecting the Cilium Envoy configuration: failed to collect the Cilium Envoy configuration: configmaps "cilium-envoy-config" not found
⚠️ [34] Collecting the Hubble Relay configuration: failed to collect the Hubble Relay configuration: configmaps "hubble-relay-config" not found
⚠️ cniconflist-cilium-qpdl9: error with exec request (pod=kube-system/cilium-qpdl9, container=cilium-agent): unable to upgrade connection: container not found ("cilium-agent")
⚠️ gops-cilium-qpdl9-memstats: failed to list processes "cilium-qpdl9" ("cilium-agent") in namespace "kube-system": error with exec request (pod=kube-system/cilium-qpdl9, container=cilium-agent): unable to upgrade connection: container not found ("cilium-agent")
⚠️ gops-cilium-qpdl9-stack: failed to list processes "cilium-qpdl9" ("cilium-agent") in namespace "kube-system": error with exec request (pod=kube-system/cilium-qpdl9, container=cilium-agent): unable to upgrade connection: container not found ("cilium-agent")
⚠️ gops-cilium-qpdl9-stats: failed to list processes "cilium-qpdl9" ("cilium-agent") in namespace "kube-system": error with exec request (pod=kube-system/cilium-qpdl9, container=cilium-agent): unable to upgrade connection: container not found ("cilium-agent")
⚠️ gops-cilium-qpdl9-pprof-heap: failed to list processes "cilium-qpdl9" ("cilium-agent") in namespace "kube-system": error with exec request (pod=kube-system/cilium-qpdl9, container=cilium-agent): unable to upgrade connection: container not found ("cilium-agent")
⚠️ gops-cilium-qpdl9-pprof-cpu: failed to list processes "cilium-qpdl9" ("cilium-agent") in namespace "kube-system": error with exec request (pod=kube-system/cilium-qpdl9, container=cilium-agent): unable to upgrade connection: container not found ("cilium-agent")
⚠️ [61] Collecting Tetragon PodInfo custom resources: failed to collect Tetragon PodInfo: the server could not find the requested resource (get podinfo.cilium.io)
⚠️ hubble-flows-cilium-qpdl9: failed to collect hubble flows for "cilium-qpdl9" in namespace "kube-system": error with exec request (pod=kube-system/cilium-qpdl9, container=cilium-agent): unable to upgrade connection: container not found ("cilium-agent"):
⚠️ [62] Collecting Tetragon tracing policies: failed to collect Tetragon tracing policies: the server could not find the requested resource (get tracingpolicies.cilium.io)
⚠️ Please note that depending on your Cilium version and installation options, this may be expected
🐳 Compiling sysdump
✅ The sysdump has been saved to cilium-sysdump-20240211-210027.zip
```
cilium-sysdump-20240211-210027.zip
Relevant log output
Already shared in the first part.
Anything else?
It's a very simple and straightforward installation I would like to set up for a presentation about cilium+GatewayAPI. I just use k3d instead of kind.
I've tried many things from the issue tracker and the kind documentation, without success.
It may not be possible to fix this, but at least this error report could be a pointer for others in my situation 🙂
Code of Conduct
- [X] I agree to follow this project's Code of Conduct
Thanks for this @davinkevin, I'll mark this down for the relevant team's attention and see if someone has a chance to pick it up.
Hey @aanm, can you be more precise?
In fact, everything on that page looks like part of the description I wrote when creating this issue 🙂, so my answer is yes; I also used data from other issues, so it's "yes, and even more".
The steps you described to install Cilium seem different than the ones described in the documentation, that's why I was asking if you followed the documentation.
You mention using a yaml file to configure k3s? It's different, but the result should be the same.
It's a way to configure `--disable=traefik`, `--flannel-backend=none` and `--disable-network-policy` for k3s, and to expose ports 80 & 443, in a declarative way.
All docker commands are from other issues I found in this issue tracker:
```shell
docker exec -it k3d-gwapi-cilium-server-0 mount bpffs /sys/fs/bpf -t bpf
docker exec -it k3d-gwapi-cilium-server-0 mount --make-shared /sys/fs/bpf
docker exec -it k3d-gwapi-cilium-server-0 mkdir -p /run/cilium/cgroupv2
docker exec -it k3d-gwapi-cilium-server-0 mount -t cgroup2 none /run/cilium/cgroupv2
docker exec -it k3d-gwapi-cilium-server-0 mount --make-shared /run/cilium/cgroupv2/
```
However, they unblocked me from errors I didn't mention… and they may potentially have caused more harm, so they could be treated as problematic.
Finally, the cilium command is from the page you mentioned, with the specific ipam parameter:
```shell
cilium install --version 1.15.0 --set=ipam.operator.clusterPoolIPv4PodCIDRList="10.42.0.0/16"
```
I also tried to gather information from the kind documentation, but I wasn't able to tell whether it was docker- or kind-specific. I definitely think some part of it, especially this note, could be the cause of this issue… but it's too low-level for me.
kind and k3d are similar due to their use of docker, so I also expect specifics of one to apply to the other 🙂
Thank you for your help and support; I'm available if you need more information!
I would suggest trying kind instead of k3d to see if the problem persists.
If the problem persists, I'll eventually try it with kind, but at my company (and also personally) we use k3s and k3d because they are closer to what we have in production.
At least, with this issue open, people using k3d will know cilium is not (yet) compatible 🙂
Thanks
```
level=fatal msg="Load overlay network failed" error="program cil_from_overlay: replacing clsact qdisc for interface cilium_vxlan: operation not supported" interface=cilium_vxlan subsys=datapath-loader
```
This sounds like the kernel in this distribution is not compiled with the minimum requirements to run Cilium. The Linux minimum requirements are listed here.
How can I check this?
From my understanding, k3d and kind on Mac should use the same kernel, the one from the VM used by Docker (Desktop). So if one works, the other should work too, no?
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
/unstale
@davinkevin I can see in the issue description, you used uname -a to check the kernel version used with k3d, could you compare that with uname -a from the kind environment?
Inspecting the kernel configuration may vary depending on the Linux distribution. It is often provided as a config file under /boot.
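A minimal sketch of that check, assuming one of the two common locations for the kernel build config (`/boot/config-$(uname -r)` or `/proc/config.gz`); on minimal VMs such as linuxkit neither may be present:

```shell
# Look for vxlan support in the running kernel's build configuration.
cfg="/boot/config-$(uname -r)"
if [ -r "$cfg" ]; then
  grep 'CONFIG_VXLAN' "$cfg" || echo "CONFIG_VXLAN not found in $cfg"
elif [ -r /proc/config.gz ]; then
  zcat /proc/config.gz | grep 'CONFIG_VXLAN' || echo "CONFIG_VXLAN not found in /proc/config.gz"
else
  echo "kernel config not exposed on this system"
fi
```

A result of `CONFIG_VXLAN=y` (built in) or `CONFIG_VXLAN=m` (module) indicates vxlan support is available.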
Hello ๐
With the latest version of kind:
```shell
$ uname -a
Linux foo-control-plane 6.6.22-linuxkit #1 SMP Fri Mar 29 12:21:27 UTC 2024 aarch64 GNU/Linux
```
With the latest version of k3d:
```shell
$ uname -a
Linux k3d-foo-server-0 6.6.22-linuxkit #1 SMP Fri Mar 29 12:21:27 UTC 2024 aarch64 GNU/Linux
```
Pretty similar 🤔
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
/unstale
@davinkevin I think the next step is to find out whether your kernel has vxlan support compiled in using the files in /boot, or work around the issue by using a different Linux distribution that has support.
In this configuration, Cilium requires vxlan support from the kernel and if it's not there, Cilium can't do anything about it. The system requirements for Cilium are listed here: https://docs.cilium.io/en/stable/operations/system_requirements/ .
As long as you continue to encounter this error message, you will need to figure out how to get a Linux environment with the correct support:
```
level=fatal msg="Load overlay network failed" error="program cil_from_overlay: replacing clsact qdisc for interface cilium_vxlan: operation not supported" interface=cilium_vxlan subsys=datapath-loader
```
Alternatively, you can install Cilium with direct routing. However, note that direct routing may impose other requirements on your network. Refer to the docs for more details.
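A hedged sketch of such a direct-routing install (Helm values as available in Cilium 1.14+; the pod CIDR shown is the k3s default used earlier in this thread and is an assumption about your cluster):

```shell
# Install Cilium in native (direct) routing mode instead of vxlan tunneling.
# Requires that nodes can route each other's pod CIDRs directly.
if command -v cilium >/dev/null 2>&1; then
  cilium install --version 1.15.0 \
    --set routingMode=native \
    --set autoDirectNodeRoutes=true \
    --set ipv4NativeRoutingCIDR=10.42.0.0/16
else
  echo "cilium CLI not installed"
fi
```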
If you are able to resolve this issue and encounter a different error, feel free to open a fresh issue with instructions on how to reproduce it.