
`cilium` in `k3d` doesn't start correctly

Open davinkevin opened this issue 2 years ago • 10 comments

Is there an existing issue for this?

  • [X] I have searched the existing issues

What happened?

  1. I started a k8s cluster using k3d and docker desktop on Mac with Apple Silicon

The configuration file for the cluster looks like this:

apiVersion: k3d.io/v1alpha5
kind: Simple
metadata:
  name: gwapi-cilium
servers: 1
ports:
  - port: 443:443
    nodeFilters:
      - loadbalancer
  - port: 80:80
    nodeFilters:
      - loadbalancer
options:
  k3s:
    extraArgs:
      - arg: "--disable=traefik"
        nodeFilters:
          - server:*
      - arg: "--flannel-backend=none"
        nodeFilters:
          - server:*
      - arg: "--disable-network-policy"
        nodeFilters:
          - server:*
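
For reference, the same cluster can also be created without a config file. This is a sketch of what the equivalent one-shot invocation might look like (flag syntax assumed from k3d v5; verify against `k3d cluster create --help`):

```shell
# Sketch: CLI equivalent of the config file above (k3d v5 flag syntax assumed)
k3d cluster create gwapi-cilium \
  --servers 1 \
  -p "443:443@loadbalancer" \
  -p "80:80@loadbalancer" \
  --k3s-arg "--disable=traefik@server:*" \
  --k3s-arg "--flannel-backend=none@server:*" \
  --k3s-arg "--disable-network-policy@server:*"
```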

Then, I install cilium with this:

docker exec -it k3d-gwapi-cilium-server-0 mount bpffs /sys/fs/bpf -t bpf
docker exec -it k3d-gwapi-cilium-server-0 mount --make-shared /sys/fs/bpf

docker exec -it k3d-gwapi-cilium-server-0 mkdir -p /run/cilium/cgroupv2
docker exec -it k3d-gwapi-cilium-server-0 mount -t cgroup2 none /run/cilium/cgroupv2
docker exec -it k3d-gwapi-cilium-server-0 mount --make-shared /run/cilium/cgroupv2/

cilium install --version 1.15.0 --set=ipam.operator.clusterPoolIPv4PodCIDRList="10.42.0.0/16"
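
To confirm the mounts above actually took effect inside the node container before installing, a quick check might look like this (same node name as in the commands above):

```shell
# Sketch: verify the bpf and cgroup2 mounts inside the k3d node container
docker exec -it k3d-gwapi-cilium-server-0 mount \
  | grep -E '/sys/fs/bpf|/run/cilium/cgroupv2'
```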

This ends with the cilium DaemonSet not starting: the cilium-agent container fails with the following log output:

level=warning msg="Unable to ensure that BPF JIT compilation is enabled. This can be ignored when Cilium is running inside non-host network namespace (e.g. with kind or minikube)" error="could not open the sysctl file /host/proc/sys/net/core/bpf_jit_enable: open /host/proc/sys/net/core/bpf_jit_enable: no such file or directory" subsys=sysctl sysParamName=net.core.bpf_jit_enable sysParamValue=1
level=info msg="Setting sysctl" subsys=sysctl sysParamName=net.ipv4.conf.all.rp_filter sysParamValue=0
level=info msg="Setting sysctl" subsys=sysctl sysParamName=net.ipv4.fib_multipath_use_neigh sysParamValue=1
level=info msg="Setting sysctl" subsys=sysctl sysParamName=kernel.unprivileged_bpf_disabled sysParamValue=1
level=info msg="Setting sysctl" subsys=sysctl sysParamName=kernel.timer_migration sysParamValue=0
level=info msg="Re-pinning map with ':pending' suffix" bpfMapName=cilium_calls_overlay_2 bpfMapPath=/sys/fs/bpf/tc/globals/cilium_calls_overlay_2 subsys=bpf
level=info msg="Repinning without ':pending' suffix after failed migration" bpfMapName=cilium_calls_overlay_2 bpfMapPath=/sys/fs/bpf/tc/globals/cilium_calls_overlay_2 subsys=bpf
level=warning msg="Removed new pinned map after failed migration" bpfMapName=cilium_calls_overlay_2 bpfMapPath=/sys/fs/bpf/tc/globals/cilium_calls_overlay_2 subsys=bpf
level=fatal msg="Load overlay network failed" error="program cil_from_overlay: replacing clsact qdisc for interface cilium_vxlan: operation not supported" interface=cilium_vxlan subsys=datapath-loader

Cilium Version

cilium-cli: v0.15.21 compiled with go1.21.6 on darwin/arm64
cilium image (default): v1.14.6
cilium image (stable): v1.14.6
cilium image (running): 1.15.0

Kernel Version

docker exec -it k3d-gwapi-cilium-server-0 sh
/ # uname -a
Linux k3d-gwapi-cilium-server-0 6.5.11-linuxkit #1 SMP PREEMPT Mon Dec  4 11:30:00 UTC 2023 aarch64 GNU/Linux

Kubernetes Version

Client Version: version.Info{Major:"1", Minor:"27+", GitVersion:"v1.27.7-dispatcher", GitCommit:"8224bc5a1d7d973ca48129a9087069d252cf6b94", GitTreeState:"clean", BuildDate:"2023-10-20T18:06:11Z", GoVersion:"go1.20.10", Compiler:"gc", Platform:"darwin/arm64"}
Kustomize Version: v5.0.1
Server Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.4+k3s1", GitCommit:"36645e7311e9bdbbf2adb79ecd8bd68556bc86f6", GitTreeState:"clean", BuildDate:"2023-07-28T09:46:05Z", GoVersion:"go1.20.6", Compiler:"gc", Platform:"linux/arm64"}

Sysdump

๐Ÿ” Collecting sysdump with cilium-cli version: v0.15.21, args: [sysdump]
๐Ÿ”ฎ Detected Cilium installation in namespace "kube-system"
๐Ÿ”ฎ Detected Cilium operator in namespace "kube-system"
โ„น๏ธ Failed to detect Cilium SPIRE installation - using Cilium namespace as Cilium SPIRE namespace: kube-system
๐Ÿ” Collecting Kubernetes nodes
๐Ÿ”ฎ Detected Cilium features: map[cidr-match-nodes:Disabled cni-chaining:Disabled:none enable-envoy-config:Disabled enable-gateway-api:Disabled enable-ipv4-egress-gateway:Disabled endpoint-routes:Disabled ingress-controller:Disabled ipv4:Enabled ipv6:Disabled mutual-auth-spiffe:Disabled wireguard-encapsulate:Disabled]
๐Ÿ” Collecting tracing data from Cilium pods
๐Ÿ” Collect Kubernetes nodes
๐Ÿ” Collecting Kubernetes events
๐Ÿ” Collect Kubernetes version
๐Ÿ” Collecting Kubernetes pods
๐Ÿ” Collecting Kubernetes namespaces
๐Ÿ” Collecting Kubernetes services
๐Ÿ” Collecting Kubernetes pods summary
๐Ÿ” Collecting Kubernetes endpoints
๐Ÿ” Collecting Kubernetes metrics
๐Ÿ” Collecting Kubernetes network policies
๐Ÿ” Collecting Kubernetes leases
๐Ÿ” Collecting Cilium cluster-wide network policies
๐Ÿ” Collecting Cilium network policies
๐Ÿ” Collecting Cilium egress NAT policies
๐Ÿ” Collecting Cilium Egress Gateway policies
๐Ÿ” Collecting Cilium local redirect policies
๐Ÿ” Collecting Cilium CIDR Groups
๐Ÿ” Collecting Cilium endpoints
๐Ÿ” Collecting Cilium endpoint slices
๐Ÿ” Collecting Cilium nodes
๐Ÿ” Collecting Cilium identities
๐Ÿ” Collecting Ingresses
๐Ÿ” Collecting Cilium Node Configs
๐Ÿ” Collecting IngressClasses
๐Ÿ” Collecting Cilium BGP Peering Policies
๐Ÿ” Collecting Cilium LoadBalancer IP Pools
๐Ÿ” Checking if cilium-etcd-secrets exists in kube-system namespace
๐Ÿ” Collecting Cilium Pod IP Pools
๐Ÿ” Collecting the Cilium configuration
๐Ÿ” Collecting the Cilium daemonset(s)
๐Ÿ” Collecting the Cilium Node Init daemonset
๐Ÿ” Collecting the Cilium Envoy configuration
๐Ÿ” Collecting the Cilium Envoy daemonset
๐Ÿ” Collecting the Hubble daemonset
โš ๏ธ Daemonset "cilium-envoy" not found in namespace "kube-system" - this is expected if Envoy DaemonSet is not enabled
๐Ÿ” Collecting the Hubble Relay configuration
๐Ÿ” Collecting the Hubble Relay deployment
๐Ÿ” Collecting the Hubble UI deployment
โš ๏ธ Daemonset "cilium-node-init" not found in namespace "kube-system" - this is expected if Node Init DaemonSet is not enabled
๐Ÿ” Collecting the Cilium operator deployment
๐Ÿ” Collecting the Cilium operator metrics
๐Ÿ” Collecting the clustermesh metrics
๐Ÿ” Collecting the 'clustermesh-apiserver' deployment
๐Ÿ” Collecting the CNI configuration files from Cilium pods
๐Ÿ” Collecting the CNI configmap
๐Ÿ” Collecting gops stats from Cilium pods
๐Ÿ” Collecting gops stats from Hubble pods
โš ๏ธ Deployment "hubble-relay" not found in namespace "kube-system" - this is expected if Hubble is not enabled
๐Ÿ” Collecting gops stats from Hubble Relay pods
๐Ÿ” Collecting bugtool output from Cilium pods
โš ๏ธ Deployment "hubble-ui" not found in namespace "kube-system" - this is expected if Hubble UI is not enabled
๐Ÿ” Collecting profiling data from Cilium pods
โš ๏ธ Container "cilium-agent" for pod "cilium-qpdl9" in namespace "kube-system" is not running. Trying EphemeralContainer or separate Pod instead...
๐Ÿ” Collecting logs from Cilium pods
๐Ÿ” Collecting logs from Cilium Envoy pods
๐Ÿ” Collecting logs from Cilium Node Init pods
๐Ÿ” Collecting logs from Cilium operator pods
๐Ÿ” Collecting logs from 'clustermesh-apiserver' pods
๐Ÿ” Collecting logs from Hubble pods
๐Ÿ” Collecting logs from Hubble Relay pods
โš ๏ธ Deployment "clustermesh-apiserver" not found in namespace "kube-system" - this is expected if 'clustermesh-apiserver' isn't enabled
๐Ÿ” Collecting logs from Hubble UI pods
๐Ÿ” Collecting platform-specific data
๐Ÿ” Collecting kvstore data
๐Ÿ” Collecting Cilium external workloads
๐Ÿ” Collecting Hubble flows from Cilium pods
Secret "cilium-etcd-secrets" not found in namespace "kube-system" - this is expected when using the CRD KVStore
๐Ÿ” Collecting bugtool output from Tetragon pods
I0211 21:00:28.397763   17273 request.go:697] Waited for 1.151837542s due to client-side throttling, not priority and fairness, request: GET:https://0.0.0.0:61330/api/v1/namespaces/kube-system/configmaps/cilium-envoy-config
๐Ÿ” Collecting Tetragon PodInfo custom resources
๐Ÿ” Collecting Tetragon tracing policies
๐Ÿ” Collecting Helm values from the release
โš ๏ธ EphemeralContainer "sysdump-1707681627" on pod "" in namespace "" never reached Running status (falling back to separate Pod)
โš ๏ธ Container "cilium-agent" on pod "cilium-qpdl9" in namespace "kube-system" is not running. Creating exec Pod.
โš ๏ธ The following tasks failed, the sysdump may be incomplete:
โš ๏ธ [13] Collecting Cilium egress NAT policies: failed to collect Cilium egress NAT policies: the server could not find the requested resource
โš ๏ธ [14] Collecting Cilium Egress Gateway policies: failed to collect Cilium Egress Gateway policies: the server could not find the requested resource (get ciliumegressgatewaypolicies.cilium.io)
โš ๏ธ [16] Collecting Cilium local redirect policies: failed to collect Cilium local redirect policies: the server could not find the requested resource (get ciliumlocalredirectpolicies.cilium.io)
โš ๏ธ [18] Collecting Cilium endpoint slices: failed to collect Cilium endpoint slices: the server could not find the requested resource (get ciliumendpointslices.cilium.io)
โš ๏ธ [24] Collecting Cilium BGP Peering Policies: failed to collect Cilium BGP Peering policies: the server could not find the requested resource (get ciliumbgppeeringpolicies.cilium.io)
โš ๏ธ [31] Collecting the Cilium Envoy configuration: failed to collect the Cilium Envoy configuration: configmaps "cilium-envoy-config" not found
โš ๏ธ [34] Collecting the Hubble Relay configuration: failed to collect the Hubble Relay configuration: configmaps "hubble-relay-config" not found
โš ๏ธ cniconflist-cilium-qpdl9: error with exec request (pod=kube-system/cilium-qpdl9, container=cilium-agent): unable to upgrade connection: container not found ("cilium-agent")
โš ๏ธ gops-cilium-qpdl9-memstats: failed to list processes "cilium-qpdl9" ("cilium-agent") in namespace "kube-system": error with exec request (pod=kube-system/cilium-qpdl9, container=cilium-agent): unable to upgrade connection: container not found ("cilium-agent")
โš ๏ธ gops-cilium-qpdl9-stack: failed to list processes "cilium-qpdl9" ("cilium-agent") in namespace "kube-system": error with exec request (pod=kube-system/cilium-qpdl9, container=cilium-agent): unable to upgrade connection: container not found ("cilium-agent")
โš ๏ธ gops-cilium-qpdl9-stats: failed to list processes "cilium-qpdl9" ("cilium-agent") in namespace "kube-system": error with exec request (pod=kube-system/cilium-qpdl9, container=cilium-agent): unable to upgrade connection: container not found ("cilium-agent")
โš ๏ธ gops-cilium-qpdl9-pprof-heap: failed to list processes "cilium-qpdl9" ("cilium-agent") in namespace "kube-system": error with exec request (pod=kube-system/cilium-qpdl9, container=cilium-agent): unable to upgrade connection: container not found ("cilium-agent")
โš ๏ธ gops-cilium-qpdl9-pprof-cpu: failed to list processes "cilium-qpdl9" ("cilium-agent") in namespace "kube-system": error with exec request (pod=kube-system/cilium-qpdl9, container=cilium-agent): unable to upgrade connection: container not found ("cilium-agent")
โš ๏ธ [61] Collecting Tetragon PodInfo custom resources: failed to collect Tetragon PodInfo: the server could not find the requested resource (get podinfo.cilium.io)
โš ๏ธ hubble-flows-cilium-qpdl9: failed to collect hubble flows for "cilium-qpdl9" in namespace "kube-system": error with exec request (pod=kube-system/cilium-qpdl9, container=cilium-agent): unable to upgrade connection: container not found ("cilium-agent"):
โš ๏ธ [62] Collecting Tetragon tracing policies: failed to collect Tetragon tracing policies: the server could not find the requested resource (get tracingpolicies.cilium.io)
โš ๏ธ Please note that depending on your Cilium version and installation options, this may be expected
๐Ÿ—ณ Compiling sysdump
โœ… The sysdump has been saved to cilium-sysdump-20240211-210027.zip

cilium-sysdump-20240211-210027.zip

Relevant log output

Already shared in the first part.

Anything else?

It's a very simple and straightforward installation I would like to set up for a presentation about Cilium + Gateway API. I just use k3d instead of kind.

I've tried many things from the issue tracker and the kind documentation, without success.

This may not be possible to fix, but at least this issue can serve as a pointer for others in my situation 😇

Code of Conduct

  • [X] I agree to follow this project's Code of Conduct

davinkevin avatar Feb 11 '24 20:02 davinkevin

Thanks for this @davinkevin, I'll mark this down for the relevant team's attention and see if someone has a chance to pick it up.

youngnick avatar Feb 13 '24 05:02 youngnick

Hey,

have you tried following the steps described in here?

aanm avatar Feb 19 '24 08:02 aanm

Hey @aanm, can you be more precise?

In fact, everything on that page looks to be part of the description I gave when creating the issue 😇, so my answer is yes, and I also used data from other issues… so it's "yes, and even more".

davinkevin avatar Feb 19 '24 09:02 davinkevin

The steps you described to install Cilium seem different from the ones described in the documentation; that's why I was asking whether you followed the documentation.

aanm avatar Feb 19 '24 15:02 aanm

You mention using a YAML file to configure k3s? It's different, but the result should be the same.

It's a way to configure --disable=traefik, --flannel-backend=none and --disable-network-policy for k3s, and to expose ports 80 & 443, in a declarative way.

All docker commands are from other issues I found in this issue tracker:

docker exec -it k3d-gwapi-cilium-server-0 mount bpffs /sys/fs/bpf -t bpf
docker exec -it k3d-gwapi-cilium-server-0 mount --make-shared /sys/fs/bpf

docker exec -it k3d-gwapi-cilium-server-0 mkdir -p /run/cilium/cgroupv2
docker exec -it k3d-gwapi-cilium-server-0 mount -t cgroup2 none /run/cilium/cgroupv2
docker exec -it k3d-gwapi-cilium-server-0 mount --make-shared /run/cilium/cgroupv2/

However, they unblocked me on errors I didn't mention… and they may have caused more harm, so they could be treated as problematic.

Finally, the cilium command is from the page you mentioned, with the specific ipam parameter:

cilium install --version 1.15.0 --set=ipam.operator.clusterPoolIPv4PodCIDRList="10.42.0.0/16"

I also tried to gather information from the kind documentation, but I wasn't able to tell whether it was docker- or kind-specific. I definitely think some parts, especially this note, could be the cause of this issue… but it's too low-level for me.

kind and k3d are similar due to their use of docker, so I'd also expect the specifics of one to apply to the other 😅.

Thank you for your help and support, available if you need more information!

davinkevin avatar Feb 19 '24 16:02 davinkevin

I would suggest trying with kind instead of k3d to see if the problem persists.

aanm avatar Feb 20 '24 11:02 aanm

If the problem persists, I'll eventually try it with kind, but in my company (and also personally), we use k3s and k3d because they are closer to what we have in production.

At least, with this issue open, people using k3d will know cilium is not (yet) compatible 😇

Thanks

davinkevin avatar Feb 20 '24 12:02 davinkevin

level=fatal msg="Load overlay network failed" error="program cil_from_overlay: replacing clsact qdisc for interface cilium_vxlan: operation not supported" interface=cilium_vxlan subsys=datapath-loader

This sounds like the kernel version in this distribution is not compiled with the minimum requirements to run Cilium. The Linux minimum requirements are listed here.

joestringer avatar Mar 05 '24 22:03 joestringer

How can I check this?

From my understanding, k3d and kind on a Mac should use the same kernel, the one from the VM used by Docker Desktop. So if one works, the other should work too, no?

davinkevin avatar Mar 05 '24 22:03 davinkevin

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

github-actions[bot] avatar May 05 '24 01:05 github-actions[bot]

👋 /unstale

davinkevin avatar May 05 '24 22:05 davinkevin

@davinkevin I can see in the issue description, you used uname -a to check the kernel version used with k3d, could you compare that with uname -a from the kind environment?

Inspecting the kernel configuration may vary depending on the Linux distribution. It is often provided as a config file under /boot.
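A sketch of such a check inside the node container (paths are assumptions that vary by distribution; the options most relevant to this error are likely CONFIG_VXLAN for the overlay and the tc options CONFIG_NET_CLS_ACT / CONFIG_NET_SCH_INGRESS for the clsact qdisc):

```shell
# Sketch: look for vxlan/tc-related kernel options; the config may be exposed
# at /proc/config.gz or as a file under /boot, depending on the distribution
if [ -r /proc/config.gz ]; then
  zcat /proc/config.gz | grep -E 'CONFIG_(VXLAN|NET_CLS_ACT|NET_SCH_INGRESS)='
elif [ -r "/boot/config-$(uname -r)" ]; then
  grep -E 'CONFIG_(VXLAN|NET_CLS_ACT|NET_SCH_INGRESS)=' "/boot/config-$(uname -r)"
else
  echo "kernel config not found" >&2
fi
```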

joestringer avatar May 06 '24 17:05 joestringer

Hello 👋

With the latest version of kind:

$ uname -a
Linux foo-control-plane 6.6.22-linuxkit #1 SMP Fri Mar 29 12:21:27 UTC 2024 aarch64 GNU/Linux

With the latest version of k3d:

$ uname -a
Linux k3d-foo-server-0 6.6.22-linuxkit #1 SMP Fri Mar 29 12:21:27 UTC 2024 aarch64 GNU/Linux

Pretty similar 🤔

davinkevin avatar May 14 '24 19:05 davinkevin

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

github-actions[bot] avatar Jul 14 '24 01:07 github-actions[bot]

👋 /unstale

davinkevin avatar Jul 14 '24 18:07 davinkevin

@davinkevin I think the next step is to find out whether your kernel has vxlan support compiled in using the files in /boot, or work around the issue by using a different Linux distribution that has support.

In this configuration, Cilium requires vxlan support from the kernel and if it's not there, Cilium can't do anything about it. The system requirements for Cilium are listed here: https://docs.cilium.io/en/stable/operations/system_requirements/ .

As long as you continue to encounter this error message, you will need to figure out how to get a Linux environment with the correct support:

level=fatal msg="Load overlay network failed" error="program cil_from_overlay: replacing clsact qdisc for interface cilium_vxlan: operation not supported" interface=cilium_vxlan subsys=datapath-loader

Alternatively, you can install Cilium with direct routing. However, note that direct routing may impose other requirements on your network. Refer to the docs for more details.
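As a rough sketch of that alternative (Helm values assumed from Cilium ≥ 1.14, reusing the pod CIDR from the original install command; verify against the docs for your version), a native-routing install might look like:

```shell
# Sketch: install Cilium with native (direct) routing instead of the vxlan
# overlay; value names assumed from Cilium >= 1.14 Helm charts
cilium install --version 1.15.0 \
  --set routingMode=native \
  --set ipv4NativeRoutingCIDR=10.42.0.0/16 \
  --set autoDirectNodeRoutes=true \
  --set ipam.operator.clusterPoolIPv4PodCIDRList="10.42.0.0/16"
```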

If you are able to resolve this issue and then hit a different error, feel free to open a fresh issue with instructions on how to reproduce it.

joestringer avatar Jul 15 '24 17:07 joestringer