[Cilium] Executing nerdctl run in k8 environment is stuck

Open wzxmt opened this issue 11 months ago • 26 comments

Description

Executing nerdctl run in a Kubernetes (k8s) environment gets stuck, but k8s itself can create pods normally.

Steps to reproduce the issue

1. [root@m1 ~]# nerdctl ps
CONTAINER ID    IMAGE    COMMAND    CREATED    STATUS    PORTS    NAMES
f0571a9094ce    quay.io/cilium/hubble-ui-backend@sha256:0e0eed917653441fded4e7cdb096b7be6a3bddded5a2dd10812a27b1fc6ed95b    "/usr/bin/backend"    6 minutes ago    Up        k8s://kube-cilium/hubble-ui-77555d5dcf-pj77v/backend
046ba04231f7    docker.io/wangyanglinux/myapp:v1    "nginx -g daemon off;"    6 minutes ago    Up        k8s://default/test-z2gms/test
5c6c52541c37    docker.io/wzxmtlw/metrics-server:v0.6.3    "/metrics-server --c…"    6 minutes ago    Up        k8s://kube-system/metrics-server-5c7b6df7d8-md58r/metrics-server
fcb24a33d77a    quay.io/cilium/hubble-relay@sha256:d352d3860707e8d734a0b185ff69e30b3ffd630a7ec06ba6a4402bed64b4456c    "hubble-relay serve"    7 minutes ago    Up        k8s://kube-cilium/hubble-relay-7bc7544857-95dqm/hubble-relay
....

2.[root@m1 ~]# nerdctl run --name test --rm -it busybox:1.28 /bin/sh
Executing the above command gets stuck

3. Outside the k8s environment, nerdctl run can be executed normally.

Describe the results you received and expected

null

What version of nerdctl are you using?

[root@m1 ~]# nerdctl version
Client:
 Version: v2.0.2
 OS/Arch: linux/amd64
 Git commit: 1220ce7ec2701d485a9b1beeea63dae3da134fb5
 buildctl:
  Version: v0.17.1
  GitCommit: 8b1b83ef4947c03062cdcdb40c69989d8fe3fd04

Server:
 containerd:
  Version: v2.0.1
  GitCommit: 88aa2f531d6c2922003cc7929e51daf1c14caa0a
 runc:
  Version: 1.2.2
  GitCommit: v1.2.2-0-g7cb36325

Are you using a variant of nerdctl? (e.g., Rancher Desktop)

None

Host information

[root@m1 ~]# nerdctl info
Client:
 Namespace: k8s.io
 Debug Mode: false

Server:
 Server Version: v2.0.1
 Storage Driver: overlayfs
 Logging Driver: json-file
 Cgroup Driver: systemd
 Cgroup Version: 2
 Plugins:
  Log: fluentd journald json-file none syslog
  Storage: native overlayfs
 Security Options:
  seccomp
   Profile: builtin
  cgroupns
 Kernel Version: 5.14.0-427.13.1.el9_4.x86_64
 Operating System: Rocky Linux 9.4 (Blue Onyx)
 OSType: linux
 Architecture: x86_64
 CPUs: 4
 Total Memory: 3.793GiB
 Name: m1
 ID: b26f2865-ca8a-49fa-a3a2-ec66adae9813

wzxmt avatar Dec 20 '24 02:12 wzxmt

[root@m1 ~]# kubectl version
Client Version: v1.31.4
Kustomize Version: v5.4.2
Server Version: v1.31.4

wzxmt avatar Dec 20 '24 02:12 wzxmt

@wzxmt I am not sure how to reproduce your problem.

Against a kind cluster, things are working just fine / as expected.

I need more details about your specific deployment.

  • How can I reproduce it from scratch?
  • How did you create your kube cluster exactly?
  • What else is involved here?
  • What are your containerd details?
  • re-run the failing/stuck nerdctl command with --debug-full

apostasie avatar Dec 20 '24 19:12 apostasie

My k8s cluster is deployed from binaries. I tried again: running "nerdctl run --name test --rm -it busybox:1.28 /bin/sh" with Flannel works without hanging, but it hangs with Cilium. Here is how Cilium is deployed:

linux-amd64/helm template cilium cilium/cilium --version 1.15.11 \
  --namespace kube-cilium \
  --set operator.replicas=1 \
  --set k8sServiceHost=apiserver.cluster.local \
  --set k8sServicePort=8443 \
  --set ipv4NativeRoutingCIDR=172.16.0.0/16 \
  --set ipam.operator.clusterPoolIPv4PodCIDRList=172.16.0.0/16 \
  --set hubble.relay.enabled=true \
  --set hubble.ui.enabled=true \
  --set hubble.ui.service.type=NodePort \
  --set hubble.ui.service.nodePort=31235 \
  --set routing-mode=native \
  --set kubeProxyReplacement=strict \
  --set bpf.masquerade=true \
  --set bandwidthManager.enabled=true >>${HOST_PATH}/roles/components/templates/cilium.yaml

[root@m1 ~]# containerd -v
containerd github.com/containerd/containerd/v2 v2.0.1 88aa2f531d6c2922003cc7929e51daf1c14caa0a

[root@m1 ~]# nerdctl info
Client:
 Namespace: k8s.io
 Debug Mode: false

Server:
 Server Version: v2.0.1
 Storage Driver: overlayfs
 Logging Driver: json-file
 Cgroup Driver: systemd
 Cgroup Version: 2
 Plugins:
  Log: fluentd journald json-file none syslog
  Storage: native overlayfs
 Security Options:
  seccomp
   Profile: builtin
  cgroupns
 Kernel Version: 5.14.0-427.13.1.el9_4.x86_64
 Operating System: Rocky Linux 9.4 (Blue Onyx)
 OSType: linux
 Architecture: x86_64
 CPUs: 6
 Total Memory: 5.755GiB
 Name: m1
 ID: 97bb3274-41ea-4a43-a74b-7dc0b86e3fa9

[root@m1 ~]# nerdctl run --name test --rm -it --debug-full busybox:1.28 /bin/sh
DEBU[0000] verifying process skipped
DEBU[0000] generated log driver: binary:///apps/containerd/bin/nerdctl?_NERDCTL_INTERNAL_LOGGING=%2Fvar%2Flib%2Fnerdctl%2F1935db59

wzxmt avatar Dec 21 '24 16:12 wzxmt

Thanks @wzxmt

What happens with nerdctl network ls, or when starting your container with different networking options? (eg: --net host)

@AkihiroSuda is anyone around who is familiar with Kube + eBPF/Cilium and could help debug this?

apostasie avatar Dec 21 '24 19:12 apostasie

nerdctl network ls

I later tried Calico and it worked fine. Running "nerdctl network ls" with Cilium still hangs, but with the other CNI plugins it executes normally.

flannel

[root@m2 ~]# nerdctl network ls
NETWORK ID      NAME      FILE
                cbr0      /etc/cni/net.d/10-flannel.conflist
17f29b073143    bridge    /etc/cni/net.d/nerdctl-bridge.conflist
                host
                none

calico

[root@m3 ~]# nerdctl network ls
NETWORK ID      NAME               FILE
                k8s-pod-network    /etc/cni/net.d/10-calico.conflist
17f29b073143    bridge             /etc/cni/net.d/nerdctl-bridge.conflist
                host
                none

Cilium (hangs)

[root@m1 ~]# nerdctl network ls

wzxmt avatar Dec 22 '24 05:12 wzxmt

Interesting.

Staying stuck is rather unusual. What I am thinking is locking on the same directory. Been browsing Cilium source code, and indeed they do use filesystem locking, possibly on the same directory as us.

@wzxmt if you feel like it, the most helpful thing you could do is:

# clone nerdctl source code
git clone git@github.com:containerd/nerdctl.git
cd nerdctl

# Edit https://github.com/containerd/nerdctl/blob/main/pkg/netutil/netutil.go#L224
# Line 224, find this:
#	err = lockutil.WithDirLock(e.NetconfPath, fn)
# Replace it with:
#      fn()

# Compile a new nerdctl binary
make binaries

# The updated binary is under `_output`

# Now, try again
_output/nerdctl network ls

If it still does not help, you could pepper fmt.Println("debug message something") in this function (and the caller) to figure out where it is getting stuck.

I wish I could test Cilium but I am short on time right now.

Thanks @wzxmt

apostasie avatar Dec 22 '24 07:12 apostasie

After editing https://github.com/containerd/nerdctl/blob/main/pkg/netutil/netutil.go#L224 and running make binaries, nerdctl network ls can be executed. Running nerdctl run --name test --rm -it --debug-full busybox:1.28 /bin/sh still has a problem, though:

Image

wzxmt avatar Dec 23 '24 01:12 wzxmt

Thanks a lot @wzxmt

I think this confirms what the issue is: cilium is very likely trying to lock the same directory as nerdctl (likely the cni configuration directory).

The problem here will not be trivial to solve.

We need to flock when accessing the cni conf - this is the only way to prevent racy/concurrent modifications.

What we could do is move the lock to a different location though (purely nerdctl).

cc @AkihiroSuda

apostasie avatar Dec 23 '24 03:12 apostasie

I got another issue:

containerd version: 1.7.24
nerdctl version: 1.7.7 (I also tried v2.0.2 before, with the same result)

I have two CNI configurations:

  • 10-bridge.conflist: this one is for k8s and uses the bridge plugin. Its content is:
{
"cniVersion": "1.0.0",
"name": "k8s-net",
"plugins": [
  {
    "type": "bridge",
    "bridge": "cni1",
    "isGateway": true,
    "isDefaultGateway": true,
    "ipMasq": false,
    "mtu": 1360,
    "hairpinMode": true,
    "ipam": {
      "ranges": [
        [
          {
            "subnet": "10.129.32.0/24",
            "rangeStart": "10.129.32.1",
            "rangeEnd": "10.129.32.126"
          }
        ]
      ],
      "type": "host-local"
    }
  },
  {
    "type": "bandwidth"
  },
  {
    "type": "firewall"
  },
  {
    "type": "tuning"
  }
]
}
  • nerdctl-nerd.conflist: this one was created by nerdctl, and I then modified it. Its content is:
{
"cniVersion": "1.0.0",
"name": "nerd",
"nerdctlID": "5cabaa953bd37c3e357e779bb82aa195eda3b2afa2bdd19594a7162c4f7497be",
"nerdctlLabels": {},
"plugins": [
  {
    "name": "cni0",
    "type": "macvlan",
    "master": "bond0",
    "mtu": 1360,
    "ipam": {
      "ranges": [
        [
          {
            "gateway": "10.129.17.1",
            "rangeStart": "10.129.17.24",
            "rangeEnd": "10.129.17.63",
            "subnet": "10.129.17.0/24"
          }
        ]
      ],
      "routes": [
        { "dst": "0.0.0.0/0", "gw": "10.129.17.1" }
      ],
      "type": "host-local"
    }
  }
]
}

k8s works well, but when I use nerdctl to create a container and start it,

nerdctl create --name=etcd-openebs --restart=always \
    --network=nerd --ip=10.129.17.25 \
    --cpus=4.0 --memory=8092 --memory-swap=0 \
    --log-driver=json-file \
    --log-opt=max-size=500m \
    --log-opt=max-file=5 \
    --log-opt=log-path=${LOGSDIR}/etcd.log \
    -e ETCD_NAME=${ETCD_NAME} \
    -v ${CONFDIR}:/etc/etcd \
    -v ${DATADIR}:/data/etcd \
    ${CONTAINER_IMAGE} \
    /usr/local/bin/etcd --config-file /etc/etcd/etcd.yml

nerdctl start etcd-openebs

it fails with:

FATA[0000] 1 errors:
failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: time="2025-02-20T09:15:53+08:00" level=fatal msg="failed to call cni.Setup: plugin type=\"macvlan\" name=\"cni0\" failed (add): failed to allocate for range 0: requested IP address 10.129.17.25 is not available in range set 10.129.17.24-10.129.17.63"
Failed to write to log, write /var/lib/nerdctl/1935db59/containers/default/b5c3b84c7cc382d563954d684ebb766bc7a36b2bade55e91adb0a89d0533f77c/oci-hook.createRuntime.log: file already closed: unknown 

nopeno avatar Feb 20 '25 01:02 nopeno

requested IP address 10.129.17.25 is not available in range set 10.129.17.24-10.129.17.63

Doesn't seem relevant to the OP

etcd-openebs

etcd and openebs do not need to be used to reproduce the issue?

AkihiroSuda avatar Feb 20 '25 01:02 AkihiroSuda

etcd-openebs is the name of the container I created...

nopeno avatar Feb 20 '25 07:02 nopeno

@AkihiroSuda OP issue is clearly that we lock a directory that Cilium is also trying to lock.

I believe nerdctl should implement locking for networking stuff in a separate, private directory.

Can you assign this to me?

apostasie avatar Mar 05 '25 03:03 apostasie

Thanks, can we just create a lock file like .nerdctl.lock in the CNI dir, or will something be angry if there is a non-JSON file in the CNI directory?

AkihiroSuda avatar Mar 06 '25 00:03 AkihiroSuda

Yep, it might be just a simple patch.

I need to look into locking again, especially the platform-specific parts.

apostasie avatar Mar 06 '25 01:03 apostasie

any container can reproduce the issue....

nopeno avatar Mar 10 '25 00:03 nopeno

@nopeno

Your post is irrelevant to the OP. Please open a new issue.

AkihiroSuda avatar Mar 10 '25 00:03 AkihiroSuda

Just stumbling over this issue (K8s+Cilium+nerdctl v2.0.3). Can I bypass locking or override the locking path as a workaround?

stephan2012 avatar Mar 18 '25 12:03 stephan2012

I can't think of any way to do that off the top of my head... Also note that at this point, the lock explanation is an (informed) hypothesis, not a firm root cause...

I'll look into it soon anyhow.

@fahedouch / @AkihiroSuda could we tentatively slate that for the next patch release / milestone to 2.x.x?

apostasie avatar Mar 20 '25 22:03 apostasie

@wzxmt thanks a lot for getting through with all the info, that was really helpful.

I have a patch in the linked pr #4165

If you feel like it, would you be able to build from it and try in your context?

Cc @stephan2012 as well if you feel like trying.

Thanks folks.

apostasie avatar Apr 26 '25 04:04 apostasie

Was this fixed/alleviated in #4165?

AkihiroSuda avatar May 01 '25 03:05 AkihiroSuda

Image

Updating nerdctl version to 2.0.5 did not solve the problem!

wzxmt avatar May 06 '25 05:05 wzxmt

@wzxmt

With 2.0.5, does nerdctl network ls work with cilium?

apostasie avatar May 06 '25 16:05 apostasie

yes

wzxmt avatar May 06 '25 16:05 wzxmt

Cool. So, the patch did address the first issue (concurrent locking with Cilium), which is good. We now have a second problem here.

It does not feel like I can continue just reading tea leaves though. I need to reproduce your environment.

@wzxmt can you share how you are installing and configuring? E.g., I have a kind cluster: how do I set up Cilium the same way as you?

Thanks in advance.

apostasie avatar May 06 '25 17:05 apostasie

install k8s :

kubeadm init --kubernetes-version=1.32.3 --apiserver-advertise-address=10.0.0.51 --control-plane-endpoint=10.0.0.51:6443 --service-cidr=10.96.0.0/16 --pod-network-cidr=172.16.0.0/16 --upload-certs

install cilium:

helm install cilium cilium/cilium --version 1.17.1 \
  --namespace kube-cilium \
  --set operator.replicas=1 \
  --set k8sServiceHost=10.0.0.51 \
  --set k8sServicePort=6443 \
  --set ipv4NativeRoutingCIDR=172.16.0.0/16 \
  --set ipam.operator.clusterPoolIPv4PodCIDRList=172.16.0.0/16 \
  --set routing-mode=native \
  --set kubeProxyReplacement=true \
  --set bpf.masquerade=true \
  --set envoy.enabled=true \
  --set bandwidthManager.enabled=true

wzxmt avatar May 07 '25 01:05 wzxmt

Thanks @wzxmt

I will set it up locally and figure this out. Unfortunately, this is not going to happen in time for the 2.1 release which is due today.

apostasie avatar May 07 '25 17:05 apostasie