Shim cannot connect to runtime daemon?
Hi, I'm playing with runwasi in kind by adapting the integration test Dockerfile. I see that the wasmtime shim works for running the docker.io/wasmedge/example-wasi:latest test image, but I cannot run the same workload when using a node image that configures daemon mode. Is there something else that I need to do to get daemon mode working?
Here's the error I see (both wasmedge and wasmtime fail in the same way):
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 15s default-scheduler Successfully assigned default/wasi-job-demo-wm4cj to kind-worker
Warning FailedCreatePodSandBox 14s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create containerd task: failed to start shim: start failed: containerd-shim-wasmedged-v1: Ttrpc(RpcStatus(Status { code: NOT_FOUND, message: "/runwasi.services.sandbox.v1.Manager/Connect is not supported", details: [], special_fields: SpecialFields { unknown_fields: UnknownFields { fields: None }, cached_size: CachedSize { size: 0 } } }))
: exit status 1: unknown
I configured the daemon as a part of the containerd systemd service and do see that it is running, and the unix socket is present as well:
root@kind-worker:/# ps -ef
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 20:22 ? 00:00:00 /sbin/init
root 79 1 0 20:22 ? 00:00:00 /lib/systemd/systemd-journald
message+ 90 1 0 20:22 ? 00:00:00 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation --syslog-only
root 113 1 0 20:22 ? 00:00:00 /usr/local/bin/containerd-wasmedged
root 117 1 1 20:22 ? 00:00:05 /usr/local/bin/containerd
root 201 1 1 20:23 ? 00:00:06 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --container-runtime-endpoint=unix:///run/containerd
root 254 1 0 20:23 ? 00:00:00 /usr/local/bin/containerd-shim-runc-v2 -namespace k8s.io -id a63b62567b06b0cd4d17f8c3ba7b870bb9f98d86df803216f26a9df57c88a327 -address /run/containerd/containerd.sock
root 255 1 0 20:23 ? 00:00:00 /usr/local/bin/containerd-shim-runc-v2 -namespace k8s.io -id 10ae9a17d1bbe7a0098adb1e27fc296cfe0eaafacf26ba83fc71472aad92cef0 -address /run/containerd/containerd.sock
65535 295 255 0 20:23 ? 00:00:00 /pause
65535 297 254 0 20:23 ? 00:00:00 /pause
root 362 255 0 20:23 ? 00:00:00 /bin/kindnetd
root 387 254 0 20:23 ? 00:00:00 /usr/local/bin/kube-proxy --config=/var/lib/kube-proxy/config.conf --hostname-override=kind-worker
root@kind-worker:/# ls -l /var/run/io.containerd.wasmwasi.v1
total 0
srwxr-xr-x 1 root root 0 Jul 4 20:22 manager.sock
journalctl -u wasmedged.service shows nothing interesting.
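A socket file being present does not by itself show that the daemon accepts connections on it. The following is a minimal sketch for probing that from inside the node, assuming Python is available there; the `probe_unix_socket` helper and its status strings are hypothetical and not part of runwasi. Only the socket path comes from the `ls -l` output above.

```python
import os
import socket
import stat

# Path taken from the `ls -l` output above.
MANAGER_SOCK = "/var/run/io.containerd.wasmwasi.v1/manager.sock"


def probe_unix_socket(path: str, timeout: float = 2.0) -> str:
    """Return a short status string for the Unix socket at `path`.

    Hypothetical helper: it only checks that something accepts a
    stream connection; it does not speak ttrpc.
    """
    try:
        mode = os.stat(path).st_mode
    except FileNotFoundError:
        return "missing"
    if not stat.S_ISSOCK(mode):
        return "not-a-socket"

    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect(path)
    except OSError as exc:
        # Covers a refused connection, a permission error, or a timeout.
        return f"connect-failed: {exc.strerror or exc}"
    finally:
        sock.close()
    return "listening"


if __name__ == "__main__":
    print(MANAGER_SOCK, "->", probe_unix_socket(MANAGER_SOCK))
```

If this reports `listening`, the failure is probably above the socket layer. That would be consistent with the `NOT_FOUND` status in the error, which looks like a reply from a ttrpc server that does not have the requested method registered, though that reading is an inference from the message and not something confirmed here.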
containerd config:
root@kind-worker:/# more /etc/containerd/config.toml
version = 2
[plugins]
[plugins."io.containerd.grpc.v1.cri"]
restrict_oom_score_adj = false
sandbox_image = "registry.k8s.io/pause:3.7"
tolerate_missing_hugepages_controller = true
[plugins."io.containerd.grpc.v1.cri".containerd]
default_runtime_name = "runc"
discard_unpacked_layers = true
snapshotter = "overlayfs"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes]
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
base_runtime_spec = "/etc/containerd/cri-base.json"
runtime_type = "io.containerd.runc.v2"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
SystemdCgroup = true
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.test-handler]
base_runtime_spec = "/etc/containerd/cri-base.json"
runtime_type = "io.containerd.runc.v2"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.test-handler.options]
SystemdCgroup = true
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.wasm]
runtime_type = "io.containerd.wasmedged.v1"
[plugins."io.containerd.grpc.v1.cri".registry]
config_path = "/etc/containerd/certs.d"
[proxy_plugins]
[proxy_plugins.fuse-overlayfs]
address = "/run/containerd-fuse-overlayfs.sock"
type = "snapshot"
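For reference, containerd derives the shim binary it launches from the `runtime_type` string, which is why the config above leads to `containerd-shim-wasmedged-v1` in the error message. A small sketch of that naming convention follows; the `shim_binary_name` function is a hypothetical illustration written for this thread, not containerd's own code.

```python
def shim_binary_name(runtime_type: str) -> str:
    """Map a containerd runtime_type to the shim binary name it implies.

    Illustrative only: the last two dot-separated components are
    treated as the shim name and version.
    """
    parts = runtime_type.split(".")
    if len(parts) < 2 or not all(parts):
        raise ValueError(f"unexpected runtime_type: {runtime_type!r}")
    name, version = parts[-2], parts[-1]
    return f"containerd-shim-{name}-{version}"


if __name__ == "__main__":
    for rt in ("io.containerd.runc.v2", "io.containerd.wasmedged.v1"):
        print(rt, "->", shim_binary_name(rt))
```

So `io.containerd.wasmedged.v1` selects the daemon-mode shim, while `io.containerd.wasmedge.v1` selects the standalone one.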
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
name: wasm
handler: wasm
Apply this RuntimeClass config to the Kubernetes cluster, then use this runtime for your pod.
This is what I am using, and it doesn't work for me with wasmtimed. It works OK with wasmtime.
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
name: wasm
labels:
app: wasi-job-demo
handler: wasm
---
apiVersion: batch/v1
kind: Job
metadata:
name: wasi-job-demo
spec:
template:
spec:
runtimeClassName: wasm
restartPolicy: Never
containers:
- name: wasi-job-demo
image: docker.io/wasmedge/example-wasi:latest
I've rebuilt my image using the runwasi repo head, and this is what I see in the pod events now:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 26s default-scheduler Successfully assigned default/wasi-job-demo-mrw25 to kind-worker
Warning FailedCreatePodSandBox 12s (x2 over 24s) kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create containerd task: failed to start shim: start failed: containerd-shim-wasmtimed-v1: Ttrpc(RpcStatus(Status { code: NOT_FOUND, message: "/runwasi.services.sandbox.v1.Manager/Connect is not supported", details: [], special_fields: SpecialFields { unknown_fields: UnknownFields { fields: None }, cached_size: CachedSize { size: 0 } } }))
How did you install it? It looks like there are some issues with the installation.
wget https://github.com/containerd/runwasi/releases/download/containerd-shim-wasmedge/v0.3.0/containerd-shim-wasmedge-x86_64.tar.gz
tar -zxvf containerd-shim-wasmedge-x86_64.tar.gz -C /opt/containerd/bin/
cat <<EOF >> /etc/containerd/config.toml
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.wasm]
runtime_type = "io.containerd.wasmedge.v1"
EOF
wasmedged is definitely a weak point where we don't have extensive tests, so it could be broken.
> Is there something else that I need to do to get daemon mode working?
Unfortunately, I don't have many ideas on this at the moment. I'll take a look shortly.
> How did you install it? It looks like there are some issues with the installation.
When I was playing with it back in July I set up the attached Dockerfile to build everything. I rebuilt images for wasmtime, wasmedge, wasmtimed, and wasmedged just now, and what I see when I submit a docker.io/wasmedge/example-wasi job to a kind cluster for each is:
- wasmtime - Works now, and worked back in July.
- wasmedge - This works with the latest runwasi. In July, this failed with
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Custom { kind: Uncategorized, error: "failed to find a pre-opened file descriptor through which \"tmp.txt\" could be opened" }', src/main.rs:47:39.
- wasmtimed - Fails now with the same error that I originally reported.
- wasmedged - Fails in the same manner as wasmtimed.
So there's progress in that wasmedge works where it didn't for me before!
As before, I see that the wasm runtime daemon is running, and /var/run/io.containerd.wasmwasi.v1/manager.sock is present, but for some reason communication through that socket isn't working.
# systemctl list-units --type=service --state=running
UNIT LOAD ACTIVE SUB DESCRIPTION
containerd.service loaded active running containerd container runtime
dbus.service loaded active running D-Bus System Message Bus
kubelet.service loaded active running kubelet: The Kubernetes Node Agent
systemd-journald.service loaded active running Journal Service
wasmedged.service loaded active running wasmedged: runwasi daemon
When I look at the containerd journal in the worker node, I see:
Nov 14 15:58:45 kind-worker containerd[117]: time="2023-11-14T15:58:45.738464661Z" level=info msg="RunPodSandbox for &PodSandboxMetadata{Name:wasi-job-demo-fndd2,Uid:013d5a36-3359-4707-892f-5cc9810ca341,Namespace:default,Attempt:0,}"
Nov 14 15:58:47 kind-worker containerd[117]: time="2023-11-14T15:58:47.359606419Z" level=error msg="copy shim log" error="read /proc/self/fd/42: file already closed"
Nov 14 15:58:47 kind-worker containerd[117]: time="2023-11-14T15:58:47.596678497Z" level=error msg="RunPodSandbox for &PodSandboxMetadata{Name:wasi-job-demo-fndd2,Uid:013d5a36-3359-4707-892f-5cc9810ca341,Namespace:default,Attempt:0,} failed, error" error="failed to create containerd task: failed to start shim: start failed: containerd-shim-wasmedged-v1: Ttrpc(RpcStatus(Status { code: NOT_FOUND, message: \"/runwasi.services.sandbox.v1.Manager/Connect is not supported\", details: [], special_fields: SpecialFields { unknown_fields: UnknownFields { fields: None }, cached_size: CachedSize { size: 0 } } }))\n: exit status 1: unknown"
Nothing else in the journal looks different from that for a typical kind cluster.
This is something we don't currently test. It wouldn't surprise me if it is broken.
I can try to debug it tomorrow.
Cool. LMK if there's anything you'd like me to check out on my system.
Just using this to build the image:
docker buildx build --platform linux/amd64,linux/arm64 --build-arg SHIM=${shim} --ssh default --push -t "${tag}" -f docker/Dockerfile .
Then, just using that image when creating a kind cluster.
@jprendes just checking in on this, any thoughts on why connecting to the shared-mode daemon doesn't work?
An update: "shared-mode" is still not working.
Warning FailedCreatePodSandBox 12s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create containerd task: failed to start shim: start failed: containerd-shim-wasmtimed-v1: Ttrpc(Nix(ENOENT))
This could be a killer feature.
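Two different failure signatures have now been reported in this thread: a ttrpc `RpcStatus` with `code: NOT_FOUND`, and `Ttrpc(Nix(ENOENT))`. A rough sketch for telling them apart when scanning pod events or the containerd journal is shown below. The `classify_shim_error` helper and its labels are hypothetical, and the regular expressions are based only on the messages quoted in this thread, so other shim versions may format errors differently.

```python
import re
from typing import Tuple

# Patterns derived from the error strings quoted in this thread.
# The optional backslash handles the escaped quotes in the journal output.
_STATUS_RE = re.compile(
    r'Ttrpc\(RpcStatus\(Status \{ code: (\w+), message: \\?"([^"\\]*)'
)
_NIX_RE = re.compile(r"Ttrpc\(Nix\((\w+)\)\)")


def classify_shim_error(message: str) -> Tuple[str, str]:
    """Return (kind, detail) for a shim start failure message.

    kind is "rpc-status" when a ttrpc server answered with a status,
    "os-error" when the shim hit an OS-level errno, else "unknown".
    """
    match = _STATUS_RE.search(message)
    if match:
        return ("rpc-status", f"{match.group(1)}: {match.group(2)}")
    match = _NIX_RE.search(message)
    if match:
        return ("os-error", match.group(1))
    return ("unknown", "")
```

An `rpc-status` result suggests the shim reached a server that rejected the call, while an `os-error` of `ENOENT` suggests a path the shim needed (possibly the manager socket) was not found. Both readings are assumptions drawn from the error text, not confirmed diagnoses.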
@jprendes any update on shared-mode?
There hasn't been any progress on this front. The first 2 steps would be:
- writing a test for it
- finding out where the current implementation is broken
@macko99 @erkules would you be interested in contributing?
See also: https://github.com/containerd/runwasi/issues/218#issuecomment-2220523767