dfinit: restart-container-runtime restart loop
Bug report:
The restart-container-runtime init container restarts the container runtime unconditionally. Because the runtime is restarted on every attempt, the pod never reaches a stable state and stays NotReady indefinitely.
The restart should happen only once, and only when the configuration has actually changed, so that the next run of the init container can complete and the pod can become Ready.
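A minimal sketch of what a conditional restart could look like, assuming the init container also mounts /etc/containerd (the current restart-container-runtime container does not) and that a marker file next to the config is acceptable. The function name `restart_if_changed`, the `STATE_FILE` path, and the `RESTART_CMD` override are hypothetical and not part of the chart. The checksum is recorded before the restart because restarting containerd may interrupt the init container itself, which would be consistent with the `error reading from server: EOF` event in the logs.

```shell
#!/bin/sh
# Hypothetical sketch: restart the container runtime only when its
# config file has changed since the last recorded restart.

# Path of the containerd config (same path used by the other init containers).
CONFIG_PATH="${CONFIG_PATH:-/etc/containerd/config.toml}"
# Hypothetical marker file holding the checksum of the last applied config.
STATE_FILE="${STATE_FILE:-/etc/containerd/.dfinit-config.sha256}"
# Restart command taken from the existing init container; overridable for testing.
RESTART_CMD="${RESTART_CMD:-nsenter -t 1 -m -- systemctl restart containerd.service}"

restart_if_changed() {
  current=$(sha256sum "$CONFIG_PATH" | cut -d' ' -f1)
  previous=$(cat "$STATE_FILE" 2>/dev/null || true)

  if [ "$current" = "$previous" ]; then
    echo "config unchanged, skipping restart"
    return 0
  fi

  # Record the checksum first: the restart may terminate this container,
  # so anything placed after it might never run.
  echo "$current" > "$STATE_FILE"
  echo "config changed, restarting container runtime"
  $RESTART_CMD
}
```

In an init container the script would end with a call to `restart_if_changed`. On the first run the runtime is restarted once; when the init container is retried, the checksum matches the marker file and the container exits 0, so the pod can proceed to start its main containers.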
Expected behavior:
Daemonset should start normally.
How to reproduce it:
values.yaml with:
client:
  enable: true
  config:
    verbose: true
  dfinit:
    enable: true
    config:
      verbose: true
      containerRuntime:
        containerd:
          registries:
            - hostNamespace: docker.io
              serverAddr: https://index.docker.io
              capabilities: ["pull", "resolve"]
            - hostNamespace: ghcr.io
              serverAddr: https://ghcr.io
              capabilities: ["pull", "resolve"]
Environment:
- Dragonfly version: v2.1.49 (chart v1.1.67)
- OS: Linux
- Kernel (e.g. uname -a): Linux jack-oneill 6.9.3-arch1-1 #1 SMP PREEMPT_DYNAMIC Fri, 31 May 2024 15:14:45 +0000 x86_64 GNU/Linux
- Others:
Logs:
kubectl describe pod:
Name: dragonfly-client-bgw5s
Namespace: dragonfly
Priority: 0
Service Account: default
Node: e2e/192.168.39.248
Start Time: Tue, 25 Jun 2024 23:27:38 +0200
Labels: app=dragonfly
component=client
controller-revision-hash=7745678fdd
pod-template-generation=3
release=dragonfly
Annotations: checksum/config: ff55a474fbf9a76574ac381a461ce0b797d557fdf76759063600387a8eaf0831
kubectl.kubernetes.io/restartedAt: 2024-06-25T23:27:37+02:00
Status: Pending
IP: 192.168.39.248
IPs:
IP: 192.168.39.248
Controlled By: DaemonSet/dragonfly-client
Init Containers:
update-containerd-remove-registry-mirrors:
Container ID: containerd://bc64537fca42caecc1a78c1e9b3ae2e307ef1c9e27ef8876c6c34609367f2d6b
Image: python:3.12-slim
Image ID: docker.io/library/python@sha256:2fba8e70a87bcc9f6edd20dda0a1d4adb32046d2acbca7361bc61da5a106a914
Port: <none>
Host Port: <none>
Command:
/bin/sh
-cxe
apt-get update && apt-get install -y jq
pip install yq
if tomlq -e '.plugins."io.containerd.grpc.v1.cri".registry.mirrors' /etc/containerd/config.toml > /dev/null; then
tomlq -i -t 'del(.plugins."io.containerd.grpc.v1.cri".registry.mirrors)' /etc/containerd/config.toml
nsenter -t 1 -m -- systemctl try-reload-or-restart containerd.service
echo "containerd config updated"
else
echo "Entry does not exist, no changes made"
fi
State: Terminated
Reason: Completed
Exit Code: 0
Started: Tue, 25 Jun 2024 23:27:38 +0200
Finished: Tue, 25 Jun 2024 23:27:42 +0200
Ready: True
Restart Count: 0
Environment: <none>
Mounts:
/etc/containerd from containerd-config-dir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-jj67n (ro)
wait-for-scheduler:
Container ID: containerd://e79694fd393fd32ec9d161dbab25e1ff8cc023b5c92d227e096c849016f4fcd5
Image: docker.io/busybox:latest
Image ID: docker.io/library/busybox@sha256:9ae97d36d26566ff84e8893c64a6dc4fe8ca6d1144bf5b87b2b85a32def253c7
Port: <none>
Host Port: <none>
Command:
sh
-c
until nslookup dragonfly-scheduler.dragonfly.svc.cluster.local && nc -vz dragonfly-scheduler.dragonfly.svc.cluster.local 8002; do echo waiting for scheduler; sleep 2; done;
State: Terminated
Reason: Completed
Exit Code: 0
Started: Tue, 25 Jun 2024 23:27:43 +0200
Finished: Tue, 25 Jun 2024 23:27:43 +0200
Ready: True
Restart Count: 0
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-jj67n (ro)
dfinit:
Container ID: containerd://935e0fe5c37bb824fc553fb717cbf40f80bf588b53fe1e01d1645b21ab1954c4
Image: docker.io/dragonflyoss/dfinit:v0.1.82
Image ID: docker.io/dragonflyoss/dfinit@sha256:4c793f262a9e1db6f55cedc2a7f322a1a01165fc50480b652637f5f7639b8192
Port: <none>
Host Port: <none>
Args:
--log-level=info
--verbose
State: Terminated
Reason: Completed
Exit Code: 0
Started: Tue, 25 Jun 2024 23:27:44 +0200
Finished: Tue, 25 Jun 2024 23:27:44 +0200
Ready: True
Restart Count: 0
Environment: <none>
Mounts:
/etc/containerd from containerd-config-dir (rw)
/etc/dragonfly from dfinit-config (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-jj67n (ro)
restart-container-runtime:
Container ID: containerd://bd622dc89080b0d6d65e09078805f399a94bc7603123feae452cd463991441c9
Image: docker.io/busybox:latest
Image ID: docker.io/library/busybox@sha256:9ae97d36d26566ff84e8893c64a6dc4fe8ca6d1144bf5b87b2b85a32def253c7
Port: <none>
Host Port: <none>
Command:
/bin/sh
-cx
nsenter -t 1 -m -- systemctl restart containerd.service
echo "restart container"
State: Waiting
Reason: RunContainerError
Ready: False
Restart Count: 0
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-jj67n (ro)
Containers:
client:
Container ID:
Image: docker.io/dragonflyoss/client:v0.1.82
Image ID:
Ports: 4000/TCP, 4003/TCP, 4002/TCP, 4004/TCP
Host Ports: 4000/TCP, 4003/TCP, 4002/TCP, 4004/TCP
Args:
--log-level=info
--verbose
State: Waiting
Reason: PodInitializing
Ready: False
Restart Count: 0
Limits:
cpu: 2
memory: 4Gi
Requests:
cpu: 0
memory: 0
Liveness: exec [/bin/grpc_health_probe -addr=:4000] delay=15s timeout=1s period=10s #success=1 #failure=3
Readiness: exec [/bin/grpc_health_probe -addr=:4000] delay=5s timeout=1s period=10s #success=1 #failure=3
Environment: <none>
Mounts:
/etc/dragonfly from config (rw)
/var/lib/dragonfly/ from storage (rw)
/var/log/dragonfly/dfdaemon/ from logs (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-jj67n (ro)
Conditions:
Type Status
PodReadyToStartContainers True
Initialized False
Ready False
ContainersReady False
PodScheduled True
Volumes:
config:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: dragonfly-client
Optional: false
dfinit-config:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: dragonfly-dfinit
Optional: false
containerd-config-dir:
Type: HostPath (bare host directory volume)
Path: /etc/containerd
HostPathType: DirectoryOrCreate
storage:
Type: HostPath (bare host directory volume)
Path: /var/lib/dragonfly/
HostPathType: DirectoryOrCreate
logs:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
kube-api-access-jj67n:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: fal/group=default
Tolerations: node.kubernetes.io/disk-pressure:NoSchedule op=Exists
node.kubernetes.io/memory-pressure:NoSchedule op=Exists
node.kubernetes.io/network-unavailable:NoSchedule op=Exists
node.kubernetes.io/not-ready:NoExecute op=Exists
node.kubernetes.io/pid-pressure:NoSchedule op=Exists
node.kubernetes.io/unreachable:NoExecute op=Exists
node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 41m default-scheduler Successfully assigned dragonfly/dragonfly-client-bgw5s to e2e
Normal Pulled 41m kubelet Container image "python:3.12-slim" already present on machine
Normal Created 41m kubelet Created container update-containerd-remove-registry-mirrors
Normal Started 41m kubelet Started container update-containerd-remove-registry-mirrors
Normal Pulled 41m kubelet Container image "docker.io/busybox:latest" already present on machine
Normal Created 41m kubelet Created container wait-for-scheduler
Normal Started 41m kubelet Started container wait-for-scheduler
Normal Pulled 41m kubelet Container image "docker.io/dragonflyoss/dfinit:v0.1.82" already present on machine
Normal Created 41m kubelet Created container dfinit
Normal Started 41m kubelet Started container dfinit
Normal Pulled 41m kubelet Container image "docker.io/busybox:latest" already present on machine
Normal Created 41m kubelet Created container restart-container-runtime
Warning Failed 41m kubelet Error: error reading from server: EOF