helm-charts

dfinit: restart-container-runtime restart loop

Open kakkoyun opened this issue 1 year ago • 13 comments

Bug report:

The restart-container-runtime init container is configured to restart the container runtime without any conditions. As a result, the pod remains in an unready state (NotReady) perpetually. This happens because the container runtime is continuously being restarted, preventing the pod from reaching a stable, ready state.

The restart should happen only once, and only when the configuration has actually changed, so that the next run of the init containers can complete and the pod can be marked as ready.
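One possible way to make the restart conditional is sketched below. It is only an illustration, not the chart's actual implementation: the function names, the stamp-file path, and the assumption that the host's /etc/containerd directory is mounted into the restart-container-runtime container are all hypothetical. The idea is to record a checksum of config.toml before restarting, so that when the init container runs again after the runtime restart it finds the config unchanged and exits cleanly.

```shell
#!/bin/sh
# Hypothetical sketch: run a restart command only when the config file changed.
# All names and paths here are assumptions, not taken from the chart.

# Succeeds (exit 0) when the checksum of $1 differs from the one stored in $2.
config_changed() {
  config="$1"   # e.g. /etc/containerd/config.toml
  stamp="$2"    # file holding the checksum recorded before the last restart
  new_sum=$(sha256sum "$config" | cut -d' ' -f1)
  old_sum=$(cat "$stamp" 2>/dev/null || true)
  [ "$new_sum" != "$old_sum" ]
}

# Stores the current checksum of $1 in $2.
record_config() {
  sha256sum "$1" | cut -d' ' -f1 > "$2"
}

# Usage: restart_if_changed CONFIG STAMP COMMAND [ARGS...]
restart_if_changed() {
  config="$1"; stamp="$2"; shift 2
  if config_changed "$config" "$stamp"; then
    # Record the checksum first: restarting the runtime may kill this
    # container before anything after the restart command can run.
    record_config "$config" "$stamp"
    "$@"
  else
    echo "config unchanged, skipping restart"
  fi
}

# Example call (assumes the stamp file lives on a host path that survives restarts):
# restart_if_changed /etc/containerd/config.toml /etc/containerd/.restart.sha256 \
#   nsenter -t 1 -m -- systemctl restart containerd.service
```

Writing the stamp before issuing the restart is a deliberate choice in this sketch: if it were written afterwards, a container killed by the restart would never record it and the loop would continue.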

Expected behavior:

The DaemonSet should start normally.

How to reproduce it:

Install the chart with the following values.yaml:

client:
  enable: true
  config:
    verbose: true
  dfinit:
    enable: true
    config:
      verbose: true
      containerRuntime:
        containerd:
          registries:
            - hostNamespace: docker.io
              serverAddr: https://index.docker.io
              capabilities: ["pull", "resolve"]
            - hostNamespace: ghcr.io
              serverAddr: https://ghcr.io
              capabilities: ["pull", "resolve"]

Environment:

  • Dragonfly version: v2.1.49 (chart v1.1.67)
  • OS: Linux
  • Kernel (e.g. uname -a): Linux jack-oneill 6.9.3-arch1-1 #1 SMP PREEMPT_DYNAMIC Fri, 31 May 2024 15:14:45 +0000 x86_64 GNU/Linux
  • Others:

Logs:

kubectl describe pod:

Name:             dragonfly-client-bgw5s
Namespace:        dragonfly
Priority:         0
Service Account:  default
Node:             e2e/192.168.39.248
Start Time:       Tue, 25 Jun 2024 23:27:38 +0200
Labels:           app=dragonfly
                  component=client
                  controller-revision-hash=7745678fdd
                  pod-template-generation=3
                  release=dragonfly
Annotations:      checksum/config: ff55a474fbf9a76574ac381a461ce0b797d557fdf76759063600387a8eaf0831
                  kubectl.kubernetes.io/restartedAt: 2024-06-25T23:27:37+02:00
Status:           Pending
IP:               192.168.39.248
IPs:
  IP:           192.168.39.248
Controlled By:  DaemonSet/dragonfly-client
Init Containers:
  update-containerd-remove-registry-mirrors:
    Container ID:  containerd://bc64537fca42caecc1a78c1e9b3ae2e307ef1c9e27ef8876c6c34609367f2d6b
    Image:         python:3.12-slim
    Image ID:      docker.io/library/python@sha256:2fba8e70a87bcc9f6edd20dda0a1d4adb32046d2acbca7361bc61da5a106a914
    Port:          <none>
    Host Port:     <none>
    Command:
      /bin/sh
      -cxe
      apt-get update && apt-get install -y jq
      pip install yq
      if tomlq -e '.plugins."io.containerd.grpc.v1.cri".registry.mirrors' /etc/containerd/config.toml > /dev/null; then
        tomlq -i -t 'del(.plugins."io.containerd.grpc.v1.cri".registry.mirrors)' /etc/containerd/config.toml
        nsenter -t 1 -m -- systemctl try-reload-or-restart containerd.service
        echo "containerd config updated"
      else
        echo "Entry does not exist, no changes made"
      fi
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Tue, 25 Jun 2024 23:27:38 +0200
      Finished:     Tue, 25 Jun 2024 23:27:42 +0200
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /etc/containerd from containerd-config-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-jj67n (ro)
  wait-for-scheduler:
    Container ID:  containerd://e79694fd393fd32ec9d161dbab25e1ff8cc023b5c92d227e096c849016f4fcd5
    Image:         docker.io/busybox:latest
    Image ID:      docker.io/library/busybox@sha256:9ae97d36d26566ff84e8893c64a6dc4fe8ca6d1144bf5b87b2b85a32def253c7
    Port:          <none>
    Host Port:     <none>
    Command:
      sh
      -c
      until nslookup dragonfly-scheduler.dragonfly.svc.cluster.local && nc -vz dragonfly-scheduler.dragonfly.svc.cluster.local 8002; do echo waiting for scheduler; sleep 2; done;
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Tue, 25 Jun 2024 23:27:43 +0200
      Finished:     Tue, 25 Jun 2024 23:27:43 +0200
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-jj67n (ro)
  dfinit:
    Container ID:  containerd://935e0fe5c37bb824fc553fb717cbf40f80bf588b53fe1e01d1645b21ab1954c4
    Image:         docker.io/dragonflyoss/dfinit:v0.1.82
    Image ID:      docker.io/dragonflyoss/dfinit@sha256:4c793f262a9e1db6f55cedc2a7f322a1a01165fc50480b652637f5f7639b8192
    Port:          <none>
    Host Port:     <none>
    Args:
      --log-level=info
      --verbose
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Tue, 25 Jun 2024 23:27:44 +0200
      Finished:     Tue, 25 Jun 2024 23:27:44 +0200
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /etc/containerd from containerd-config-dir (rw)
      /etc/dragonfly from dfinit-config (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-jj67n (ro)
  restart-container-runtime:
    Container ID:  containerd://bd622dc89080b0d6d65e09078805f399a94bc7603123feae452cd463991441c9
    Image:         docker.io/busybox:latest
    Image ID:      docker.io/library/busybox@sha256:9ae97d36d26566ff84e8893c64a6dc4fe8ca6d1144bf5b87b2b85a32def253c7
    Port:          <none>
    Host Port:     <none>
    Command:
      /bin/sh
      -cx
      nsenter -t 1 -m -- systemctl restart containerd.service
      echo "restart container"
    State:          Waiting
      Reason:       RunContainerError
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-jj67n (ro)
Containers:
  client:
    Container ID:  
    Image:         docker.io/dragonflyoss/client:v0.1.82
    Image ID:      
    Ports:         4000/TCP, 4003/TCP, 4002/TCP, 4004/TCP
    Host Ports:    4000/TCP, 4003/TCP, 4002/TCP, 4004/TCP
    Args:
      --log-level=info
      --verbose
    State:          Waiting
      Reason:       PodInitializing
    Ready:          False
    Restart Count:  0
    Limits:
      cpu:     2
      memory:  4Gi
    Requests:
      cpu:        0
      memory:     0
    Liveness:     exec [/bin/grpc_health_probe -addr=:4000] delay=15s timeout=1s period=10s #success=1 #failure=3
    Readiness:    exec [/bin/grpc_health_probe -addr=:4000] delay=5s timeout=1s period=10s #success=1 #failure=3
    Environment:  <none>
    Mounts:
      /etc/dragonfly from config (rw)
      /var/lib/dragonfly/ from storage (rw)
      /var/log/dragonfly/dfdaemon/ from logs (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-jj67n (ro)
Conditions:
  Type                        Status
  PodReadyToStartContainers   True 
  Initialized                 False 
  Ready                       False 
  ContainersReady             False 
  PodScheduled                True 
Volumes:
  config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      dragonfly-client
    Optional:  false
  dfinit-config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      dragonfly-dfinit
    Optional:  false
  containerd-config-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/containerd
    HostPathType:  DirectoryOrCreate
  storage:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/dragonfly/
    HostPathType:  DirectoryOrCreate
  logs:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     
    SizeLimit:  <unset>
  kube-api-access-jj67n:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              fal/group=default
Tolerations:                 node.kubernetes.io/disk-pressure:NoSchedule op=Exists
                             node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                             node.kubernetes.io/network-unavailable:NoSchedule op=Exists
                             node.kubernetes.io/not-ready:NoExecute op=Exists
                             node.kubernetes.io/pid-pressure:NoSchedule op=Exists
                             node.kubernetes.io/unreachable:NoExecute op=Exists
                             node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events:
  Type     Reason     Age   From               Message
  ----     ------     ----  ----               -------
  Normal   Scheduled  41m   default-scheduler  Successfully assigned dragonfly/dragonfly-client-bgw5s to e2e
  Normal   Pulled     41m   kubelet            Container image "python:3.12-slim" already present on machine
  Normal   Created    41m   kubelet            Created container update-containerd-remove-registry-mirrors
  Normal   Started    41m   kubelet            Started container update-containerd-remove-registry-mirrors
  Normal   Pulled     41m   kubelet            Container image "docker.io/busybox:latest" already present on machine
  Normal   Created    41m   kubelet            Created container wait-for-scheduler
  Normal   Started    41m   kubelet            Started container wait-for-scheduler
  Normal   Pulled     41m   kubelet            Container image "docker.io/dragonflyoss/dfinit:v0.1.82" already present on machine
  Normal   Created    41m   kubelet            Created container dfinit
  Normal   Started    41m   kubelet            Started container dfinit
  Normal   Pulled     41m   kubelet            Container image "docker.io/busybox:latest" already present on machine
  Normal   Created    41m   kubelet            Created container restart-container-runtime
  Warning  Failed     41m   kubelet            Error: error reading from server: EOF

kakkoyun, Jun 25 '24 22:06