bpf-recorder is not valid for spod pods

Open shaojini opened this issue 1 year ago • 11 comments

What happened:

After installing SPO, checking whether the bpf-recorder is up and running fails with the error: container bpf-recorder is not valid for the spod pod. When I try to enable it by patching the spod configuration, all spod pods crash and cannot restart successfully.

What you expected to happen:

The bpf-recorder is up and running.

How to reproduce it (as minimally and precisely as possible):

root@k8s-master:~# kubectl get pods -n security-profiles-operator
NAME                                                  READY   STATUS    RESTARTS   AGE
security-profiles-operator-8588b78997-4p2zv           1/1     Running   0          59s
security-profiles-operator-8588b78997-nlvxf           1/1     Running   0          59s
security-profiles-operator-8588b78997-wctnn           1/1     Running   0          59s
security-profiles-operator-webhook-8476cd6f8c-d7qqb   1/1     Running   0          56s
security-profiles-operator-webhook-8476cd6f8c-f2zqs   1/1     Running   0          56s
security-profiles-operator-webhook-8476cd6f8c-vq6z2   1/1     Running   0          56s
spod-2vkz6                                            2/2     Running   0          56s
spod-g9d2m                                            2/2     Running   0          56s
spod-kd4b6                                            2/2     Running   0          56s

root@k8s-master:~# kubectl -n security-profiles-operator logs --selector name=spod -c bpf-recorder
error: container bpf-recorder is not valid for pod spod-2vkz6
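
kubectl prints "container ... is not valid for pod ..." when the named container is not part of the pod spec, which is consistent with the 2/2 READY count above: before the patch, the spod pods do not define a bpf-recorder container at all. A minimal sketch for listing the containers each spod pod actually defines, reusing the namespace and the name=spod selector from the commands above (the function name spod_containers is only illustrative):

```shell
# List every spod pod together with the names of its regular containers.
# Assumes the namespace and label selector used elsewhere in this thread.
spod_containers() {
  kubectl -n security-profiles-operator get pods --selector name=spod \
    -o jsonpath='{range .items[*]}{.metadata.name}{": "}{.spec.containers[*].name}{"\n"}{end}'
}
```

Running spod_containers before and after the patch should show whether bpf-recorder has been added to the pod spec.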

root@k8s-master:~# kubectl -n security-profiles-operator patch spod spod --type=merge -p '{"spec":{"enableBpfRecorder":true}}'
patched

root@k8s-master:~# kubectl get pods -n security-profiles-operator
NAME                                                  READY   STATUS             RESTARTS        AGE
security-profiles-operator-8588b78997-4p2zv           1/1     Running            0               22m
security-profiles-operator-8588b78997-nlvxf           1/1     Running            0               22m
security-profiles-operator-8588b78997-wctnn           1/1     Running            0               22m
security-profiles-operator-webhook-8476cd6f8c-d7qqb   1/1     Running            0               21m
security-profiles-operator-webhook-8476cd6f8c-f2zqs   1/1     Running            0               21m
security-profiles-operator-webhook-8476cd6f8c-vq6z2   1/1     Running            0               21m
spod-28qd6                                            2/3     CrashLoopBackOff   7 (5m3s ago)    16m
spod-2msmj                                            2/3     Error              8 (5m6s ago)    16m
spod-rp2vz                                            2/3     CrashLoopBackOff   7 (4m50s ago)   16m
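
The READY column above (2/3) shows that one container per spod pod fails to come up. As a rough sketch, the output of kubectl get pods can be filtered for pods whose ready count is below the total. The helper name not_ready_pods is hypothetical, and it assumes the default table output shown above:

```shell
# Print the names of pods whose READY column (ready/total) is not complete.
# Reads the default `kubectl get pods` table from stdin and skips the header row.
not_ready_pods() {
  awk 'NR > 1 { split($2, r, "/"); if (r[1] != r[2]) print $1 }'
}
```

It can be used as: kubectl get pods -n security-profiles-operator | not_ready_pods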

Anything else we need to know?:


  • Cloud provider or hardware configuration: VM nodes
  • OS (e.g: cat /etc/os-release): NAME="Ubuntu", VERSION="20.04.6 LTS"
  • Kernel (e.g. uname -a): 5.4.0-156-generic
  • Others:

shaojini avatar Aug 17 '23 12:08 shaojini

Hey @shaojini, thank you for the report. Can you extract the crash logs of the spod instances, like spod-28qd6?

saschagrunert avatar Aug 17 '23 12:08 saschagrunert

root@k8s-master:~# kubectl -n security-profiles-operator logs spod-6579n
Defaulted container "security-profiles-operator" out of: security-profiles-operator, bpf-recorder, metrics, non-root-enabler (init)
I0817 12:45:07.665328 1346435 main.go:260]  "msg"="Set logging verbosity to 0"
I0817 12:45:07.666835 1346435 main.go:266]  "msg"="Profiling support enabled: false"
I0817 12:45:07.667151 1346435 main.go:286] setup "msg"="starting component: spod" "buildDate"="1980-01-01T00:00:00Z" "buildTags"="netgo,osusergo,seccomp,apparmor" "cgoldFlags"="unknown" "compiler"="gc" "dependencies"="(module names lost in extraction; version list omitted)" "gitCommit"="6d51dc8d1bdae339b47facd5c9b8a0e884c30ff8" "gitCommitDate"="2023-08-17T07:36:21Z" "gitTreeState"="clean" "goVersion"="go1.20.4" "ldFlags"="unknown" "libbpf"="v1.2" "libseccomp"="2.5.4" "platform"="linux/amd64" "version"="0.8.1-dev"
I0817 12:45:07.667702 1346435 main.go:365] setup "msg"="watching all namespaces"
I0817 12:45:07.668061 1346435 listener.go:44] controller-runtime/metrics "msg"="Metrics server is starting to listen" "addr"=":8080"
I0817 12:45:07.668442 1346435 metrics.go:217] metrics "msg"="Registering metric: seccomp_profile_error_total"
I0817 12:45:07.668482 1346435 metrics.go:217] metrics "msg"="Registering metric: selinux_profile_audit_total"
I0817 12:45:07.668497 1346435 metrics.go:217] metrics "msg"="Registering metric: apparmor_profile_total"
I0817 12:45:07.668504 1346435 metrics.go:217] metrics "msg"="Registering metric: apparmor_profile_audit_total"
I0817 12:45:07.668512 1346435 metrics.go:217] metrics "msg"="Registering metric: seccomp_profile_total"
I0817 12:45:07.668524 1346435 metrics.go:217] metrics "msg"="Registering metric: seccomp_profile_bpf_total"
I0817 12:45:07.668531 1346435 metrics.go:217] metrics "msg"="Registering metric: selinux_profile_error_total"
I0817 12:45:07.668539 1346435 metrics.go:217] metrics "msg"="Registering metric: apparmor_profile_error_total"
I0817 12:45:07.668546 1346435 metrics.go:217] metrics "msg"="Registering metric: seccomp_profile_audit_total"
I0817 12:45:07.668553 1346435 metrics.go:217] metrics "msg"="Registering metric: selinux_profile_total"
I0817 12:45:07.669531 1346435 grpc.go:60] metrics "msg"="Starting GRPC server API"
I0817 12:45:07.707643 1346435 profilerecorder.go:144] recorder-spod "msg"="Setting up profile recorder" "Node"=""
I0817 12:45:07.707706 1346435 main.go:486] setup "msg"="starting daemon"
I0817 12:45:07.707891 1346435 server.go:50]  "msg"="starting server" "addr"={"IP":"::","Port":8080,"Zone":""} "kind"="metrics" "path"="/metrics"
I0817 12:45:07.707968 1346435 internal.go:360]  "msg"="Starting server" "addr"={"IP":"::","Port":8085,"Zone":""} "kind"="health probe"
I0817 12:45:07.708018 1346435 controller.go:177]  "msg"="Starting EventSource" "controller"="profile" "controllerGroup"="" "controllerKind"="SeccompProfile" "source"="kind source: *v1beta1.SeccompProfile"
I0817 12:45:07.708044 1346435 controller.go:177]  "msg"="Starting EventSource" "controller"="profile" "controllerGroup"="" "controllerKind"="SeccompProfile" "source"="kind source: *v1alpha1.SecurityProfilesOperatorDaemon"
I0817 12:45:07.708060 1346435 controller.go:185]  "msg"="Starting Controller" "controller"="profile" "controllerGroup"="" "controllerKind"="SeccompProfile"
I0817 12:45:07.708061 1346435 controller.go:177]  "msg"="Starting EventSource" "controller"="profilerecorder" "controllerGroup"="" "controllerKind"="Pod" "source"="kind source: *v1.Pod"
I0817 12:45:07.708076 1346435 controller.go:185]  "msg"="Starting Controller" "controller"="profilerecorder" "controllerGroup"="" "controllerKind"="Pod"
I0817 12:45:07.899659 1346435 controller.go:219]  "msg"="Starting workers" "controller"="profile" "controllerGroup"="" "controllerKind"="SeccompProfile" "worker count"=1
I0817 12:45:07.941865 1346435 controller.go:219]  "msg"="Starting workers" "controller"="profilerecorder" "controllerGroup"="" "controllerKind"="Pod" "worker count"=1

shaojini avatar Aug 17 '23 12:08 shaojini

Hi, @saschagrunert .

Any comments on this issue? Thanks.

shaojini avatar Aug 22 '23 07:08 shaojini

@shaojini we need to find out why the pod crashed, but the logs above do not indicate any crash at all. Do you have the logs of the crashing pod available somehow?

saschagrunert avatar Aug 22 '23 07:08 saschagrunert

Hi, @saschagrunert .

I have tried uninstalling and re-installing a few times, but the problem is the same. The logs given previously were taken from one of those attempts (before reporting the issue, I tried at least twice to confirm it). The restart of the pods may be normal because the spod was patched (the names of the spod pods have changed). However, the crashing pods cannot restart successfully (maybe the describe output of the spod pod gives some information?):

root@k8s-master:~# kubectl get pods -n security-profiles-operator
NAME                                                  READY   STATUS    RESTARTS   AGE
security-profiles-operator-8588b78997-8cm8z           1/1     Running   0          17h
security-profiles-operator-8588b78997-9rg9j           1/1     Running   0          17h
security-profiles-operator-8588b78997-csrhk           1/1     Running   0          17h
security-profiles-operator-webhook-8476cd6f8c-g9m5v   1/1     Running   0          17h
security-profiles-operator-webhook-8476cd6f8c-nh57n   1/1     Running   0          17h
security-profiles-operator-webhook-8476cd6f8c-qzpk5   1/1     Running   0          17h
spod-lbcnc                                            3/3     Running   0          17h
spod-t5vf6                                            3/3     Running   0          17h
spod-wg9w7                                            3/3     Running   0          17h

root@k8s-master:~# kubectl -n security-profiles-operator logs --selector name=spod -c bpf-recorder
error: container bpf-recorder is not valid for pod spod-lbcnc

root@k8s-master:~# kubectl -n security-profiles-operator patch spod spod --type=merge -p '{"spec":{"enableBpfRecorder":true}}'
patched

root@k8s-master:~# kubectl get pods -n security-profiles-operator
NAME                                                  READY   STATUS             RESTARTS      AGE
security-profiles-operator-8588b78997-8cm8z           1/1     Running            0             17h
security-profiles-operator-8588b78997-9rg9j           1/1     Running            0             17h
security-profiles-operator-8588b78997-csrhk           1/1     Running            0             17h
security-profiles-operator-webhook-8476cd6f8c-g9m5v   1/1     Running            0             17h
security-profiles-operator-webhook-8476cd6f8c-nh57n   1/1     Running            0             17h
security-profiles-operator-webhook-8476cd6f8c-qzpk5   1/1     Running            0             17h
spod-ppm5q                                            3/4     CrashLoopBackOff   5 (54s ago)   4m12s
spod-xn6wt                                            3/4     CrashLoopBackOff   5 (62s ago)   4m11s
spod-xwkft                                            3/4     CrashLoopBackOff   5 (49s ago)   4m11s
root@k8s-master:~# kubectl describe -n security-profiles-operator pod spod-ppm5q
Name:                 spod-ppm5q
Namespace:            security-profiles-operator
Priority:             2000001000
Priority Class Name:  system-node-critical
Service Account:      spod
Node:                 k8s-worker3/
Start Time:           Tue, 22 Aug 2023 11:36:36 +0300
Labels:               app=security-profiles-operator
Annotations: privileged
Status:               Running
SeccompProfile:       RuntimeDefault
Controlled By:  DaemonSet/spod
Init Containers:
  non-root-enabler:
    Container ID:  cri-o://5f1ec4f1c35b36ee0e940fb6d73e553a15bbc5ea1483f18c246aecc549236ebc
    Image ID:
    Port:          <none>
    Host Port:     <none>
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Tue, 22 Aug 2023 11:36:44 +0300
      Finished:     Tue, 22 Aug 2023 11:36:44 +0300
    Ready:          True
    Restart Count:  0
      ephemeral-storage:  50Mi
      memory:             64Mi
      cpu:                100m
      ephemeral-storage:  10Mi
      memory:             32Mi
      NODE_NAME:       (v1:spec.nodeName)
      KUBELET_DIR:    /var/lib/kubelet
      /host from host-root-volume (rw)
      /opt/spo-profiles from operator-profiles-volume (ro)
      /var/lib from host-varlib-volume (rw)
      /var/run/secrets/ from kube-api-access-g8x8b (ro)
      /var/run/secrets/metrics from metrics-cert-volume (rw)
Containers:
  security-profiles-operator:
    Container ID:        cri-o://b8f632a1fcfc144f57b6042ba44fa77ecbb5399163d77fe6a1ffb8529bf7fab8
    Image ID:  
    Port:                8085/TCP
    Host Port:           0/TCP
    SeccompProfile:      Localhost
      LocalhostProfile:  security-profiles-operator.json
    State:          Running
      Started:      Tue, 22 Aug 2023 11:36:47 +0300
    Ready:          True
    Restart Count:  0
      ephemeral-storage:  200Mi
      memory:             128Mi
      cpu:                100m
      ephemeral-storage:  50Mi
      memory:             64Mi
    Liveness:             http-get http://:liveness-port/healthz delay=0s timeout=1s period=10s #success=1 #failure=1
    Startup:              http-get http://:liveness-port/healthz delay=0s timeout=1s period=3s #success=1 #failure=10
      NODE_NAME:             (v1:spec.nodeName)
      OPERATOR_NAMESPACE:   security-profiles-operator (v1:metadata.namespace)
      SPOD_NAME:            spod
      KUBELET_DIR:          /var/lib/kubelet
      HOME:                 /home
      SPO_VERBOSITY:        0
      /etc/selinux.d from selinux-drop-dir (rw)
      /home from home-volume (rw)
      /tmp from tmp-volume (rw)
      /tmp/security-profiles-operator-recordings from profile-recording-output-volume (rw)
      /var/lib/kubelet/seccomp/operator from host-operator-volume (rw)
      /var/run/grpc from grpc-server-volume (rw)
      /var/run/secrets/ from kube-api-access-g8x8b (ro)
      /var/run/selinuxd from selinuxd-private-volume (rw)
  log-enricher:
    Container ID:  cri-o://33b914d4e40dc990d51b7b19e6a7470b3446b90a6786f1a39764d2ec7ca2630e
    Image ID:
    Port:          <none>
    Host Port:     <none>
    State:          Running
      Started:      Tue, 22 Aug 2023 11:36:48 +0300
    Ready:          True
    Restart Count:  0
      ephemeral-storage:  128Mi
      memory:             256Mi
      cpu:                50m
      ephemeral-storage:  10Mi
      memory:             64Mi
      NODE_NAME:       (v1:spec.nodeName)
      KUBELET_DIR:    /var/lib/kubelet
      /var/log from host-syslog-volume (ro)
      /var/log/audit from host-auditlog-volume (ro)
      /var/run/grpc from grpc-server-volume (rw)
      /var/run/secrets/ from kube-api-access-g8x8b (ro)
  bpf-recorder:
    Container ID:  cri-o://1f26b655d092359cbd22fc47a72a6f065b0faee1cd92f44b542a5bb129241f62
    Image ID:
    Port:          <none>
    Host Port:     <none>
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Tue, 22 Aug 2023 11:42:37 +0300
      Finished:     Tue, 22 Aug 2023 11:42:37 +0300
    Ready:          False
    Restart Count:  6
      ephemeral-storage:  20Mi
      memory:             128Mi
      cpu:                50m
      ephemeral-storage:  10Mi
      memory:             64Mi
      NODE_NAME:       (v1:spec.nodeName)
      KUBELET_DIR:    /var/lib/kubelet
      /etc/os-release from host-etc-osrelease-volume (rw)
      /sys/kernel/debug from sys-kernel-debug-volume (ro)
      /tmp from tmp-volume (rw)
      /var/run/grpc from grpc-server-volume (rw)
      /var/run/secrets/ from kube-api-access-g8x8b (ro)
  metrics:
    Container ID:  cri-o://795ced755723f826f0fbb7cb80584fc4e5e0a777340a418102e7c55c9b6f3519
    Image ID:
    Port:          9443/TCP
    Host Port:     0/TCP
    State:          Running
      Started:      Tue, 22 Aug 2023 11:36:50 +0300
    Ready:          True
    Restart Count:  0
      ephemeral-storage:  20Mi
      memory:             128Mi
      cpu:                50m
      ephemeral-storage:  10Mi
      memory:             32Mi
    Environment:          <none>
      /var/run/secrets/ from kube-api-access-g8x8b (ro)
      /var/run/secrets/metrics from metrics-cert-volume (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib
    HostPathType:  Directory
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/security-profiles-operator
    HostPathType:  DirectoryOrCreate
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      security-profiles-operator-profile
    Optional:  false
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    SizeLimit:  <unset>
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    SizeLimit:  <unset>
    Type:          HostPath (bare host directory volume)
    Path:          /sys/fs/selinux
    HostPathType:  Directory
    Type:          HostPath (bare host directory volume)
    Path:          /etc/selinux
    HostPathType:  Directory
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/selinux
    HostPathType:  Directory
    Type:          HostPath (bare host directory volume)
    Path:          /tmp/security-profiles-operator-recordings
    HostPathType:  DirectoryOrCreate
    Type:          HostPath (bare host directory volume)
    Path:          /var/log/audit
    HostPathType:  DirectoryOrCreate
    Type:          HostPath (bare host directory volume)
    Path:          /var/log
    HostPathType:  DirectoryOrCreate
    Type:        Secret (a volume populated by a Secret)
    SecretName:  metrics-server-cert
    Optional:    false
    Type:          HostPath (bare host directory volume)
    Path:          /sys/kernel/debug
    HostPathType:  Directory
    Type:          HostPath (bare host directory volume)
    Path:          /etc/os-release
    HostPathType:  File
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    SizeLimit:  <unset>
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    SizeLimit:  <unset>
    Type:          HostPath (bare host directory volume)
    Path:          /
    HostPathType:  Directory
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    SizeLimit:  <unset>
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Tolerations:        op=Exists
Events:
  Type     Reason     Age                    From               Message
  ----     ------     ----                   ----               -------
  Normal   Scheduled  6m25s                  default-scheduler  Successfully assigned security-profiles-operator/spod-ppm5q to k8s-worker3
  Normal   Pulling    6m25s                  kubelet            Pulling image ""
  Normal   Pulled     6m18s                  kubelet            Successfully pulled image "" in 6.98679362s (6.986814507s including waiting)
  Normal   Created    6m18s                  kubelet            Created container non-root-enabler
  Normal   Started    6m18s                  kubelet            Started container non-root-enabler
  Normal   Pulling    6m17s                  kubelet            Pulling image ""
  Normal   Pulled     6m16s                  kubelet            Successfully pulled image "" in 987.480618ms (987.493453ms including waiting)
  Normal   Created    6m15s                  kubelet            Created container security-profiles-operator
  Normal   Started    6m15s                  kubelet            Started container security-profiles-operator
  Normal   Pulling    6m15s                  kubelet            Pulling image ""
  Normal   Pulled     6m14s                  kubelet            Successfully pulled image "" in 1.266629881s (1.266641343s including waiting)
  Normal   Created    6m14s                  kubelet            Created container log-enricher
  Normal   Started    6m14s                  kubelet            Started container log-enricher
  Normal   Pulled     6m13s                  kubelet            Successfully pulled image "" in 827.998943ms (828.022247ms including waiting)
  Normal   Pulled     6m13s                  kubelet            Container image "" already present on machine
  Normal   Pulling    6m12s (x2 over 6m14s)  kubelet            Pulling image ""
  Normal   Created    6m12s                  kubelet            Created container metrics
  Normal   Started    6m12s                  kubelet            Started container metrics
  Normal   Created    6m11s (x2 over 6m13s)  kubelet            Created container bpf-recorder
  Normal   Pulled     6m11s                  kubelet            Successfully pulled image "" in 1.006150487s (1.006230461s including waiting)
  Normal   Started    6m10s (x2 over 6m13s)  kubelet            Started container bpf-recorder
  Warning  BackOff    77s (x25 over 6m10s)   kubelet            Back-off restarting failed container bpf-recorder in pod spod-ppm5q_security-profiles-operator(8f1d734d-2b89-47e5-b74a-fd0bd996473f)

shaojini avatar Aug 22 '23 09:08 shaojini

@shaojini you can see that the bpf-recorder has the container id 1f26b655d092359cbd22fc47a72a6f065b0faee1cd92f44b542a5bb129241f62 from kubectl describe. May I ask you to access the node and run something like sudo crictl logs <ID> to get the logs of the crashing container?

saschagrunert avatar Aug 22 '23 11:08 saschagrunert

Hi, @saschagrunert .

I reproduced the issue again to compare the "describe pod spod-xxxx" output before and after the patching. The differences are that "recording" is enabled in the "daemon" and that one extra container, "bpf-recorder", is created in the spod pod. In addition, the container IDs (cri-o) of security-profiles-operator and metrics have changed.

On the node (after re-installing), crictl logs reports that the container ID does not exist:

root@k8s-worker3:~# sudo crictl logs 723ec7f3c3c798953fa217bed812ac179352bbe11a27597d6568458ad41efe9e

E0822 14:57:23.045998 1389099 remote_runtime.go:415] "ContainerStatus from runtime service failed" err="rpc error: code = NotFound desc = could not find container \"723ec7f3c3c798953fa217bed812ac179352bbe11a27597d6568458ad41efe9e\": container with ID starting with 723ec7f3c3c798953fa217bed812ac179352bbe11a27597d6568458ad41efe9e not found: ID does not exist" containerID="723ec7f3c3c798953fa217bed812ac179352bbe11a27597d6568458ad41efe9e"
FATA[0000] rpc error: code = NotFound desc = could not find container "723ec7f3c3c798953fa217bed812ac179352bbe11a27597d6568458ad41efe9e": container with ID starting with 723ec7f3c3c798953fa217bed812ac179352bbe11a27597d6568458ad41efe9e not found: ID does not exist

shaojini avatar Aug 22 '23 12:08 shaojini

Hi @saschagrunert .

The ID shown in the "describe" output is not the actual container ID. I got the ID this way:

root@k8s-worker3:~# crictl ps -a

CONTAINER       IMAGE   CREATED         STATE    NAME           ATTEMPT   POD ID          POD
a8fefdb4a1664           2 minutes ago   Exited   bpf-recorder   53        70487bf0e53cc   spod-2v9z7

Then I read the error from the logs of that container ID:
root@k8s-worker3:~# sudo crictl logs a8fefdb4a1664
E0822 16:01:38.078963 1543597 main.go:235] setup "msg"="running security-profiles-operator" "error"="connect to metrics server: connect to local GRPC server: wait on retry: timed out waiting for the condition"

Because the container keeps restarting, its container ID also keeps changing. Therefore, the logs for a given ID can no longer be found after a few minutes.
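
Since the container ID changes on every restart, it can help to resolve the most recent bpf-recorder container by name right before reading its logs, instead of copying an ID by hand. A small sketch using the crictl ps filters (--name, --all, --latest, --quiet), meant to be run as root on the affected node. The function name latest_bpf_recorder_logs is only illustrative:

```shell
# Fetch the logs of the most recently created bpf-recorder container on this node.
# --all includes exited containers, --latest picks the newest one,
# --quiet prints only the container ID.
latest_bpf_recorder_logs() {
  cid="$(crictl ps --all --name bpf-recorder --latest --quiet)"
  if [ -z "$cid" ]; then
    echo "no bpf-recorder container found" >&2
    return 1
  fi
  crictl logs "$cid"
}
```

Note that --name matches by regular expression, so a more specific pattern may be needed if other container names contain "bpf-recorder".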

shaojini avatar Aug 22 '23 16:08 shaojini

@shaojini the previous logs of the container should still be available, see kubectl logs --previous, at least for a few minutes as you mentioned.
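
As a sketch of that suggestion: kubectl logs accepts --previous together with -c to read the output of the last terminated instance of a single container, which avoids chasing container IDs on the node. The pod name is passed as an argument because it changes on every rollout. The function name previous_bpf_logs is only illustrative:

```shell
# Print the logs of the previous (crashed) bpf-recorder instance in the given spod pod.
previous_bpf_logs() {
  pod="$1"
  kubectl -n security-profiles-operator logs "$pod" -c bpf-recorder --previous
}
```

For example: previous_bpf_logs spod-ppm5q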

saschagrunert avatar Aug 23 '23 07:08 saschagrunert

Hi, @saschagrunert.

The bpf-recorder container log shows the error: "error"="connect to metrics server: connect to local GRPC server: wait on retry: timed out waiting for the condition".

According to the logs, is this the root cause of the bpf-recorder container failing to start? Do you see the same issue when you configure the bpf-recorder for recording?

shaojini avatar Aug 23 '23 08:08 shaojini

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Jan 26 '24 20:01 k8s-triage-robot

/lifecycle rotten

k8s-triage-robot avatar Feb 25 '24 21:02 k8s-triage-robot

/close not-planned

k8s-triage-robot avatar Mar 26 '24 22:03 k8s-triage-robot

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

k8s-ci-robot avatar Mar 26 '24 22:03 k8s-ci-robot