
ErrImageNeverPull with trivy.command = filesystem or rootfs

Open chary1112004 opened this issue 11 months ago • 10 comments

What steps did you take and what happened:

Hi,

We have noticed an issue where, when we configure trivy.command = filesystem or trivy.command = rootfs, scan jobs sometimes end up in the ErrImageNeverPull status.
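
For reference, this corresponds to a Helm values entry along the following lines (a minimal sketch; only the trivy.command value reflects our configuration, the rest is the chart default):

trivy:
  command: filesystem   # or "rootfs"; the chart default is "image"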

Here is the log of the scan job:

kubectl logs scan-vulnerabilityreport-755cd9546-k7wz6 -n trivy-system
Defaulted container "k8s-cluster" out of: k8s-cluster, 9797c3dc-a05b-4d8c-9e03-537c5348af40 (init), 4c278c3b-6eb8-449d-be86-2111c6f58d38 (init)
Error from server (BadRequest): container "k8s-cluster" in pod "scan-vulnerabilityreport-755cd9546-k7wz6" is waiting to start: ErrImageNeverPull

And this is the message when we describe the scan pod:

...
  containerStatuses:
  - image: k8s.io/kubernetes:1.25.16-eks-508b6b3
    imageID: ""
    lastState: {}
    name: k8s-cluster
    ready: false
    restartCount: 0
    started: false
    state:
      waiting:
        message: Container image "k8s.io/kubernetes:1.25.16-eks-508b6b3" is not present
          with pull policy of Never
        reason: ErrImageNeverPull
...
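
For completeness, the image and pull policy each container in the scan pod was created with can be listed with a command along these lines (pod name taken from the output above):

kubectl get pod scan-vulnerabilityreport-755cd9546-k7wz6 -n trivy-system \
  -o jsonpath='{range .spec.containers[*]}{.name}{"\t"}{.image}{"\t"}{.imagePullPolicy}{"\n"}{end}'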

Any suggestions on how to resolve this issue would be very much appreciated!

Thanks!

Environment:

  • Trivy-Operator version (use trivy-operator version): 0.18.3
  • Kubernetes version (use kubectl version): 1.25
  • OS (macOS 10.15, Windows 10, Ubuntu 19.10 etc):

chary1112004 avatar Apr 04 '24 10:04 chary1112004

@chary1112004 thanks for reporting this issue. I have never experienced it; I'll have to investigate it and update you.

chen-keinan avatar Apr 04 '24 11:04 chen-keinan

@chary1112004 I tried to investigate this, but no luck; I'm unable to reproduce it.

chen-keinan avatar Apr 18 '24 11:04 chen-keinan

@chen-keinan have you tried to reproduce it when deploying Trivy on EKS?

chary1112004 avatar Apr 22 '24 09:04 chary1112004

@chary1112004 nope, but I do not think it's related to a cloud provider setting; it looks more like something in the cluster configuration.

chen-keinan avatar Apr 24 '24 05:04 chen-keinan

@chen-keinan sorry, I just meant Kubernetes.

chary1112004 avatar Apr 25 '24 09:04 chary1112004

I also get this on EKS when using Bottlerocket nodes (no idea if normal AL23 nodes also have it).

rknightion avatar May 12 '24 11:05 rknightion

This also happens in a disconnected OpenShift environment. What I specifically see is that the hash in the tag matches the kubelet version, so it's trying to pull a matching image; I just don't understand where it gets the idea to pull from k8s.io - that registry is nowhere in my config.

I also have .Values.operator.infraAssessmentScannerEnabled: false, so I don't suspect it's the nodeCollector. Any other ideas?
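
For anyone else trying to narrow down where the k8s.io/kubernetes reference comes from, one first step might be to check the labels the operator stamps on the failing scan pod, which record the resource that triggered the scan (a sketch; <scan-pod> is a placeholder and the exact label keys vary by trivy-operator version):

kubectl get pod <scan-pod> -n trivy-system --show-labels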

rickymulder avatar May 20 '24 22:05 rickymulder

I am also seeing this. Let me know if I can provide any further configuration details:

helm upgrade --install trivy-operator aqua/trivy-operator \
  --namespace trivy-system \
  --create-namespace \
  -f values.yaml \
  --version 0.21.4

values.yaml:

nodeCollector:
  useNodeSelector: false
#  excludeNodes: node-role.kubernetes.io/control-plane=true
trivy:
  ignoreUnfixed: true
  command: filesystem
operator:
  controllerCacheSyncTimeout: 25m
trivyOperator:
  scanJobPodTemplateContainerSecurityContext:
    runAsUser: 0
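
To double-check what actually landed in the cluster after the upgrade, the rendered trivy ConfigMap can be inspected (the ConfigMap name below assumes the release is called trivy-operator, matching the helm command above):

kubectl get configmap trivy-operator-trivy-config -n trivy-system -o yaml | grep 'trivy.command'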

titansmc avatar May 21 '24 11:05 titansmc

I have the same issue. The cluster is running v1.29.5+k3s1 on Ubuntu 22.04 and Trivy-operator is deployed using:

---

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

namespace: trivy-system

resources:
  - trivy-operator.yml
  - https://raw.githubusercontent.com/aquasecurity/trivy-operator/v0.21.1/deploy/static/trivy-operator.yaml

patches:
  - patch: |-
      - op: replace
        path: /data/OPERATOR_METRICS_EXPOSED_SECRET_INFO_ENABLED
        value: "false"
      - op: replace
        path: /data/OPERATOR_METRICS_CONFIG_AUDIT_INFO_ENABLED
        value: "false"
      - op: replace
        path: /data/OPERATOR_METRICS_RBAC_ASSESSMENT_INFO_ENABLED
        value: "false"
      - op: replace
        path: /data/OPERATOR_METRICS_INFRA_ASSESSMENT_INFO_ENABLED
        value: "false"
      - op: replace
        path: /data/OPERATOR_METRICS_IMAGE_INFO_ENABLED
        value: "false"
      - op: replace
        path: /data/OPERATOR_METRICS_CLUSTER_COMPLIANCE_INFO_ENABLED
        value: "false"
      - op: replace
        path: /data/OPERATOR_CONCURRENT_SCAN_JOBS_LIMIT
        value: "3"
    target:
      kind: ConfigMap
      name: trivy-operator-config
  - patch: |-
      - op: replace
        path: /data/trivy.command
        value: "rootfs"
    target:
      kind: ConfigMap
      name: trivy-operator-trivy-config
  - patch: |-
      - op: replace
        path: /data/scanJob.podTemplateContainerSecurityContext
        value: "{\"allowPrivilegeEscalation\":false,\"capabilities\":{\"drop\":[\"ALL\"]},\"privileged\":false,\"readOnlyRootFilesystem\":true,\"runAsUser\":0}"
    target:
      kind: ConfigMap
      name: trivy-operator
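
If it helps with reproducing, the effective manifests from a kustomization like this can be rendered locally before applying, to confirm the trivy.command patch lands where intended (plain kubectl, run from the directory containing the kustomization):

kubectl kustomize . | grep -B 2 -A 2 'trivy.command'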

ondrejmo avatar May 26 '24 13:05 ondrejmo

I did a hard-restart of the cluster (rebooted all nodes, deleted & re-created all pods) and it seems to have fixed the issue for me.
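
A narrower variant of the same workaround might be to delete only the stuck scan jobs and let the operator recreate them, rather than restarting the whole cluster (untested here; assumes the scan jobs run in the trivy-system namespace as in the reports above):

kubectl delete jobs --all -n trivy-system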

ondrejmo avatar May 27 '24 17:05 ondrejmo