trivy-operator

StorageError: invalid object Code: 4

Open afagund opened this issue 2 years ago • 8 comments

What steps did you take and what happened:

kubectl logs trivy-operator-5454777888-v5qln

{"level":"error","ts":"2023-09-13T00:48:33Z","msg":"Reconciler error","controller":"job","controllerGroup":"batch","controllerKind":"Job","Job":{"name":"scan-vulnerabilityreport-6c44cf7db8","namespace":"trivy-system"},"namespace":"trivy-system","name":"scan-vulnerabilityreport-6c44cf7db8","reconcileID":"9bd516cf-bd25-4795-9961-e6ec7b1375bb","error":"Operation cannot be fulfilled on vulnerabilityreports.aquasecurity.github.io \"pod-76474d48f9\": StorageError: invalid object, Code: 4, Key: /registry/aquasecurity.github.io/vulnerabilityreports/inventory-workflows/pod-76474d48f9, ResourceVersion: 0, AdditionalErrorMsg: Precondition failed: UID in precondition: d3b7cc6e-71dd-4c34-8918-b6b6178917fd, UID in object meta: ","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:324\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:265\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:226"}
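For triage, the relevant fields can be pulled out of a structured log line like the one above with a short script. This is a minimal sketch; the trimmed JSON below is a hypothetical stand-in for the full log entry:

```python
import json

# Trimmed-down stand-in for the operator's structured log line (values shortened).
line = (
    '{"level":"error","msg":"Reconciler error",'
    '"controller":"job","namespace":"trivy-system",'
    '"name":"scan-vulnerabilityreport-6c44cf7db8",'
    '"error":"Operation cannot be fulfilled ... StorageError: invalid object, Code: 4 ..."}'
)

entry = json.loads(line)
# The interesting bits: which controller hit the error, and on what object.
print(entry["controller"], entry["namespace"], entry["name"])
print("StorageError" in entry["error"])
```

The same approach works on the real log line; only the `error` field is long.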

What did you expect to happen:

See no errors in the logs

Anything else you would like to add:

Fresh Trivy operator install running in ClientServer mode

Environment:

  • Trivy-Operator version (use trivy-operator version): 0.16.0

  • Kubernetes version (use kubectl version):

Client Version: v1.28.1 Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3 Server Version: v1.24.16-eks-2d98532

  • OS (macOS 10.15, Windows 10, Ubuntu 19.10 etc):

Ubuntu 22.04.3 LTS (Jammy Jellyfish)

afagund avatar Sep 13 '23 14:09 afagund

@afagund thank you for reporting this issue. Can you elaborate a bit more?

  • have the resources been scanned and reports generated?
  • where and when do you get this error?

chen-keinan avatar Sep 14 '23 05:09 chen-keinan

@chen-keinan resources are being scanned fine, and I am seeing that error in the trivy controller logs. It looks like the error is generated when the controller is doing something with its own resource (scan-vulnerabilityreport-6c44cf7db8).

afagund avatar Sep 14 '23 10:09 afagund

It looks like a resource-revision issue when trying to store the report in etcd.
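The failure mode described here, a write carrying a UID precondition for an object that has since been deleted (so the stored UID no longer matches), can be illustrated with a toy model. This is purely a sketch of the optimistic-concurrency check, not the operator's actual code; the store and key names are hypothetical:

```python
import uuid

class PreconditionFailed(Exception):
    pass

class Store:
    """Toy etcd-like store: each object gets a fresh UID on creation."""
    def __init__(self):
        self.objects = {}

    def create(self, key):
        self.objects[key] = {"uid": str(uuid.uuid4())}
        return self.objects[key]

    def update(self, key, expected_uid):
        obj = self.objects.get(key)
        current_uid = obj["uid"] if obj else ""
        if current_uid != expected_uid:
            # Mirrors the operator log:
            # "Precondition failed: UID in precondition: <cached>, UID in object meta: <current>"
            raise PreconditionFailed(
                f"UID in precondition: {expected_uid}, UID in object meta: {current_uid}"
            )
        obj["updated"] = True

store = Store()
old = store.create("vulnerabilityreports/pod-76474d48f9")  # controller caches this UID
del store.objects["vulnerabilityreports/pod-76474d48f9"]   # report deleted out from under it

try:
    store.update("vulnerabilityreports/pod-76474d48f9", old["uid"])
except PreconditionFailed as e:
    print(e)  # UID in object meta is empty, matching the log above
```

In the real logs the "UID in object meta" is empty for the same reason: the report object the controller cached no longer exists under that key.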

chen-keinan avatar Oct 02 '23 12:10 chen-keinan

This still isn't solved? Another sample on a new install with the latest operator.

{"level":"error","ts":"2025-10-28T23:16:30Z","msg":"Reconciler error","controller":"daemonset","controllerGroup":"apps","controllerKind":"DaemonSet","DaemonSet":{"name":"nvidia-gpu-driver-debian12-86bd8d5c6","namespace":"device-plugins"},"namespace":"device-plugins","name":"nvidia-gpu-driver-debian12-86bd8d5c6","reconcileID":"8e4eb735-034b-4691-b503-a01a61624cf9","error":"Operation cannot be fulfilled on configauditreports.aquasecurity.github.io \"daemonset-nvidia-gpu-driver-debian12-86bd8d5c6\": StorageError: invalid object, Code: 4, Key: /registry/aquasecurity.github.io/configauditreports/device-plugins/daemonset-nvidia-gpu-driver-debian12-86bd8d5c6, ResourceVersion: 0, AdditionalErrorMsg: Precondition failed: UID in precondition: c3c57134-b574-43f2-a7f4-123344031828, UID in object meta: ","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).reconcileHandler\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:474\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).processNextWorkItem\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:421\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Start.func1.1\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:296"}

Routhinator avatar Oct 28 '25 23:10 Routhinator


@Routhinator thanks for the report! Could you share your DaemonSet YAML (device-plugins/daemonset-nvidia-gpu-driver-debian12-86bd8d5c6) for testing?

Or just try to run trivy config ./folder-with-daemonset on it.

afdesk avatar Nov 03 '25 07:11 afdesk

@afdesk There are no daemonsets matching the name that Trivy logs. That appears to be a dynamic resource created by the NVIDIA GPU Operator; the closest match in the cluster would be the one below.

apiVersion: apps/v1
kind: DaemonSet
metadata:
  annotations:
    deprecated.daemonset.template.generation: "10"
    nvidia.com/last-applied-hash: "756315103"
    openshift.io/scc: hostmount-anyuid
  creationTimestamp: "2025-06-22T22:00:18Z"
  generation: 10
  labels:
    app: nvidia-device-plugin-daemonset
    app.kubernetes.io/managed-by: gpu-operator
    helm.sh/chart: gpu-operator-v25.3.1
  name: nvidia-device-plugin-daemonset
  namespace: device-plugins
  ownerReferences:
  - apiVersion: nvidia.com/v1
    blockOwnerDeletion: true
    controller: true
    kind: ClusterPolicy
    name: cluster-policy
    uid: 7f39eedb-6722-4975-8527-1a74a51d5207
  resourceVersion: "428178484"
  uid: dffcdb5e-2b4f-4da6-a7ab-9787eb3601cf
spec:
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: nvidia-device-plugin-daemonset
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: nvidia-device-plugin-daemonset
        app.kubernetes.io/managed-by: gpu-operator
        helm.sh/chart: gpu-operator-v25.3.1
    spec:
      containers:
      - args:
        - /bin/entrypoint.sh
        command:
        - /bin/bash
        - -c
        env:
        - name: PASS_DEVICE_SPECS
          value: "true"
        - name: FAIL_ON_INIT_ERROR
          value: "true"
        - name: DEVICE_LIST_STRATEGY
          value: envvar,cdi-annotations
        - name: DEVICE_ID_STRATEGY
          value: uuid
        - name: NVIDIA_VISIBLE_DEVICES
          value: all
        - name: NVIDIA_DRIVER_CAPABILITIES
          value: all
        - name: MPS_ROOT
          value: /run/nvidia/mps
        - name: CONFIG_FILE
          value: /config/config.yaml
        - name: MIG_STRATEGY
          value: single
        - name: NVIDIA_MIG_MONITOR_DEVICES
          value: all
        - name: CDI_ENABLED
          value: "true"
        - name: CDI_ANNOTATION_PREFIX
          value: nvidia.cdi.k8s.io/
        - name: NVIDIA_CDI_HOOK_PATH
          value: /usr/local/nvidia/toolkit/nvidia-cdi-hook
        image: harbor.home.routh.ca/nvidia-cache/nvidia/k8s-device-plugin:v0.17.2
        imagePullPolicy: IfNotPresent
        name: nvidia-device-plugin
        resources: {}
        securityContext:
          privileged: true
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /bin/entrypoint.sh
          name: nvidia-device-plugin-entrypoint
          readOnly: true
          subPath: entrypoint.sh
        - mountPath: /var/lib/kubelet/device-plugins
          name: device-plugin
        - mountPath: /run/nvidia/validations
          name: run-nvidia-validations
        - mountPath: /driver-root
          mountPropagation: HostToContainer
          name: driver-install-dir
        - mountPath: /host
          mountPropagation: HostToContainer
          name: host-root
          readOnly: true
        - mountPath: /var/run/cdi
          name: cdi-root
        - mountPath: /dev/shm
          name: mps-shm
        - mountPath: /mps
          name: mps-root
        - mountPath: /config
          name: config
        - mountPath: /available-configs
          name: time-slicing-config-all
      - command:
        - config-manager
        env:
        - name: ONESHOT
          value: "false"
        - name: KUBECONFIG
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: spec.nodeName
        - name: NODE_LABEL
          value: nvidia.com/device-plugin.config
        - name: CONFIG_FILE_SRCDIR
          value: /available-configs
        - name: CONFIG_FILE_DST
          value: /config/config.yaml
        - name: DEFAULT_CONFIG
          value: any
        - name: SEND_SIGNAL
          value: "true"
        - name: SIGNAL
          value: "1"
        - name: PROCESS_TO_SIGNAL
          value: nvidia-device-plugin
        - name: FALLBACK_STRATEGIES
          value: empty
        image: harbor.home.routh.ca/nvidia-cache/nvidia/k8s-device-plugin:v0.17.2
        imagePullPolicy: IfNotPresent
        name: config-manager
        resources: {}
        securityContext:
          privileged: true
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /config
          name: config
        - mountPath: /available-configs
          name: time-slicing-config-all
      dnsPolicy: ClusterFirst
      initContainers:
      - args:
        - until [ -f /run/nvidia/validations/toolkit-ready ]; do echo waiting for
          nvidia container stack to be setup; sleep 5; done
        command:
        - sh
        - -c
        image: harbor.home.routh.ca/nvidia-cache/nvidia/cloud-native/gpu-operator-validator:v25.3.1
        imagePullPolicy: IfNotPresent
        name: toolkit-validation
        resources: {}
        securityContext:
          privileged: true
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /run/nvidia/validations
          mountPropagation: HostToContainer
          name: run-nvidia-validations
      - command:
        - config-manager
        env:
        - name: ONESHOT
          value: "true"
        - name: KUBECONFIG
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: spec.nodeName
        - name: NODE_LABEL
          value: nvidia.com/device-plugin.config
        - name: CONFIG_FILE_SRCDIR
          value: /available-configs
        - name: CONFIG_FILE_DST
          value: /config/config.yaml
        - name: DEFAULT_CONFIG
          value: any
        - name: SEND_SIGNAL
          value: "false"
        - name: SIGNAL
        - name: PROCESS_TO_SIGNAL
        - name: FALLBACK_STRATEGIES
          value: empty
        image: harbor.home.routh.ca/nvidia-cache/nvidia/k8s-device-plugin:v0.17.2
        imagePullPolicy: IfNotPresent
        name: config-manager-init
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /config
          name: config
        - mountPath: /available-configs
          name: time-slicing-config-all
      nodeSelector:
        nvidia.com/gpu.deploy.device-plugin: "true"
      priorityClassName: system-node-critical
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: nvidia-device-plugin
      serviceAccountName: nvidia-device-plugin
      shareProcessNamespace: true
      terminationGracePeriodSeconds: 30
      tolerations:
      - effect: NoSchedule
        key: nvidia.com/gpu
        operator: Exists
      volumes:
      - configMap:
          defaultMode: 448
          name: nvidia-device-plugin-entrypoint
        name: nvidia-device-plugin-entrypoint
      - hostPath:
          path: /var/lib/kubelet/device-plugins
          type: ""
        name: device-plugin
      - hostPath:
          path: /run/nvidia/validations
          type: DirectoryOrCreate
        name: run-nvidia-validations
      - hostPath:
          path: /run/nvidia/driver
          type: DirectoryOrCreate
        name: driver-install-dir
      - hostPath:
          path: /
          type: ""
        name: host-root
      - hostPath:
          path: /var/run/cdi
          type: DirectoryOrCreate
        name: cdi-root
      - hostPath:
          path: /run/nvidia/mps
          type: DirectoryOrCreate
        name: mps-root
      - hostPath:
          path: /run/nvidia/mps/shm
          type: ""
        name: mps-shm
      - configMap:
          defaultMode: 420
          name: time-slicing-config-all
        name: time-slicing-config-all
      - emptyDir: {}
        name: config
  updateStrategy:
    rollingUpdate:
      maxSurge: 0
      maxUnavailable: 1
    type: RollingUpdate
status:
  currentNumberScheduled: 1
  desiredNumberScheduled: 1
  numberAvailable: 1
  numberMisscheduled: 0
  numberReady: 1
  observedGeneration: 10
  updatedNumberScheduled: 1

The exact helm install of the NVIDIA Operator that seems to break the Trivy Operator is:

---
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: nvidia-gpu-operator
  namespace: device-plugins
spec:
  interval: 5m
  chart:
    spec:
      chart: gpu-operator
      version: "v25.3.1"
      sourceRef:
        kind: HelmRepository
        name: nvidia
        namespace: cluster-conf
  values:
    daemonsets:
      tolerations:
        - key: nvidia.com/gpu
          operator: Exists
          effect: NoSchedule
    validator:
      repository: harbor.home.routh.ca/nvidia-cache/nvidia/cloud-native
    operator:
      repository: harbor.home.routh.ca/nvidia-cache/nvidia
    driver:
      repository: harbor.home.routh.ca/nvidia-cache/nvidia
      enabled: false
      nvidiaDriverCRD:
        enabled: true
    manager:
      repository: harbor.home.routh.ca/nvidia-cache/nvidia/cloud-native
    toolkit:
      repository: harbor.home.routh.ca/nvidia-cache/nvidia/k8s
    devicePlugin:
      repository: harbor.home.routh.ca/nvidia-cache/nvidia
      config:
        name: time-slicing-config-all
        default: any
    dcgm:
      repository: harbor.home.routh.ca/nvidia-cache/nvidia/cloud-native
    dcgmExporter:
      repository: harbor.home.routh.ca/nvidia-cache/nvidia/k8s
    gfd:
      repository: harbor.home.routh.ca/nvidia-cache/nvidia
    migManager:
      repository: harbor.home.routh.ca/nvidia-cache/nvidia/cloud-native
    nodeStatusExporter:
      repository: harbor.home.routh.ca/nvidia-cache/nvidia
    gds:
      repository: harbor.home.routh.ca/nvidia-cache/nvidia/cloud-native
    gdrcopy:
      repository: harbor.home.routh.ca/nvidia-cache/nvidia/cloud-native
    nfd:
      enabled: false
    cdi:
      enabled: true
      default: true

NOTE: the image overrides are just a pull-through proxy; I'm still using vanilla images from NVIDIA as installed by the chart.

Routhinator avatar Nov 07 '25 17:11 Routhinator

For this request:

or just try to scan trivy config ./folder-with-daemonset

I'll need more context - are you expecting that I just exec into the operator pod and try running that against a folder where that exists?

Or are you expecting that I have this daemonset defined somewhere I can scan with regular trivy?

Routhinator avatar Nov 07 '25 17:11 Routhinator

It looks like this may have been seen before with the NVIDIA operator, as I see people excluding these resources in https://github.com/aquasecurity/trivy-operator/issues/2592

But my question immediately becomes: how do we get these images scanned? I must scan all images in the cluster, without fail; exclusions aren't a workable option. So how do you get them to scan?

Routhinator avatar Nov 08 '25 00:11 Routhinator