cockroach-operator

Cluster fails to deploy when using Kyverno to mutate image name

Open: glaberge opened this issue 6 months ago (0 comments)

Hello, I've deployed the operator in our cluster using the manifests; everything works as expected during the install.

When deploying a cluster using example.yaml, the vcheck job seems to run indefinitely. The job pod's logs show the version as expected. At first, the operator logs don't point to any specific error, but after some time some generic error messages appear.

crdblog.log

After some trial and error, I determined that the issue appears to come from a Kyverno policy we use to rewrite image references so they point at our pull-through cache:

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: replace-dockerhub-registry-container
spec:
  background: true
  rules:
    # Add the registry and tag if set implicitly
    - name: set-default-registry-container
      match:
        resources:
          kinds:
          - Pod
      mutate:
        foreach:
        # Containers
        - list: "request.object.spec.containers"
          patchStrategicMerge:
            spec:
              containers:
                - (name): "{{ element.name }}"
                  (image): |-
                    !{{ images.containers."{{ element.name }}".referenceWithTag }}
                  image: |-
                    {{ images.containers."{{ element.name }}".referenceWithTag }}

    # For all non-official Docker Hub images, replace the registry with the ECR pull-through cache
    - name: replace-dockerhub-registry-container
      match:
        resources:
          kinds:
          - Pod
      preconditions:
        any:
          - key: "{{ request.object.spec.containers[*].image }}"
            operator: AnyIn
            value: "docker.io/*/*"
      mutate:
        foreach:
        # Containers
        - list: "request.object.spec.containers"
          patchStrategicMerge:
            spec:
              containers:
                - (name): "{{ element.name }}"
                  (image): |-
                    docker.io/*/*:{{images.containers."{{element.name}}".tag}}
                  image: our.pullthrough.cache.url/{{ images.containers."{{element.name}}".path }}:{{images.containers."{{element.name}}".tag}}

    # The only remaining Docker Hub images are official and can be redirected to the ECR pull-through cache
    # Official images require the 'library/' path prefix
    - name: replace-official-dockerhub-registry-container
      match:
        resources:
          kinds:
          - Pod
      preconditions:
        any:
          - key: "{{ request.object.spec.containers[*].image }}"
            operator: AnyIn
            value: "docker.io/*"
      mutate:
        foreach:
        # Containers
        - list: "request.object.spec.containers"
          patchStrategicMerge:
            spec:
              containers:
                - (name): "{{ element.name }}"
                  (image): |-
                    docker.io/*:{{images.containers."{{element.name}}".tag}}
                  image: our.pullthrough.cache.url/library/{{ images.containers."{{element.name}}".path }}:{{images.containers."{{element.name}}".tag}}

Removing this policy allowed the cluster to initialize properly.
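For readers less familiar with Kyverno's `images` context variables, here is a minimal Python sketch of the rewriting the policy above is meant to perform. It is an approximation based on the comments in the policy, not Kyverno's implementation. It assumes bare image names default to the `docker.io` registry and the `latest` tag, it ignores digests, and it reuses the placeholder host `our.pullthrough.cache.url` from the policy. The functions `normalize` and `rewrite` are hypothetical names.

```python
# Hypothetical model of the three rules in the policy above; not Kyverno's code.
CACHE = "our.pullthrough.cache.url"  # placeholder host copied from the policy


def normalize(image: str) -> str:
    """Rule 1 (approximation): make the registry and tag explicit."""
    first, _, rest = image.partition("/")
    # Docker convention: the first component is a registry host only if it
    # contains '.' or ':' or is 'localhost'; otherwise Docker Hub is implied.
    if rest and ("." in first or ":" in first or first == "localhost"):
        registry, path = first, rest
    else:
        registry, path = "docker.io", image
    # A tag is a ':' after the last '/'; digests are not handled here.
    if ":" not in path.rsplit("/", 1)[-1]:
        path += ":latest"
    return f"{registry}/{path}"


def rewrite(image: str) -> str:
    """Rules 2 and 3: send Docker Hub images to the pull-through cache."""
    ref = normalize(image)
    registry, _, path_tag = ref.partition("/")
    if registry != "docker.io":
        return ref  # images from other registries are only normalized
    repo = path_tag.rsplit(":", 1)[0]
    if "/" in repo:
        return f"{CACHE}/{path_tag}"  # docker.io/<org>/<name>:<tag>
    return f"{CACHE}/library/{path_tag}"  # official image: docker.io/<name>:<tag>
```

For example, an illustrative reference such as `cockroachdb/cockroach:v23.1.11` would become `our.pullthrough.cache.url/cockroachdb/cockroach:v23.1.11`, so the pod that actually runs carries a different image reference from the one in its original spec.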

glaberge, Apr 16 '25