operator-registry
Bundle Unpacker is not idempotent
Problem:
On OpenShift 4.9.17, a customer enabled a ResourceQuota on the openshift-marketplace namespace, which is the main namespace for OLM:
kind: ResourceQuota
apiVersion: v1
metadata:
  name: temp-pod-quota
  namespace: openshift-marketplace
spec:
  hard:
    pods: '25'
status:
  hard:
    pods: '25'
  used:
    pods: '25'
When attempting to install any operator, the quota was quickly exceeded and the unpack jobs were sometimes unable to create their pods:
Error creating: pods "878ecfdea1565e741abd95478cf86a233a15cc3f7969992e14d6e8--1-b9nvz" is forbidden: exceeded quota: temp-pod-quota, requested: pods=1, used: pods=25, limited: pods=25
Error creating: pods "878ecfdea1565e741abd95478cf86a233a15cc3f7969992e14d6e8--1-bfql6" is forbidden: exceeded quota: mch-temp-pod-quota, requested: pods=1, used: pods=25, limited: pods=25
...
More of these events appear every minute.
Some of the pods would be created and run successfully, some would fail to be created, and some would be created but then fail.
The log of a successful pod, 878ecfdea1565e741abd95478cf86a233a15cc3f7969992e14d6e8--1-5dnj8, looks like this:
time="2022-03-02T13:35:45Z" level=info msg="Using in-cluster kube client config"
time="2022-03-02T13:35:45Z" level=info msg="Reading file" file=/bundle/manifests/ibm-bedrock-version.yaml
time="2022-03-02T13:35:45Z" level=info msg="Reading file" file=/bundle/manifests/ibm-common-service-operator.clusterserviceversion.yaml
time="2022-03-02T13:35:45Z" level=info msg="Reading file" file=/bundle/manifests/operator.ibm.com_commonservices.yaml
time="2022-03-02T13:35:45Z" level=info msg="Reading file" file=/bundle/metadata/annotations.yaml
A failing pod, 878ecfdea1565e741abd95478cf86a233a15cc3f7969992e14d6e8--1-5bjdt (CrashLoopBackOff), logs this:
time="2022-03-02T19:47:05Z" level=info msg="Using in-cluster kube client config"
time="2022-03-02T19:47:05Z" level=info msg="Reading file" file=/bundle/manifests/ibm-bedrock-version.yaml
time="2022-03-02T19:47:05Z" level=info msg="Reading file" file=/bundle/manifests/ibm-common-service-operator.clusterserviceversion.yaml
time="2022-03-02T19:47:05Z" level=info msg="Reading file" file=/bundle/manifests/operator.ibm.com_commonservices.yaml
time="2022-03-02T19:47:05Z" level=info msg="Reading file" file=/bundle/metadata/annotations.yaml
Error: error loading manifests from directory: ConfigMap "878ecfdea1565e741abd95478cf86a233a15cc3f7969992e14d6e82603883eb" is invalid: [data[ibm-bedrock-version.yaml]: Invalid value: "ibm-bedrock-version.yaml": duplicate of key present in binaryData, data[ibm-common-service-operator.clusterserviceversion.yaml]: Invalid value: "ibm-common-service-operator.clusterserviceversion.yaml": duplicate of key present in binaryData, data[operator.ibm.com_commonservices.yaml]: Invalid value: "operator.ibm.com_commonservices.yaml": duplicate of key present in binaryData]
Usage:
  opm alpha bundle extract [flags]

Flags:
  -c, --configmapname string   name of configmap to write bundle data
  -l, --datalimit uint         maximum limit in bytes for total bundle data (default 1048576)
      --debug                  enable debug logging
  -h, --help                   help for extract
  -k, --kubeconfig string      absolute path to kubeconfig file
  -m, --manifestsdir string    path to directory containing manifests (default "/")
  -n, --namespace string       namespace to write configmap data (default "openshift-operator-lifecycle-manager")

Global Flags:
      --skip-tls   skip TLS certificate verification for container image registries while pulling bundles or index
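
For reference, the Kubernetes API server requires each ConfigMap key to be unique across data and binaryData, which is exactly the constraint the failing pods trip over. A minimal client-go snippet (hypothetical names; assumes a kubeconfig at the default location) reproduces the same validation error:

package main

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Load the default kubeconfig (~/.kube/config).
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(config)

	// The same key in both data and binaryData is rejected by validation.
	cm := &corev1.ConfigMap{
		ObjectMeta: metav1.ObjectMeta{Name: "dup-key-demo"},
		Data:       map[string]string{"annotations.yaml": "key: value"},
		BinaryData: map[string][]byte{"annotations.yaml": []byte("key: value")},
	}
	_, err = client.CoreV1().ConfigMaps("default").Create(context.TODO(), cm, metav1.CreateOptions{})
	fmt.Println(err) // ...duplicate of key present in binaryData
}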
Summary: It seems the bundle unpack job is not idempotent. If the ConfigMap already contains the keys, it should just move on, or replace the values, rather than fail with a duplicate-key error.
This looks to be an edge case in the unpacker job that is used to unpack content from a catalog onto the cluster via a ConfigMap. Ideally the unpack jobs would be idempotent and would not conflict with one another.
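
A minimal sketch of what an idempotent write could look like, using a hypothetical helper upsertBundleConfigMap rather than the actual operator-registry code: fetch the existing ConfigMap, create it if missing, and otherwise replace keys in place, dropping any stale entry from the opposite field so a key never ends up in both data and binaryData:

// Hypothetical sketch, not the actual operator-registry implementation.
package unpack

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// upsertBundleConfigMap writes the desired bundle ConfigMap idempotently:
// re-running it against an existing ConfigMap replaces keys instead of
// failing with "duplicate of key present in binaryData".
func upsertBundleConfigMap(ctx context.Context, client kubernetes.Interface, namespace string, desired *corev1.ConfigMap) error {
	cms := client.CoreV1().ConfigMaps(namespace)

	existing, err := cms.Get(ctx, desired.Name, metav1.GetOptions{})
	if apierrors.IsNotFound(err) {
		_, err = cms.Create(ctx, desired, metav1.CreateOptions{})
		return err
	}
	if err != nil {
		return err
	}

	// A key may live in data OR binaryData, never both, so drop any
	// stale entry from the opposite field before setting the new value.
	for k, v := range desired.Data {
		delete(existing.BinaryData, k)
		if existing.Data == nil {
			existing.Data = map[string]string{}
		}
		existing.Data[k] = v
	}
	for k, v := range desired.BinaryData {
		delete(existing.Data, k)
		if existing.BinaryData == nil {
			existing.BinaryData = map[string][]byte{}
		}
		existing.BinaryData[k] = v
	}

	_, err = cms.Update(ctx, existing, metav1.UpdateOptions{})
	return err
}

With a replace-rather-than-append write like this, a retried unpack job would converge on the same ConfigMap contents instead of failing on the duplicate key.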