operator-lifecycle-manager
Subscription and CSV don't bind to each other
Bug Report
This is an intermittent defect we observe during operator install and upgrade: the operator's Subscription and CSV do not bind to each other.
What did you do?
The issue has been seen in both fresh installs and upgrades:
- A subscription is created for an Operator.
- An operator is upgraded to a new version.
What did you expect to see?
I expected the operator to be deployed or upgraded successfully.
What did you see instead? Under which circumstances?
The CSV of the operator is created, but the Subscription status is never updated. As a result, even when the InstallPlan is completed, the Subscription is in the "unknown" status and the CSV is in the "Cannot Update" status.

It also blocks the catalog operator from syncing other operators.
E0609 14:02:55.116596 1 queueinformer_operator.go:290] sync "ibm-common-services" failed: constraints not satisfiable: pkgunique/ibm-odlm permits at most 1 of opencloud-operators/openshift-marketplace/v3/operand-deployment-lifecycle-manager.v1.6.0, opencloud-operators/openshift-marketplace/v3/operand-deployment-lifecycle-manager.v1.5.0, @existing/ibm-common-services//operand-deployment-lifecycle-manager.v1.6.0, gvkunique/operator.ibm.com/v1alpha1/OperandRegistry permits at most 1 of opencloud-operators/openshift-marketplace/v3/operand-deployment-lifecycle-manager.v1.6.0, @existing/ibm-common-services//operand-deployment-lifecycle-manager.v1.6.0, @existing/ibm-common-services//operand-deployment-lifecycle-manager.v1.6.0 is mandatory, ibm-odlm is mandatory, ibm-odlm requires at least one of opencloud-operators/openshift-marketplace/v3/operand-deployment-lifecycle-manager.v1.6.0, opencloud-operators/openshift-marketplace/v3/operand-deployment-lifecycle-manager.v1.5.0
I0609 14:02:55.116762 1 event.go:278] Event(v1.ObjectReference{Kind:"Namespace", Namespace:"", Name:"ibm-common-services", UID:"92621a07-5877-4ef2-bffa-dfb5e4252992", APIVersion:"v1", ResourceVersion:"146584", FieldPath:""}): type: 'Warning' reason: 'ResolutionFailed' constraints not satisfiable: pkgunique/ibm-odlm permits at most 1 of opencloud-operators/openshift-marketplace/v3/operand-deployment-lifecycle-manager.v1.6.0, opencloud-operators/openshift-marketplace/v3/operand-deployment-lifecycle-manager.v1.5.0, @existing/ibm-common-services//operand-deployment-lifecycle-manager.v1.6.0, gvkunique/operator.ibm.com/v1alpha1/OperandRegistry permits at most 1 of opencloud-operators/openshift-marketplace/v3/operand-deployment-lifecycle-manager.v1.6.0, @existing/ibm-common-services//operand-deployment-lifecycle-manager.v1.6.0, @existing/ibm-common-services//operand-deployment-lifecycle-manager.v1.6.0 is mandatory, ibm-odlm is mandatory, ibm-odlm requires at least one of opencloud-operators/openshift-market...
E0609 14:02:58.912745 1 queueinformer_operator.go:290] sync "ibm-common-services" failed: constraints not satisfiable: ibm-odlm requires at least one of opencloud-operators/openshift-marketplace/v3/operand-deployment-lifecycle-manager.v1.6.0, opencloud-operators/openshift-marketplace/v3/operand-deployment-lifecycle-manager.v1.5.0, @existing/ibm-common-services//operand-deployment-lifecycle-manager.v1.6.0 is mandatory, pkgunique/ibm-odlm permits at most 1 of opencloud-operators/openshift-marketplace/v3/operand-deployment-lifecycle-manager.v1.6.0, opencloud-operators/openshift-marketplace/v3/operand-deployment-lifecycle-manager.v1.5.0, @existing/ibm-common-services//operand-deployment-lifecycle-manager.v1.6.0, gvkunique/operator.ibm.com/v1alpha1/OperandRegistry permits at most 1 of opencloud-operators/openshift-marketplace/v3/operand-deployment-lifecycle-manager.v1.6.0, @existing/ibm-common-services//operand-deployment-lifecycle-manager.v1.6.0, ibm-odlm is mandatory
I0609 14:02:58.912840 1 event.go:278] Event(v1.ObjectReference{Kind:"Namespace", Namespace:"", Name:"ibm-common-services", UID:"92621a07-5877-4ef2-bffa-dfb5e4252992", APIVersion:"v1", ResourceVersion:"146584", FieldPath:""}): type: 'Warning' reason: 'ResolutionFailed' constraints not satisfiable: ibm-odlm requires at least one of opencloud-operators/openshift-marketplace/v3/operand-deployment-lifecycle-manager.v1.6.0, opencloud-operators/openshift-marketplace/v3/operand-deployment-lifecycle-manager.v1.5.0, @existing/ibm-common-services//operand-deployment-lifecycle-manager.v1.6.0 is mandatory, pkgunique/ibm-odlm permits at most 1 of opencloud-operators/openshift-marketplace/v3/operand-deployment-lifecycle-manager.v1.6.0, opencloud-operators/openshift-marketplace/v3/operand-deployment-lifecycle-manager.v1.5.0, @existing/ibm-common-services//operand-deployment-lifecycle-manager.v1.6.0, gvkunique/operator.ibm.com/v1alpha1/OperandRegistry permits at most 1 of opencloud-operators/openshift-marketplace/v3/...
I0609 14:03:02.744559 1 event.go:278] Event(v1.ObjectReference{Kind:"Namespace", Namespace:"", Name:"ibm-common-services", UID:"92621a07-5877-4ef2-bffa-dfb5e4252992", APIVersion:"v1", ResourceVersion:"146584", FieldPath:""}): type: 'Warning' reason: 'ResolutionFailed' constraints not satisfiable: ibm-odlm requires at least one of opencloud-operators/openshift-marketplace/v3/operand-deployment-lifecycle-manager.v1.6.0, opencloud-operators/openshift-marketplace/v3/operand-deployment-lifecycle-manager.v1.5.0, pkgunique/ibm-odlm permits at most 1 of opencloud-operators/openshift-marketplace/v3/operand-deployment-lifecycle-manager.v1.5.0, opencloud-operators/openshift-marketplace/v3/operand-deployment-lifecycle-manager.v1.6.0, @existing/ibm-common-services//operand-deployment-lifecycle-manager.v1.6.0, @existing/ibm-common-services//operand-deployment-lifecycle-manager.v1.6.0 is mandatory, gvkunique/operator.ibm.com/v1alpha1/OperandConfig permits at most 1 of opencloud-operators/openshift-marketplace/v3/op...
I have uploaded the CSV, Subscription, and InstallPlan YAML files for further investigation: yaml-files.zip
Environment
- operator-lifecycle-manager version: 0.16.1
  Since we have seen this issue on OCP 4.7 and 4.8, I believe this defect is present in 0.17.0 and 0.17.1 as well.
- Kubernetes version information: OCP 4.6, 4.7, 4.8
- Kubernetes cluster kind: OCP
Possible Solution
Delete the operator CSV and let the catalog operator reconcile it again.
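As a rough illustration of this workaround, the sketch below builds the oc command that deletes a CSV and only runs it when explicitly asked. The helper names and the dry-run default are assumptions made for this example; the CSV name and namespace in the usage lines are taken from the logs above.

```python
import subprocess


def delete_csv_command(csv_name, namespace):
    """Build the oc invocation that deletes one ClusterServiceVersion."""
    return ["oc", "delete", "clusterserviceversion", csv_name, "-n", namespace]


def delete_csv(csv_name, namespace, dry_run=True):
    """Return the command; execute it only when dry_run is False."""
    cmd = delete_csv_command(csv_name, namespace)
    if not dry_run:
        # Raises CalledProcessError if oc exits with a non-zero status.
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    # Dry run: print the command instead of touching the cluster.
    print(" ".join(delete_csv("operand-deployment-lifecycle-manager.v1.6.0", "ibm-common-services")))
```

Deleting the CSV removes the running operator until the catalog operator reinstalls it, so the dry-run default is deliberate.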
Additional context
cc @pgodowski
The handoff between upgrades of different versions of an operator has some known visibility issues that are planned to be addressed largely by the new APIs. Some work was done here recently, but operators in a namespace are treated as a set during installation, so one failure affects all subsequent installs. This is a problematic consequence of the multitenant nature of the OLM v1 APIs. Relates to #1565.
This problem can be considered as something that could be addressed by the new v2 Bundle APIs and resolution.
@njhale
Please let me know if this gives you any ideas about the root cause :)
We have seen this issue again in our product.
The operator CSV has succeeded and the InstallPlan is completed, but the Subscription status has no fields such as installplan or currentCSV.
This is the status of the operator's Subscription:
status:
  catalogHealth:
  - catalogSourceRef:
      apiVersion: operators.coreos.com/v1alpha1
      kind: CatalogSource
      name: ace
      namespace: openshift-marketplace
      resourceVersion: "49186"
      uid: 063ddf21-b2c7-48a0-9f9d-4d09998d96d9
    healthy: true
    lastUpdated: "2021-06-21T11:59:19Z"
  - catalogSourceRef:
      apiVersion: operators.coreos.com/v1alpha1
      kind: CatalogSource
      name: automation-base-pak-operators
      namespace: openshift-marketplace
      resourceVersion: "46005"
      uid: 23ab4d68-1ea1-46c3-80ca-3358b737ace4
    healthy: true
    lastUpdated: "2021-06-21T11:59:19Z"
  - catalogSourceRef:
      apiVersion: operators.coreos.com/v1alpha1
      kind: CatalogSource
      name: certified-operators
      namespace: openshift-marketplace
      resourceVersion: "49182"
      uid: 73615a74-1d38-4972-8f37-9a0000ba465b
    healthy: true
    lastUpdated: "2021-06-21T11:59:19Z"
  - catalogSourceRef:
      apiVersion: operators.coreos.com/v1alpha1
      kind: CatalogSource
      name: community-operators
      namespace: openshift-marketplace
      resourceVersion: "49185"
      uid: d45a5fc0-132c-4c39-b695-c7a1233e8703
    healthy: true
    lastUpdated: "2021-06-21T11:59:19Z"
  - catalogSourceRef:
      apiVersion: operators.coreos.com/v1alpha1
The catalog operator log shows:
I0621 16:28:10.713590 1 event.go:278] Event(v1.ObjectReference{Kind:"Namespace", Namespace:"", Name:"ibm-common-services", UID:"95a40970-4685-4d80-bcf5-8dddbfb091e7", APIVersion:"v1", ResourceVersion:"77866", FieldPath:""}): type: 'Warning' reason: 'ResolutionFailed' constraints not satisfiable: @existing/ibm-common-services//ibm-platform-api-operator.v3.10.0 is mandatory, opencloud-operators/openshift-marketplace/v3/ibm-platform-api-operator.v3.9.0, opencloud-operators/openshift-marketplace/v3/ibm-platform-api-operator.v3.9.1, opencloud-operators/openshift-marketplace/v3/ibm-platform-api-operator.v3.10.0 and @existing/ibm-common-services//ibm-platform-api-operator.v3.10.0 originate from package ibm-platform-api-operator-app, subscription ibm-platform-api-operator requires at least one of opencloud-operators/openshift-marketplace/v3/ibm-platform-api-operator.v3.10.0, opencloud-operators/openshift-marketplace/v3/ibm-platform-api-operator.v3.9.1 or opencloud-operators/openshift-marketplace/v3/ibm-platform-api-operator.v3.9.0, subscription ibm-platform-api-operator exists
E0621 16:33:13.732081 1 queueinformer_operator.go:290] sync "ibm-common-services" failed: constraints not satisfiable: opencloud-operators/openshift-marketplace/v3/ibm-platform-api-operator.v3.10.0, @existing/ibm-common-services//ibm-platform-api-operator.v3.10.0, opencloud-operators/openshift-marketplace/v3/ibm-platform-api-operator.v3.9.0 and opencloud-operators/openshift-marketplace/v3/ibm-platform-api-operator.v3.9.1 originate from package ibm-platform-api-operator-app, @existing/ibm-common-services//ibm-platform-api-operator.v3.10.0 is mandatory, opencloud-operators/openshift-marketplace/v3/ibm-platform-api-operator.v3.10.0 and @existing/ibm-common-services//ibm-platform-api-operator.v3.10.0 provide PlatformAPI (operator.ibm.com/v1alpha1), subscription ibm-platform-api-operator exists, subscription ibm-platform-api-operator requires at least one of opencloud-operators/openshift-marketplace/v3/ibm-platform-api-operator.v3.10.0, opencloud-operators/openshift-marketplace/v3/ibm-platform-api-operator.v3.9.1 or opencloud-operators/openshift-marketplace/v3/ibm-platform-api-operator.v3.9.0
I suspect this can happen when an error occurs here: https://github.com/operator-framework/operator-lifecycle-manager/blob/2c623e1e4877608fd16a6089a6aeeac5b1217f18/pkg/controller/operators/catalog/operator.go#L948
Since the InstallPlan is created successfully, the new operator version will be created, but the information necessary to populate the Subscription status is lost.
@benluddy Thanks for the information.
Please check if my analysis here is correct.
- Updating the Subscription status failed at https://github.com/operator-framework/operator-lifecycle-manager/blob/cd40303284a287d6bb920c18807e4f70fd7dd048/pkg/controller/operators/catalog/operator.go#L948
- On the next reconcile, operator resolution fails. Taking https://github.com/operator-framework/operator-lifecycle-manager/issues/2201#issuecomment-865184465 as an example:
  - The CSV without a Subscription, namely @existing/ibm-common-services//ibm-platform-api-operator.v3.10.0, is added to the installable list here: https://github.com/operator-framework/operator-lifecycle-manager/blob/cd40303284a287d6bb920c18807e4f70fd7dd048/pkg/controller/registry/resolver/resolver.go#L465
  - The installable from the Subscription, namely opencloud-operators/openshift-marketplace/v3/ibm-platform-api-operator.v3.10.0, is also added to the installable list here: https://github.com/operator-framework/operator-lifecycle-manager/blob/cd40303284a287d6bb920c18807e4f70fd7dd048/pkg/controller/registry/resolver/resolver.go#L88
  - Since both of them provide the same GVK (https://github.com/operator-framework/operator-lifecycle-manager/blob/cd40303284a287d6bb920c18807e4f70fd7dd048/pkg/controller/registry/resolver/resolver.go#L99), an error is reported: opencloud-operators/openshift-marketplace/v3/ibm-platform-api-operator.v3.10.0 and @existing/ibm-common-services//ibm-platform-api-operator.v3.10.0 provide PlatformAPI (operator.ibm.com/v1alpha1)
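To make the conflict in the steps above concrete, here is a toy brute-force check. It is only a sketch: the real resolver uses a SAT solver, and the four constraints below are my paraphrase of the log message, not OLM code. The operator names come from the log; the function names are made up for the illustration.

```python
from itertools import combinations

EXISTING = "@existing/ibm-common-services//ibm-platform-api-operator.v3.10.0"
CATALOG = [
    "opencloud-operators/openshift-marketplace/v3/ibm-platform-api-operator.v3.10.0",
    "opencloud-operators/openshift-marketplace/v3/ibm-platform-api-operator.v3.9.1",
    "opencloud-operators/openshift-marketplace/v3/ibm-platform-api-operator.v3.9.0",
]
ALL = [EXISTING] + CATALOG


def violations(selection):
    """Return the toy constraints that a set of selected installables breaks."""
    broken = []
    if EXISTING not in selection:
        broken.append("existing CSV is mandatory")
    if not selection & set(CATALOG):
        broken.append("subscription requires a catalog entry")
    if len(selection) > 1:
        # All four installables originate from the same package.
        broken.append("at most one operator per package")
    if {EXISTING, CATALOG[0]} <= selection:
        broken.append("at most one provider of PlatformAPI")
    return broken


def solutions():
    """Try every subset of installables and keep those with no violations."""
    found = []
    for size in range(len(ALL) + 1):
        for combo in combinations(ALL, size):
            if not violations(set(combo)):
                found.append(set(combo))
    return found


if __name__ == "__main__":
    print(solutions())  # prints []
    print(violations({EXISTING, CATALOG[0]}))
```

Every selection either leaves out the mandatory existing CSV, leaves the Subscription unsatisfied, or picks two operators from the same package, which is why the namespace can never resolve in this state.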
@benluddy @njhale Please correct me if I am wrong and please advise if there is an enhancement we can do to prevent this issue.
Yes, exactly. Setting .status.installedCSV changes the system of constraints when the enclosing namespace is resolved:
- the named CSV does not need to appear in the solution (i.e., it can be replaced via an upgrade)
- the candidates for the Subscription are limited to only those that can skip/replace the named CSV
- the named CSV can be used to satisfy the Subscription, in addition to the candidates offered by the catalog named in the Subscription's spec
That much is a current limitation due to the lack of a globally-unique bundle identity. That is, we can't be sure that a given CSV named "foo" represents exactly the same operator as another named "foo." Also, we don't have a record of the catalog that an operator was installed from -- or whether the catalog contents themselves have changed since installation.
The InstallPlan is supposed to be the record of the changes applied to the namespace due to resolution. Preventing the issues caused by an error on https://github.com/operator-framework/operator-lifecycle-manager/blob/cd40303284a287d6bb920c18807e4f70fd7dd048/pkg/controller/operators/catalog/operator.go#L948 probably involves deriving the relevant parts of Subscription status from the latest InstallPlan.
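The three effects of setting .status.installedCSV listed above can be sketched with a toy model. This is an illustration under stated assumptions, not the resolver's implementation: the operator names, the replacement chain, and the solve function are all hypothetical, and package uniqueness is reduced to "select at most one entry".

```python
from itertools import combinations

# Hypothetical names; not taken from any real catalog.
EXISTING = "@existing/example-ns//example-operator.v2"
# Catalog entry -> name of the CSV it replaces (None for the first entry).
CATALOG = {
    "example-catalog/example-operator.v1": None,
    "example-catalog/example-operator.v2": "example-operator.v1",
    "example-catalog/example-operator.v3": "example-operator.v2",
}


def solve(installed_csv=None):
    """Enumerate the selections allowed by the toy constraints."""
    if installed_csv is None:
        # Status lost: the existing CSV must stay, and the Subscription
        # can only be satisfied by an entry from the catalog.
        mandatory = {EXISTING}
        candidates = set(CATALOG)
    else:
        # Status set: the existing CSV may be replaced, the catalog
        # candidates are limited to entries that replace it, and the
        # existing CSV itself can satisfy the Subscription.
        mandatory = set()
        candidates = {e for e, replaces in CATALOG.items() if replaces == installed_csv}
        candidates.add(EXISTING)
    universe = [EXISTING, *CATALOG]
    found = []
    for size in range(len(universe) + 1):
        for combo in combinations(universe, size):
            chosen = set(combo)
            if not mandatory <= chosen:
                continue  # a mandatory CSV is missing
            if not chosen & candidates:
                continue  # the Subscription is not satisfied
            if len(chosen) > 1:
                continue  # at most one operator per package
            found.append(chosen)
    return found


if __name__ == "__main__":
    print(solve())                       # prints []
    print(solve("example-operator.v2"))  # keep the existing CSV, or upgrade to v3
```

With the status missing there is no valid selection; with it set, the namespace resolves either by keeping the installed CSV or by moving to the entry that replaces it.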
@benluddy @njhale
Is there a plan to fix this defect, or at least to reduce the risk? When this issue happens, operator install and upgrade are blocked and users can't easily find the cause.
While debugging the catalog-operator during one of these failed operator deployments, the container crashed with the error "fatal error: concurrent map writes".
Details can be found here: crash.log
Adding this comment from a thread on olm-dev channel on kubernetes slack
I have reached a tentative conclusion after several days of continuous testing that the version of OLM I'm using from OCP 4.11.9 is working, and that previous releases of OCP included an OLM with an intermittent container crash.
Since there have been multiple occasions on which new code added to the catalog operator resulted in such a crash, and the OCP process for choosing a version of OLM to ship has been unlucky more than once, I am wondering if there is any way to handle this type of failure better?
It appears we are seeing this more often now. It is having more of an impact on our product teams and has now hit some customers. May be related to https://github.com/openshift/operator-framework-olm/pull/415
Fyi we just pushed through https://github.com/openshift/operator-framework-olm/pull/415 and this issue should be fixed in the next 4.10.z that it is available in.
@anik120 when will the next 4.10.z release be available? fyi @yuchen-fan
@anik120 any update on the fix for this issue in 4.10.z and is it included in 4.11 and 4.12?
@teethediva34 this KCS article has all the information about the concurrent map write fix for OCP (including which z streams the fix is available in).
While upgrading CPD from 4.5.3 to 4.7.3 and running the apply-olm command, I got the following error.
Conditions:
  Last Transition Time: 2023-11-13T21:33:00Z
  Message: targeted catalogsource ibm-cpd-operators/ibm-cpd-ccs-operator-catalog missing
  Reason: UnhealthyCatalogSourceFound
  Status: True
  Type: CatalogSourcesUnhealthy
  Message: constraints not satisfiable: no operators found from catalog ibm-cpd-ccs-operator-catalog in namespace ibm-cpd-operators referenced by subscription ibm-cpd-ccs-operator, subscription ibm-cpd-ccs-operator exists
  Reason: ConstraintsNotSatisfiable
  Status: True
  Type: ResolutionFailed
Install Plan Generation: 7
Last Updated: 2023-11-13T21:41:25Z
Events: <none>
@anik120 This issue is still outstanding, and we are hitting it on OCP versions later than 4.10.
oc version
Client Version: 4.14.17
Kustomize Version: v5.0.1
Server Version: 4.14.17
Kubernetes Version: v1.27.11+d8e449a
Problem Description: We are trying to upgrade from 4.8.4 to 4.8.5 with the following services installed:
cpd_platform,edb_cp4d,mongodb_cp4d,watson_assistant,watson_speech,watsonx_orchestrate
Command used: (image)
[✘] Error in /tmp/work/cpfs_scripts/4.8.5/cp3pt0-deployment/common/utils.sh at line 126 in function wait_for_condition: Timeout after 10 minutes waiting for operator ibm-common-service-operator to be upgraded
[ERROR] 2024-04-19T08:25:15.597425Z cmd.Run() failed with exit status 1
[ERROR] 2024-04-19T08:25:15.597500Z Command exception: The setup-instance-topology command failed (exit status 1). You may find output and logs in the /tmp/work/cpd-cli-workspace/olm-utils-workspace/work directory.
[ERROR] 2024-04-19T08:25:15.598237Z RunPluginCommand:Execution error: exit status 1
The output of oc get subs ibm-common-service-operator -o yaml -n operator-ns showed the error below. We followed https://www.ibm.com/docs/en/cloud-paks/foundational-services/4.5?topic=issues-olm-known-issue-resolutionfailed-message
- message: 'constraints not satisfiable: subscription ibm-namespace-scope-operator requires opencloud-operators/operator-ns/v4.2/ibm-namespace-scope-operator.v4.2.3, subscription ibm-namespace-scope-operator exists, clusterserviceversion ibm-namespace-scope-operator.v4.2.3 exists and is not referenced by a subscription, opencloud-operators/operator-ns/v4.2/ibm-namespace-scope-operator.v4.2.3 and @existing/operator-ns//ibm-namespace-scope-operator.v4.2.3 originate from package ibm-namespace-scope-operator'
  reason: ConstraintsNotSatisfiable
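The "exists and is not referenced by a subscription" part of this message can be checked for directly. Below is a minimal sketch that takes the parsed items from oc get csv -o json and oc get subscription -o json for one namespace and lists the CSVs that no Subscription status points at. The function name is hypothetical, and skipping CSVs that carry the olm.copiedFrom label is an assumption about how copied CSVs should be treated here.

```python
def unreferenced_csvs(csvs, subscriptions):
    """Return names of CSVs that no Subscription status refers to."""
    referenced = set()
    for sub in subscriptions:
        status = sub.get("status", {})
        for field in ("installedCSV", "currentCSV"):
            if status.get(field):
                referenced.add(status[field])
    orphans = []
    for csv in csvs:
        meta = csv["metadata"]
        if "olm.copiedFrom" in meta.get("labels", {}):
            continue  # assumption: copied CSVs belong to another namespace
        if meta["name"] not in referenced:
            orphans.append(meta["name"])
    return orphans


if __name__ == "__main__":
    csvs = [{"metadata": {"name": "ibm-namespace-scope-operator.v4.2.3"}}]
    # A Subscription whose status was never populated, as described in this issue.
    subscriptions = [{"metadata": {"name": "ibm-namespace-scope-operator"}, "status": {}}]
    print(unreferenced_csvs(csvs, subscriptions))
```

A CSV reported here is not necessarily broken (operators can be installed without a Subscription), so treat the result as a list of candidates to inspect rather than to delete.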