operator-sdk icon indicating copy to clipboard operation
operator-sdk copied to clipboard

bundle-upgrade operation stuck and never timeouts

Open jotak opened this issue 2 years ago • 2 comments

Bug Report

Hi,

When doing a run bundle-upgrade operation to test an operator upgrade, the operation is stuck after logging Found existing catalog source with name etc.

I reproduce consistently, every time I try.

Also, it seems to never timeout (I've been waiting for ~10min so far), apparently not honouring the timeout option that is said to default to 2 minutes.

What did you do?

operator-sdk run bundle quay.io/netobserv/network-observability-operator-bundle:v0.1.3
# wait a little bit
operator-sdk run bundle-upgrade quay.io/netobserv/network-observability-operator-bundle:v0.1.4

What did you expect to see?

Success

What did you see instead? Under which circumstances?

Stdout:

INFO[0002] Found existing subscription with name netobserv-operator-v0-1-3-sub and namespace network-observability 
INFO[0002] Found existing catalog source with name netobserv-operator-catalog and namespace network-observability 

(and stuck at this point)

Pod logs:

$ oc logs quay-io-netobserv-network-observability-operator-bundle-v0-1-3
time="2022-08-09T10:54:14Z" level=info msg="serving registry" configs=/netobserv-operator-catalog-configs port=50051

catalog-operator

There are a couple of errors of type the object has been modified; please apply your changes to the latest version and try again, not sure if they are relevant for this ticket:

$ oc logs catalog-operator-847b966b64-f7rqf  -n openshift-operator-lifecycle-manager
[...]
time="2022-08-09T10:54:28Z" level=info msg=syncing id=pFDid ip=install-tz8jd namespace=network-observability phase=Installing
time="2022-08-09T10:54:29Z" level=info msg="added to bundle, Kind=ConfigMap" configmap=network-observability/9b34086d5b83308e052bd989ee88c04b7987b0245ac7c3f363215c6215961ba key=netobserv-manager-config_v1_configmap.yaml
time="2022-08-09T10:54:29Z" level=info msg="added to bundle, Kind=ClusterRole" configmap=network-observability/9b34086d5b83308e052bd989ee88c04b7987b0245ac7c3f363215c6215961ba key=netobserv-metrics-reader_rbac.authorization.k8s.io_v1_clusterrole.yaml
time="2022-08-09T10:54:29Z" level=info msg="added to bundle, Kind=Service" configmap=network-observability/9b34086d5b83308e052bd989ee88c04b7987b0245ac7c3f363215c6215961ba key=netobserv-metrics-service_v1_service.yaml
time="2022-08-09T10:54:29Z" level=info msg="added to bundle, Kind=ClusterServiceVersion" configmap=network-observability/9b34086d5b83308e052bd989ee88c04b7987b0245ac7c3f363215c6215961ba key=netobserv-operator.clusterserviceversion.yaml
time="2022-08-09T10:54:29Z" level=info msg="added to bundle, Kind=CustomResourceDefinition" configmap=network-observability/9b34086d5b83308e052bd989ee88c04b7987b0245ac7c3f363215c6215961ba key=flows.netobserv.io_flowcollectors.yaml
time="2022-08-09T10:54:30Z" level=info msg=syncing event=update reconciling="*v1alpha1.Subscription" selflink=
time="2022-08-09T10:54:30Z" level=info msg=syncing id=IeaE1 ip=install-tz8jd namespace=network-observability phase=Complete
time="2022-08-09T10:54:30Z" level=info msg=syncing event=update reconciling="*v1alpha1.Subscription" selflink=
time="2022-08-09T10:54:30Z" level=warning msg="an error was encountered during reconciliation" error="Operation cannot be fulfilled on subscriptions.operators.coreos.com \"netobserv-operator-v0-1-3-sub\": the object has been modified; please apply your changes to the latest version and try again" event=update reconciling="*v1alpha1.Subscription" selflink=
E0809 10:54:30.954906       1 queueinformer_operator.go:290] sync {"update" "network-observability/netobserv-operator-v0-1-3-sub"} failed: Operation cannot be fulfilled on subscriptions.operators.coreos.com "netobserv-operator-v0-1-3-sub": the object has been modified; please apply your changes to the latest version and try again
time="2022-08-09T10:54:30Z" level=info msg=syncing event=update reconciling="*v1alpha1.Subscription" selflink=
time="2022-08-09T10:54:30Z" level=info msg=syncing event=update reconciling="*v1alpha1.Subscription" selflink=
time="2022-08-09T10:54:31Z" level=info msg=syncing id=/vpfz ip=install-tz8jd namespace=network-observability phase=Complete
time="2022-08-09T10:59:07Z" level=error msg="UpdateStatus - error while setting CatalogSource status" error="Operation cannot be fulfilled on catalogsources.operators.coreos.com \"community-operators\": the object has been modified; please apply your changes to the latest version and try again" id=JD1cF source=community-operators
E0809 10:59:07.229352       1 queueinformer_operator.go:290] sync {"update" "openshift-marketplace/community-operators"} failed: Operation cannot be fulfilled on catalogsources.operators.coreos.com "community-operators": the object has been modified; please apply your changes to the latest version and try again
time="2022-08-09T11:02:06Z" level=info msg=syncing event=update reconciling="*v1alpha1.Subscription" selflink=
time="2022-08-09T11:02:15Z" level=info msg=syncing event=update reconciling="*v1alpha1.Subscription" selflink=
time="2022-08-09T11:02:15Z" level=info msg=syncing id=15vQh ip=install-tz8jd namespace=network-observability phase=Complete
time="2022-08-09T11:02:15Z" level=info msg=syncing event=update reconciling="*v1alpha1.Subscription" selflink=
time="2022-08-09T11:17:44Z" level=info msg=syncing event=update reconciling="*v1alpha1.Subscription" selflink=
time="2022-08-09T11:17:52Z" level=info msg=syncing id=OoS1E ip=install-tz8jd namespace=network-observability phase=Complete
time="2022-08-09T11:17:52Z" level=info msg=syncing event=update reconciling="*v1alpha1.Subscription" selflink=
time="2022-08-09T11:17:52Z" level=info msg=syncing event=update reconciling="*v1alpha1.Subscription" selflink=
[...]

olm-operator

Here again, similar errors:

$ oc logs olm-operator-6854fbd4c6-d6l4r  -n openshift-operator-lifecycle-manager
[...]
time="2022-08-09T10:54:37Z" level=info msg="install strategy successful" csv=netobserv-operator.v0.1.3 id=e5qLU namespace=network-observability phase=Installing strategy=deployment
time="2022-08-09T10:54:37Z" level=info msg="install strategy successful" csv=netobserv-operator.v0.1.3 id=8Wlx7 namespace=network-observability phase=Installing strategy=deployment
time="2022-08-09T10:54:37Z" level=info msg="install strategy successful" csv=netobserv-operator.v0.1.3 id=M3nR1 namespace=network-observability phase=Installing strategy=deployment
time="2022-08-09T10:54:37Z" level=info msg="install strategy successful" csv=netobserv-operator.v0.1.3 id=jD+fF namespace=network-observability phase=Installing strategy=deployment
time="2022-08-09T10:54:38Z" level=info msg="install strategy successful" csv=netobserv-operator.v0.1.3 id=9D1S5 namespace=network-observability phase=Installing strategy=deployment
time="2022-08-09T10:54:38Z" level=info msg="install strategy successful" csv=netobserv-operator.v0.1.3 id=p8DoE namespace=network-observability phase=Installing strategy=deployment
time="2022-08-09T10:54:39Z" level=info msg="install strategy successful" csv=netobserv-operator.v0.1.3 id=/IlK9 namespace=network-observability phase=Installing strategy=deployment
time="2022-08-09T10:54:39Z" level=info msg="install strategy successful" csv=netobserv-operator.v0.1.3 id=tYayq namespace=network-observability phase=Installing strategy=deployment
time="2022-08-09T10:54:40Z" level=info msg="install strategy successful" csv=netobserv-operator.v0.1.3 id=Gd2sN namespace=network-observability phase=Installing strategy=deployment
I0809 10:54:40.362599       1 event.go:285] Event(v1.ObjectReference{Kind:"ClusterServiceVersion", Namespace:"network-observability", Name:"netobserv-operator.v0.1.3", UID:"a22b0bd6-6fa8-480a-924f-0110469f6304", APIVersion:"operators.coreos.com/v1alpha1", ResourceVersion:"60812", FieldPath:""}): type: 'Normal' reason: 'InstallSucceeded' install strategy completed with no errors
E0809 11:03:18.334912       1 queueinformer_operator.go:290] sync {"update" "openshift-operator-lifecycle-manager/packageserver"} failed: could not update operatorgroups olm.providedAPIs annotation: Operation cannot be fulfilled on operatorgroups.operators.coreos.com "olm-operators": the object has been modified; please apply your changes to the latest version and try again
E0809 11:12:12.583008       1 queueinformer_operator.go:290] sync {"update" "openshift-operator-lifecycle-manager/packageserver"} failed: could not update operatorgroups olm.providedAPIs annotation: Operation cannot be fulfilled on operatorgroups.operators.coreos.com "olm-operators": the object has been modified; please apply your changes to the latest version and try again
E0809 11:16:39.652694       1 queueinformer_operator.go:290] sync {"update" "openshift-operator-lifecycle-manager/packageserver"} failed: could not update operatorgroups olm.providedAPIs annotation: Operation cannot be fulfilled on operatorgroups.operators.coreos.com "olm-operators": the object has been modified; please apply your changes to the latest version and try again
[...]

Environment

Operator type:

/language go

Kubernetes cluster type:

OpenShift (ocp 4.11 rc6)

$ operator-sdk version

operator-sdk version: "v1.22.2", commit: "da3346113a8a75e11225f586482934000504a60f", kubernetes version: "1.24.1", go version: "go1.18.4", GOOS: "linux", GOARCH: "amd64"

$ go version (if language is Go)

go version go1.17.11 linux/amd64

$ kubectl version

Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v0.23.0", GitCommit:"6db43e22ecffd58dbd1cfb3b245967bde98f9a66", GitTreeState:"clean", BuildDate:"2022-03-14T13:42:44Z", GoVersion:"go1.17.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"24", GitVersion:"v1.24.0+9546431", GitCommit:"0a57f1f59bda75ea2cf13d9f3b4ac5d202134f2d", GitTreeState:"clean", BuildDate:"2022-07-08T19:55:26Z", GoVersion:"go1.18.1", Compiler:"gc", Platform:"linux/amd64"}

jotak avatar Aug 09 '22 11:08 jotak

PS: I saw this issue that could be similar: https://github.com/operator-framework/operator-sdk/issues/5773 , but since symptoms seem different, I opened a new one

jotak avatar Aug 09 '22 11:08 jotak

Found a workaround, inspired by https://github.com/operator-framework/operator-sdk/issues/5946 Using --index-image pointing to my operator-sdk cli version worked:

operator-sdk run bundle quay.io/netobserv/network-observability-operator-bundle:v0.1.3 --index-image quay.io/operator-framework/opm:v1.22

operator-sdk run bundle-upgrade quay.io/netobserv/network-observability-operator-bundle:v0.1.4

jotak avatar Aug 09 '22 11:08 jotak

@jotak here's the temporary workaround for this issue before we pin opm's version: https://github.com/operator-framework/operator-registry/issues/984#issuecomment-1175091401

Feel free to reopen this issue if you're still seeing issues with run bundle-upgrade.

rashmigottipati avatar Aug 16 '22 15:08 rashmigottipati