when create multiple subscriptions,installplan status is null
Bug Report
What did you do? I created multiple subscriptions in a short time, then these subscription status is InstallPlanPending, Like the following:
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
...
status:
catalogHealth:
- catalogSourceRef:
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
name: catalog
namespace: cpaas-system
resourceVersion: '5728'
uid: 514fc6ad-b5e5-4ade-a731-37ba50f7eeda
healthy: true
lastUpdated: '2022-03-10T10:02:41Z'
- catalogSourceRef:
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
name: system
namespace: cpaas-system
resourceVersion: '5435'
uid: 2aae92cc-f946-4cac-ac3f-4459d793a7fb
healthy: true
lastUpdated: '2022-03-10T10:02:41Z'
conditions:
- lastTransitionTime: '2022-03-10T10:02:41Z'
message: all available catalogsources are healthy
reason: AllCatalogSourcesHealthy
status: 'False'
type: CatalogSourcesUnhealthy
- lastTransitionTime: '2022-03-10T10:08:47Z'
reason: InstallPlanNotYetReconciled
status: 'True'
type: InstallPlanPending
currentCSV: redis-operator.v3.8.0-13-gc1a15353
installPlanGeneration: 439
installPlanRef:
apiVersion: operators.coreos.com/v1alpha1
kind: InstallPlan
name: install-7g6lp
namespace: operators
resourceVersion: '94336'
uid: ae6b8a8b-126a-40ce-b4fc-a213fb7143d3
installplan:
apiVersion: operators.coreos.com/v1alpha1
kind: InstallPlan
name: install-7g6lp
uuid: ae6b8a8b-126a-40ce-b4fc-a213fb7143d3
lastUpdated: '2022-03-10T10:08:47Z'
state: UpgradePending
I can see the installplan status is nil
apiVersion: operators.coreos.com/v1alpha1
kind: InstallPlan
metadata:
...
spec:
approval: Automatic
approved: true
clusterServiceVersionNames:
- rds-operator.v3.8.0
- pmm-operator.v3.8.0
- percona-xtradb-cluster-operator.v3.8.0
- strimzi-kafka-operator.v3.1.0
- tekton.v0.54.0
- redis-operator.v0.8.0
generation: 439
At the same time,the catalog-operator logs:
time="2022-03-10T10:08:41Z" level=info msg=syncing event=update reconciling="*v1alpha1.Subscription" selflink=
time="2022-03-10T10:08:41Z" level=info msg=syncing id=fi43d ip=install-shmzv namespace=operators phase=
time="2022-03-10T10:08:41Z" level=info msg="skip processing installplan without status - subscription sync responsible for initial status" id=fi43d ip=install-shmzv namespace=operators phase=
time="2022-03-10T10:08:41Z" level=info msg=syncing event=update reconciling="*v1alpha1.Subscription" selflink=
time="2022-03-10T10:08:41Z" level=info msg=syncing id=vK+jd ip=install-fr9n8 namespace=operators phase=
time="2022-03-10T10:08:41Z" level=info msg=syncing event=update reconciling="*v1alpha1.Subscription" selflink=
time="2022-03-10T10:08:41Z" level=info msg=syncing event=update reconciling="*v1alpha1.Subscription" selflink=
time="2022-03-10T10:08:41Z" level=info msg=syncing event=update reconciling="*v1alpha1.Subscription" selflink=
time="2022-03-10T10:08:41Z" level=info msg=syncing event=update reconciling="*v1alpha1.Subscription" selflink=
time="2022-03-10T10:08:41Z" level=info msg=syncing event=update reconciling="*v1alpha1.Subscription" selflink=
time="2022-03-10T10:08:41Z" level=info msg="skip processing installplan without status - subscription sync responsible for initial status" id=vK+jd ip=install-fr9n8 namespace=operators phase=
E0310 10:08:41.682192 1 queueinformer_operator.go:290] sync "operators" failed: etcdserver: request is too large
E0310 10:08:41.894006 1 queueinformer_operator.go:290] sync {"update" "operators"} failed: etcdserver: request is too large
time="2022-03-10T10:08:41Z" level=info msg=syncing event=update reconciling="*v1alpha1.Subscription" selflink=
time="2022-03-10T10:08:41Z" level=info msg=syncing event=update reconciling="*v1alpha1.Subscription" selflink=
time="2022-03-10T10:08:41Z" level=info msg=syncing event=update reconciling="*v1alpha1.Subscription" selflink=
time="2022-03-10T10:08:41Z" level=info msg=syncing event=update reconciling="*v1alpha1.Subscription" selflink=
time="2022-03-10T10:08:41Z" level=info msg=syncing event=update reconciling="*v1alpha1.Subscription" selflink=
time="2022-03-10T10:08:41Z" level=info msg=syncing event=update reconciling="*v1alpha1.Subscription" selflink=
time="2022-03-10T10:08:42Z" level=info msg=syncing event=update reconciling="*v1alpha1.Subscription" selflink=
time="2022-03-10T10:08:42Z" level=info msg=syncing event=update reconciling="*v1alpha1.Subscription" selflink=
time="2022-03-10T10:08:42Z" level=info msg=syncing event=update reconciling="*v1alpha1.Subscription" selflink=
time="2022-03-10T10:08:42Z" level=info msg=syncing event=update reconciling="*v1alpha1.Subscription" selflink=
time="2022-03-10T10:08:42Z" level=info msg=syncing event=update reconciling="*v1alpha1.Subscription" selflink=
time="2022-03-10T10:08:42Z" level=info msg=syncing event=update reconciling="*v1alpha1.Subscription" selflink=
time="2022-03-10T10:08:44Z" level=warning msg="no installplan found with matching generation, creating new one" id=FRSXt namespace=operators
time="2022-03-10T10:08:44Z" level=info msg=syncing event=update reconciling="*v1alpha1.Subscription" selflink=
time="2022-03-10T10:08:44Z" level=info msg=syncing event=update reconciling="*v1alpha1.Subscription" selflink=
time="2022-03-10T10:08:44Z" level=info msg=syncing event=update reconciling="*v1alpha1.Subscription" selflink=
time="2022-03-10T10:08:44Z" level=info msg=syncing event=update reconciling="*v1alpha1.Subscription" selflink=
time="2022-03-10T10:08:44Z" level=info msg=syncing event=update reconciling="*v1alpha1.Subscription" selflink=
time="2022-03-10T10:08:44Z" level=info msg=syncing event=update reconciling="*v1alpha1.Subscription" selflink=
time="2022-03-10T10:08:44Z" level=info msg=syncing id=D8ka2 ip=install-7g6lp namespace=operators phase=
time="2022-03-10T10:08:44Z" level=info msg="skip processing installplan without status - subscription sync responsible for initial status" id=D8ka2 ip=install-7g6lp namespace=operators phase=
time="2022-03-10T10:08:45Z" level=info msg=syncing event=update reconciling="*v1alpha1.Subscription" selflink=
time="2022-03-10T10:08:45Z" level=info msg=syncing event=update reconciling="*v1alpha1.Subscription" selflink=
time="2022-03-10T10:08:45Z" level=info msg=syncing event=update reconciling="*v1alpha1.Subscription" selflink=
time="2022-03-10T10:08:45Z" level=info msg=syncing event=update reconciling="*v1alpha1.Subscription" selflink=
time="2022-03-10T10:08:45Z" level=info msg=syncing event=update reconciling="*v1alpha1.Subscription" selflink=
time="2022-03-10T10:08:45Z" level=info msg=syncing event=update reconciling="*v1alpha1.Subscription" selflink=
time="2022-03-10T10:08:46Z" level=info msg=syncing event=update reconciling="*v1alpha1.Subscription" selflink=
time="2022-03-10T10:08:46Z" level=info msg=syncing event=update reconciling="*v1alpha1.Subscription" selflink=
time="2022-03-10T10:08:46Z" level=info msg=syncing id=zRSXl ip=install-7g6lp namespace=operators phase=
time="2022-03-10T10:08:46Z" level=info msg="skip processing installplan without status - subscription sync responsible for initial status" id=zRSXl ip=install-7g6lp namespace=operators phase=
time="2022-03-10T10:08:46Z" level=info msg=syncing event=update reconciling="*v1alpha1.Subscription" selflink=
time="2022-03-10T10:08:46Z" level=info msg=syncing event=update reconciling="*v1alpha1.Subscription" selflink=
time="2022-03-10T10:08:46Z" level=info msg=syncing event=update reconciling="*v1alpha1.Subscription" selflink=
time="2022-03-10T10:08:46Z" level=info msg=syncing id=UyQb+ ip=install-7g6lp namespace=operators phase=
time="2022-03-10T10:08:46Z" level=info msg="skip processing installplan without status - subscription sync responsible for initial status" id=UyQb+ ip=install-7g6lp namespace=operators phase=
time="2022-03-10T10:08:46Z" level=info msg=syncing id=xcBMx ip=install-7g6lp namespace=operators phase=
time="2022-03-10T10:08:46Z" level=info msg="skip processing installplan without status - subscription sync responsible for initial status" id=xcBMx ip=install-7g6lp namespace=operators phase=
time="2022-03-10T10:08:46Z" level=info msg=syncing id=RcYlH ip=install-7g6lp namespace=operators phase=
time="2022-03-10T10:08:46Z" level=info msg="skip processing installplan without status - subscription sync responsible for initial status" id=RcYlH ip=install-7g6lp namespace=operators phase=
time="2022-03-10T10:08:47Z" level=info msg=syncing event=update reconciling="*v1alpha1.Subscription" selflink=
Environment
- operator-lifecycle-manager version: 0.19.0
- Kubernetes version information: v1.22.7
- Kubernetes cluster kind:
Possible Solution
I guess catalog-operator needs to retry here https://github.com/operator-framework/operator-lifecycle-manager/blob/2d649b0d5935ecfecfefbe56f266cc7b04d0b290/pkg/controller/operators/catalog/operator.go#L1255-L1260
Hi @aiyijing,
Looking at this issue more in-depth it's not immediately clear what the problem is. The interesting log here is
sync "operators" failed: etcdserver: request is too large
it's an unusual error. When you encounter this issue, could you also provide the api-server logs on the cluster? That should help us look into this issue more.
Also, for reproducibility, if you could provide the manifests (subscription, catalogsources) that you used to generate the installplan that would also be helpful.
@exdx Unfortunately, the log no longer exists。
But I remember that I checked the audit log about apiserver. The catalog-operator update installplan.status caused the apiserver/etcd 500 error.
As you said, maybe installplan.status is too large?
I found some instructions about etcd https://etcd.io/docs/v3.3/dev-guide/limit/
So,Does installplan need to limit the number of CSV installations or updates? 😄😄😄
There is very confused why single installplan need to install multiple csv and dependent resource of csv. Maybe it need to improve here because of etcd requests limit
@aiyijing This is because OLM install / update resolution logic works at the namespace level and therefore considers all operators currently installed in the namespace via Subscription. The solution is to install these operators in other namespaces.
@dmesser
This seems to be another solution:
Installplan be split according to Resolving of step, and then create and update the installplan.
Otherwise we have to modify etcd request limit,obviously it is not reasonable.
There are more restrictions:
I extend etcd --max-request-bytes to 10Mi but it is sadly that grpc server has restrictions of grpc send message size limit.
If I install a lot of operator in global NS and at upgrade the index image, all operators will fail to upgrade because installplan cannot be updated successfully.
time="2022-03-29T17:16:17Z" level=warning msg="no installplan found with matching generation, creating new one" id=RpdmR namespace=operators
time="2022-03-29T17:16:17Z" level=info msg=syncing event=update reconciling="*v1alpha1.Subscription" selflink=
time="2022-03-29T17:16:17Z" level=info msg=syncing event=update reconciling="*v1alpha1.Subscription" selflink=
time="2022-03-29T17:16:17Z" level=info msg=syncing event=update reconciling="*v1alpha1.Subscription" selflink=
time="2022-03-29T17:16:17Z" level=info msg=syncing event=update reconciling="*v1alpha1.Subscription" selflink=
time="2022-03-29T17:16:17Z" level=info msg=syncing event=update reconciling="*v1alpha1.Subscription" selflink=
time="2022-03-29T17:16:17Z" level=info msg=syncing event=update reconciling="*v1alpha1.Subscription" selflink=
time="2022-03-29T17:16:17Z" level=info msg=syncing event=update reconciling="*v1alpha1.Subscription" selflink=
time="2022-03-29T17:16:17Z" level=info msg=syncing event=update reconciling="*v1alpha1.Subscription" selflink=
time="2022-03-29T17:16:17Z" level=info msg=syncing event=update reconciling="*v1alpha1.Subscription" selflink=
time="2022-03-29T17:16:17Z" level=info msg=syncing event=update reconciling="*v1alpha1.Subscription" selflink=
time="2022-03-29T17:16:17Z" level=info msg=syncing event=update reconciling="*v1alpha1.Subscription" selflink=
time="2022-03-29T17:16:17Z" level=info msg=syncing id=wdOG+ ip=install-fnm47 namespace=operators phase=
time="2022-03-29T17:16:17Z" level=info msg="skip processing installplan without status - subscription sync responsible for initial status" id=wdOG+ ip=install-fnm47 namespace=operators phase=
time="2022-03-29T17:16:17Z" level=info msg=syncing event=update reconciling="*v1alpha1.Subscription" selflink=
time="2022-03-29T17:16:17Z" level=info msg=syncing event=update reconciling="*v1alpha1.Subscription" selflink=
E0329 17:16:18.184570 1 queueinformer_operator.go:290] sync "operators" failed: rpc error: code = ResourceExhausted desc = trying to send message larger than max (2627600 vs. 2097152)