
Unstable gitrepo ready condition for multiple non-ready bundles

Open rbreddy opened this issue 9 months ago • 2 comments

Is there an existing issue for this?

  • [x] I have searched the existing issues

Current Behavior

When more than one bundle is in the "ready: false" state, the GitRepo object is updated frequently (creating a new resource version each time), with a "new" message taken from a different bundle. This causes unnecessary activity on the control plane and eventually slows down the Rancher Continuous Delivery UI.

Expected Behavior

The GitRepo status stays the same as long as the bundle states do not change.

Steps To Reproduce

  1. Install Rancher v2.10.3, running Fleet v0.11.4.
  2. Create a GitRepo with more than one bundle in error state:

         kind: GitRepo
         apiVersion: fleet.cattle.io/v1alpha1
         metadata:
           name: bundledependency
           namespace: fleet-default
         spec:
           repo: https://github.com/rbreddy/bundledependency
           branch: main
           targets:
             - clusterSelector:
                 matchExpressions:
                   - key: provider.cattle.io
                     operator: NotIn
                     values:
                       - harvester

  3. To analyze it, leave it running for a while and then watch gitrepos, e.g.:

         for kind in gitrepos bundles bundledeployments; do
           {
             kubectl get -A --show-managed-fields --chunk-size=0 --watch-only --output-watch-events -o yaml $kind >$kind-watch-only-events.yaml &
             pid=$!
             sleep 180
             kill $pid
           } &
         done
         wait

  4. Parse the gitrepos-watch-only-events.yaml with: https://gist.github.com/aruiz14/b58fcc96fde894cbf85562e888d8e1bd

Environment

- Architecture: amd64
- Fleet Version: v0.11.4

Logs

Example of frequent updates on the message
@@ -100,8 +100,8 @@
 status:
   commit: 124109ec64e6c2ef5de39cd7a704bda6e2d4b49e
   conditions:
-    - lastUpdateTime: "2025-03-27T09:47:24Z"
-      message: 'ErrApplied(1) [Cluster fleet-default/downstream-0-0: list bundledeployments: no bundles matching labels fleet.cattle.io/bundle-name=logging-logging-crd,fleet.cattle.io/bundle-namespace=fleet-default in namespace fleet-default]'
+    - lastUpdateTime: "2025-03-27T09:46:20Z"
+      message: 'ErrApplied(1) [Cluster fleet-default/downstream-0-0: cannot patch "nginx-deployment" with kind Deployment: Deployment.apps "nginx-deployment" is invalid: [spec.template.metadata.labels: Invalid value: map[string]string{"app":"nginx-rancher"}: `selector` does not match template `labels`, spec.selector: Invalid value: v1.LabelSelector{MatchLabels:map[string]string{"app":"nginx-rancher123"}, MatchExpressions:[]v1.LabelSelectorRequirement(nil)}: field is immutable]]'
       status: "False"
       type: Ready
     - lastUpdateTime: "2025-03-27T09:30:02Z"
--- /dev/fd/63	2025-03-27 12:57:55.178285163 +0100
+++ /dev/fd/62	2025-03-27 12:57:55.178285163 +0100
@@ -81,10 +81,10 @@
       manager: fleetcontroller
       operation: Update
       subresource: status
-      time: "2025-03-27T11:34:40Z"
+      time: "2025-03-27T11:34:56Z"
   name: rafa
   namespace: fleet-default
-  resourceVersion: "47504"
+  resourceVersion: "47577"
   uid: 1786f5c8-2233-48e7-b5fc-f00625245612
 spec:
   branch: main
@@ -123,7 +123,7 @@
     readyBundleDeployments: 2/7
     state: ErrApplied
   gitJobStatus: Current
-  lastPollingTriggered: "2025-03-27T11:34:40Z"
+  lastPollingTriggered: "2025-03-27T11:34:55Z"
   observedGeneration: 3
   readyClusters: 0
   resourceCounts:
--- /dev/fd/63	2025-03-27 12:57:55.246285497 +0100
+++ /dev/fd/62	2025-03-27 12:57:55.246285497 +0100
@@ -84,7 +84,7 @@
       time: "2025-03-27T11:34:56Z"
   name: rafa
   namespace: fleet-default
-  resourceVersion: "47577"
+  resourceVersion: "47578"
   uid: 1786f5c8-2233-48e7-b5fc-f00625245612
 spec:
   branch: main
@@ -100,8 +100,8 @@
 status:
   commit: 124109ec64e6c2ef5de39cd7a704bda6e2d4b49e
   conditions:
-    - lastUpdateTime: "2025-03-27T09:46:20Z"
-      message: 'ErrApplied(1) [Cluster fleet-default/downstream-0-0: cannot patch "nginx-deployment" with kind Deployment: Deployment.apps "nginx-deployment" is invalid: [spec.template.metadata.labels: Invalid value: map[string]string{"app":"nginx-rancher"}: `selector` does not match template `labels`, spec.selector: Invalid value: v1.LabelSelector{MatchLabels:map[string]string{"app":"nginx-rancher123"}, MatchExpressions:[]v1.LabelSelectorRequirement(nil)}: field is immutable]]'
+    - lastUpdateTime: "2025-03-27T09:47:40Z"
+      message: 'ErrApplied(1) [Cluster fleet-default/downstream-0-0: list bundledeployments: no bundles matching labels fleet.cattle.io/bundle-name=longhorn-crd,fleet.cattle.io/bundle-namespace=fleet-default in namespace fleet-default]'
       status: "False"
       type: Ready
     - lastUpdateTime: "2025-03-27T09:30:02Z"

Anything else?

No response

rbreddy • Mar 27 '25 15:03

/backport v2.11.1

manno • Mar 28 '25 09:03

/backport v2.10.5

0xavi0 • Apr 02 '25 09:04

QA Template

Solution

Sort Bundles before selection.
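
As a rough illustration of why sorting helps, the following is a minimal sketch and not Fleet's actual code. The `bundleState` type and `readyMessage` function are hypothetical stand-ins. It assumes the Ready message is taken from the first non-ready bundle encountered, so sorting by name first makes the result independent of listing order.

```go
package main

import (
	"fmt"
	"sort"
)

// bundleState is a hypothetical, simplified stand-in for a Bundle's status.
// It is not Fleet's real type.
type bundleState struct {
	Name    string
	Ready   bool
	Message string
}

// readyMessage returns the message of the first non-ready bundle after
// sorting a copy of the input by name, so the result does not depend on
// the order in which the bundles were listed.
func readyMessage(bundles []bundleState) string {
	sorted := make([]bundleState, len(bundles))
	copy(sorted, bundles)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Name < sorted[j].Name })
	for _, b := range sorted {
		if !b.Ready {
			return b.Message
		}
	}
	return ""
}

func main() {
	a := []bundleState{
		{Name: "nginx", Ready: false, Message: "cannot patch nginx-deployment"},
		{Name: "logging-crd", Ready: false, Message: "no bundles matching labels"},
		{Name: "monitoring", Ready: true},
	}
	// Same bundles, listed in a different order.
	b := []bundleState{a[2], a[1], a[0]}
	// Both orders yield the same message.
	fmt.Println(readyMessage(a) == readyMessage(b))
}
```

Without the sort step, two reconciles that list the same bundles in different orders could pick different messages, which would match the flapping seen in the logs above.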

Testing

(from the reproduction steps in the description)

  1. Create a GitRepo with more than one bundle in error state
kind: GitRepo
apiVersion: fleet.cattle.io/v1alpha1
metadata:
  name: bundledependency
  namespace: fleet-default
spec:
  repo: https://github.com/rbreddy/bundledependency 
  branch: main
  targets:
    - clusterSelector:
        matchExpressions:
          - key: provider.cattle.io
            operator: NotIn
            values:
              - harvester
  2. Once deployed, disable polling to avoid possible noise.
  3. Watch the GitRepo's Ready condition in .status.conditions: after these changes, it should be stable and always mention the same Bundle as the cause for not being ready.
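
One way to make the stability check concrete is to count how often the Ready message changes across successive observations. The sketch below assumes the messages have already been collected (for example, copied from watch output). The `countChanges` helper and the sample strings are hypothetical and are not part of Fleet or of the gist mentioned in the description.

```go
package main

import "fmt"

// countChanges reports how many times consecutive observations of the
// Ready condition message differ from each other.
func countChanges(observed []string) int {
	changes := 0
	for i := 1; i < len(observed); i++ {
		if observed[i] != observed[i-1] {
			changes++
		}
	}
	return changes
}

func main() {
	// Shortened, made-up messages standing in for real condition messages.
	flapping := []string{"logging-crd error", "nginx error", "longhorn-crd error"}
	stable := []string{"logging-crd error", "logging-crd error", "logging-crd error"}
	fmt.Println(countChanges(flapping), countChanges(stable))
}
```

With the fix in place and polling disabled, the expectation is that the count stays at zero for as long as the bundle states do not change.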

aruiz14 • May 28 '25 08:05

After following steps from comment: https://github.com/rancher/fleet/issues/3484#issuecomment-2915442721

Screenshot showing that the bundle message stays stable even after the GitRepo has been kept running for a long time.

Image

The GitRepo YAML shows a stable bundle message.

Image

sbulage • May 29 '25 12:05