[SURE-8309] fleet-agent in rke2 cluster repeatedly deploying managed-system-agent bundle
SURE-8309
Issue description:
If you check the fleet-agent logs in an rke2 cluster that is provisioned by Rancher, you will see it repeatedly log that it is deploying the managed-system-agent bundle.
time="2024-04-24T22:42:00Z" level=info msg="Deploying bundle cluster-fleet-default-quickstart-aws-custom-c1aeed1484bc/quickstart-aws-custom-managed-system-agent"
time="2024-04-24T22:42:01Z" level=info msg="Deploying bundle cluster-fleet-default-quickstart-aws-custom-c1aeed1484bc/quickstart-aws-custom-managed-system-agent"
time="2024-04-24T22:42:04Z" level=info msg="Deploying bundle cluster-fleet-default-quickstart-aws-custom-c1aeed1484bc/quickstart-aws-custom-managed-system-agent"
...
Business impact:
It seems like a nuisance more than anything. The cluster is healthy.
Troubleshooting steps:
I think the log is spamming this in error. When I check the actual last update time in the status of the manifest for the bundle, it shows no deployment since the original one.
Repro steps:
- Customer is using Rancher 2.7.10. I also reproduced this in Rancher 2.8.3 with v1.28.8+rke2r1
- Create a new RKE2 cluster via Rancher
- Explore the cluster and view the logs for the fleet-agent pod
Related to https://github.com/rancher/fleet/issues/2869
This should be fixed by https://github.com/rancher/fleet/pull/2917
@manno #2917 got merged into main while the issue here is scheduled for 2.9.3 🤔 /cc @weyfonk
Good catch. Updated the milestone to v2.10.0 on this issue and created #2944 as a backport to v2.9.3.
Additional QA
Problem
Adding an RKE2 cluster to a Fleet setup would lead to multiple log messages about the Fleet agent bundle being deployed, although that deployment would actually happen only once.
Solution
Moved the log message to ensure that it is only produced when a deployment actually happened, not when it was skipped due to the bundle and release already being deployed.
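A minimal Go sketch of the idea behind the fix, not the actual change from #2917: the `deployer` type, its fields and `deployBundle` are hypothetical names made up for illustration. The log line is emitted only on the path where a deployment happens, so repeated reconciles of an unchanged bundle stay silent.

```go
package main

import "fmt"

// deployer is a hypothetical stand-in for the agent's deploy logic;
// it is not Fleet's actual type.
type deployer struct {
	applied map[string]string // bundle name -> deployment ID currently applied
	logs    []string          // captured log lines, kept in memory for illustration
}

// deployBundle applies a bundle only if its deployment ID differs from the
// one already applied, and logs only after a deployment actually happened.
// It reports whether a deployment took place.
func (d *deployer) deployBundle(name, deploymentID string) bool {
	if id, ok := d.applied[name]; ok && id == deploymentID {
		// Bundle and release are already deployed: skip without logging.
		return false
	}
	d.applied[name] = deploymentID
	d.logs = append(d.logs, fmt.Sprintf("Deployed bundle %s", name))
	return true
}

func main() {
	d := &deployer{applied: map[string]string{}}
	// Simulate three reconcile loops for the same, unchanged bundle.
	for i := 0; i < 3; i++ {
		d.deployBundle("fleet-agent-local", "id-1")
	}
	for _, line := range d.logs {
		fmt.Println(line)
	}
}
```

With the log call placed before the skip check instead, the same three reconciles would each produce a message, which matches the behavior reported in this issue.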
Testing
Engineering Testing
Manual Testing
Observed a single Deployed bundle log for a Fleet agent bundle, vs multiple identical logs before the fix.
Tests were run against a local k3s cluster, as this issue is expected to be independent from the underlying K8s distribution.
Automated Testing
N/A (tests do not cover logging)
QA Testing Considerations
We should validate that this fix works on RKE2 as well (single Deployed bundle log for Fleet agent deployment).
:warning: The log message appears as Deploying bundle in the issue description, because it involves an older Fleet version. From Fleet 0.10 onwards, that message is Deployed bundle, as it is produced right after a deployment, not before starting it.
Regressions Considerations
N/A
System Information
Before Upgrade
| Rancher Version | Fleet Version |
|---|---|
| Prime:2.9.3 | Fleet: 0.10.4 |
After Upgrade
| Rancher Version | Fleet Version |
|---|---|
| v2.10.0-alpha5 | fleet:v0.11.0-beta.3 |
Both the Before Upgrade and After Upgrade checks were performed just to make sure nothing is broken.
Before Upgrade
- Deploy a Rancher Prime latest released version (I installed prime/2.9.3)
- Import downstream cluster. I used k3d and k3s
- Observed the logs of the fleet-agent installed in the downstream cluster.
In the fleet-agent pod logs you'll see repeated Deployed bundle messages.
Note: I filtered with the string Deployed bundle to see log repetition behavior.
Logs:
Fleet agent logs before upgrade
{"level":"info","ts":"2024-10-30T13:27:39Z","logger":"bundledeployment.DeployBundle","msg":"Deployed bundle","controller":"bundledeployment","controllerGroup":"fleet.cattle.io","controllerKind":"BundleDeployment","BundleDeployment":{"name":"fleet-agent-imported-0","namespace":"cluster-fleet-default-imported-0-f1158df57e01"},"namespace":"cluster-fleet-default-imported-0-f1158df57e01","name":"fleet-agent-imported-0","reconcileID":"4b512da4-0bc1-4b3e-9c10-440c88878380","deploymentID":"s-4f421ffd2d21effac6a8620b53a15eed5f18eeb0289d4ac2c0c2790e7e46a:8eaf3c183fe289136e7a2da3ceafa5c435e8f098e386324237baeb8811cf8b21","appliedDeploymentID":"","release":"cattle-fleet-system/fleet-agent-imported-0:1","DeploymentID":"s-4f421ffd2d21effac6a8620b53a15eed5f18eeb0289d4ac2c0c2790e7e46a:8eaf3c183fe289136e7a2da3ceafa5c435e8f098e386324237baeb8811cf8b21"}
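The filtering on the string Deployed bundle can also be done programmatically. The following Go sketch counts such entries per bundle deployment, assuming the JSON log format shown above with top-level `msg`, `namespace` and `name` fields; `logEntry` and `countDeployed` are hypothetical helper names, not part of Fleet.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"os"
)

// logEntry holds the only fields of a fleet-agent JSON log line that this
// sketch reads; all other fields are ignored.
type logEntry struct {
	Msg       string `json:"msg"`
	Namespace string `json:"namespace"`
	Name      string `json:"name"`
}

// countDeployed returns, per "namespace/name", how many "Deployed bundle"
// entries appear in lines. Lines that are not valid JSON are skipped.
func countDeployed(lines []string) map[string]int {
	counts := map[string]int{}
	for _, line := range lines {
		var e logEntry
		if err := json.Unmarshal([]byte(line), &e); err != nil {
			continue
		}
		if e.Msg == "Deployed bundle" {
			counts[e.Namespace+"/"+e.Name]++
		}
	}
	return counts
}

func main() {
	// Read log lines from standard input; allow long lines (up to 1 MiB).
	sc := bufio.NewScanner(os.Stdin)
	sc.Buffer(make([]byte, 0, 64*1024), 1024*1024)
	var lines []string
	for sc.Scan() {
		lines = append(lines, sc.Text())
	}
	if err := sc.Err(); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for key, n := range countDeployed(lines) {
		fmt.Printf("%d\t%s\n", n, key)
	}
}
```

Piping the fleet-agent pod logs into this program on standard input prints one count per bundle deployment; a count above 1 for an unchanged bundle points to the repetition described in this issue.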
After Upgrade
- Upgraded same cluster to v2.10.0-alpha5 which installs Fleet version: 0.11.0-beta.3
- Observe the fleet-agent pod logs on the imported clusters.
In the fleet-agent pod logs you'll see only a single entry for Deployed bundle.
Note: I filtered with the string Deployed bundle to see log repetition behavior.
Logs:
Fleet agent logs after upgrade
{"level":"info","ts":"2024-10-30T15:52:57Z","logger":"bundledeployment.DeployBundle","msg":"Deployed bundle","controller":"bundledeployment","controllerGroup":"fleet.cattle.io","controllerKind":"BundleDeployment","BundleDeployment":{"name":"fleet-agent-local","namespace":"cluster-fleet-local-local-1a3d67d0a899"},"namespace":"cluster-fleet-local-local-1a3d67d0a899","name":"fleet-agent-local","reconcileID":"f00fc4f1-a2d1-4e9c-bd7f-011e20a2f36c","deploymentID":"s-ca0f84354c57b23a7bb0d88e3947dfbb6c16397c40c78e94a6589e3d974c5:32eb9f72e71f28d6a3ee815ddf77e461aa78ec0b6870afe4765c423d31bf6557","appliedDeploymentID":"s-caf1a1a3ddc0a15edb34c257394e886c52ffbabd2060c478472d9b7f9cf08:32eb9f72e71f28d6a3ee815ddf77e461aa78ec0b6870afe4765c423d31bf6557","release":"cattle-fleet-local-system/fleet-agent-local:3","DeploymentID":"s-ca0f84354c57b23a7bb0d88e3947dfbb6c16397c40c78e94a6589e3d974c5:32eb9f72e71f28d6a3ee815ddf77e461aa78ec0b6870afe4765c423d31bf6557"}