
[SURE-8309] fleet-agent in rke2 cluster repeatedly deploying managed-system-agent bundle

Open kkaempf opened this issue 1 year ago • 3 comments

SURE-8309

Issue description:

If you check the fleet-agent logs in an RKE2 cluster provisioned by Rancher, you will see it repeatedly log that it is deploying the managed-system-agent bundle.

time="2024-04-24T22:42:00Z" level=info msg="Deploying bundle cluster-fleet-default-quickstart-aws-custom-c1aeed1484bc/quickstart-aws-custom-managed-system-agent"
time="2024-04-24T22:42:01Z" level=info msg="Deploying bundle cluster-fleet-default-quickstart-aws-custom-c1aeed1484bc/quickstart-aws-custom-managed-system-agent"
time="2024-04-24T22:42:04Z" level=info msg="Deploying bundle cluster-fleet-default-quickstart-aws-custom-c1aeed1484bc/quickstart-aws-custom-managed-system-agent"
...
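To see how often the message repeats, the log lines above can be tallied per bundle. This is a minimal Go sketch. It assumes the logfmt-style format shown (time=..., level=..., msg="...") and that messages contain no escaped quotes. The countMessages helper is a hypothetical name, not part of Fleet.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"regexp"
	"strings"
)

// msgRe extracts the quoted msg="..." field from a logfmt-style line such as
//   time="..." level=info msg="Deploying bundle ns/name"
// Assumption: the message itself contains no escaped quotes.
var msgRe = regexp.MustCompile(`msg="([^"]*)"`)

// countMessages returns how often each message starting with prefix occurs.
func countMessages(lines []string, prefix string) map[string]int {
	counts := make(map[string]int)
	for _, line := range lines {
		m := msgRe.FindStringSubmatch(line)
		if m == nil || !strings.HasPrefix(m[1], prefix) {
			continue
		}
		counts[m[1]]++
	}
	return counts
}

func main() {
	// Read fleet-agent log lines from stdin (e.g. piped from kubectl logs).
	var lines []string
	sc := bufio.NewScanner(os.Stdin)
	for sc.Scan() {
		lines = append(lines, sc.Text())
	}
	// Report only messages that were logged more than once.
	for msg, n := range countMessages(lines, "Deploying bundle") {
		if n > 1 {
			fmt.Printf("%d x %s\n", n, msg)
		}
	}
}
```

A count above one for the same bundle, with no change to the bundle in between, is the symptom described in this issue.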

Business impact:

It seems like a nuisance more than anything. The cluster is healthy.

Troubleshooting steps:

I think these messages are logged in error. When I check the last update time in the status of the bundle's manifest, it shows no deployment since the original one.

Repro steps:

  • The customer is using Rancher 2.7.10. I also reproduced this in Rancher 2.8.3 with v1.28.8+rke2r1.
  • Create a new RKE2 cluster via Rancher
  • Explore the cluster and view the logs for the fleet-agent pod

kkaempf, Sep 16 '24 15:09

Related to https://github.com/rancher/fleet/issues/2869

manno, Sep 30 '24 09:09

This should be fixed by https://github.com/rancher/fleet/pull/2917

manno, Sep 30 '24 15:09

@manno #2917 got merged into main while the issue here is scheduled for 2.9.3 🤔 /cc @weyfonk

kkaempf, Oct 01 '24 13:10


Good catch. Updated the milestone to v2.10.0 on this issue and created #2944 as a backport to v2.9.3.

weyfonk, Oct 09 '24 07:10

Additional QA

Problem

Adding an RKE2 cluster to a Fleet setup would lead to multiple log messages about the Fleet agent bundle being deployed, although that deployment would actually happen only once.

Solution

Moved the log message to ensure that it is only produced when a deployment actually happened, not when it was skipped due to the bundle and release already being deployed.
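The change can be illustrated with a hypothetical Go sketch. The Deployer type and its Deploy method are illustrative names, not Fleet's actual code. The log call sits after the skip check, so reconciling an unchanged bundle produces no message. The message is written only once a deployment has actually happened.

```go
package main

import "log"

// Deployer is a hypothetical stand-in for the agent's deploy step (not
// Fleet's API). It remembers the last applied deployment ID per bundle.
type Deployer struct {
	applied map[string]string // bundle name -> applied deployment ID
	logf    func(format string, args ...any)
}

func NewDeployer(logf func(format string, args ...any)) *Deployer {
	return &Deployer{applied: make(map[string]string), logf: logf}
}

// Deploy installs the bundle unless the same deployment ID is already
// applied, and reports whether a deployment actually happened.
func (d *Deployer) Deploy(bundle, deploymentID string) bool {
	if id, ok := d.applied[bundle]; ok && id == deploymentID {
		// Already deployed: skip without logging. Logging before this
		// check would emit one message per reconcile, not per deployment.
		return false
	}
	// ... the real install or upgrade would happen here ...
	d.applied[bundle] = deploymentID
	// Log only after the deployment, matching the "Deployed bundle" wording.
	d.logf("Deployed bundle %s (deploymentID=%s)", bundle, deploymentID)
	return true
}

func main() {
	d := NewDeployer(log.Printf)
	// Three reconciles of an unchanged bundle produce a single log line.
	for i := 0; i < 3; i++ {
		d.Deploy("cluster-fleet-default-example/managed-system-agent", "s-example:1")
	}
}
```

Moving the log call above the skip check would reproduce the original symptom of one message per reconcile.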

Testing

Engineering Testing

Manual Testing

Observed a single "Deployed bundle" log entry for a Fleet agent bundle, versus multiple identical entries before the fix. Tests were run against a local k3s cluster, as this issue is expected to be independent of the underlying Kubernetes distribution.

Automated Testing

N/A (tests do not cover logging)

QA Testing Considerations

We should validate that this fix also works on RKE2 (a single "Deployed bundle" log entry for the Fleet agent deployment).

Warning: The log message appears as "Deploying bundle" in the issue description because the reported case involves an older Fleet version. From Fleet 0.10 onwards, the message is "Deployed bundle", as it is produced right after a deployment rather than before one starts.

Regressions Considerations

N/A

weyfonk, Oct 10 '24 11:10

System Information

Before Upgrade

Rancher Version | Fleet Version
Prime: 2.9.3    | Fleet: 0.10.4

After Upgrade

Rancher Version | Fleet Version
v2.10.0-alpha5  | fleet:v0.11.0-beta.3

The before-upgrade and after-upgrade checks were performed to make sure nothing is broken.


Before Upgrade

  • Deploy the latest released Rancher Prime version (I installed prime/2.9.3).
  • Import a downstream cluster (I used k3d and k3s).
  • Observe the logs of the fleet-agent installed in the downstream cluster.

In the fleet-agent pod logs, you will see repeated "Deployed bundle" messages.

Note: I filtered on the string "Deployed bundle" to observe the repetition.

Logs:

Fleet agent logs before upgrade
{"level":"info","ts":"2024-10-30T13:27:39Z","logger":"bundledeployment.DeployBundle","msg":"Deployed bundle","controller":"bundledeployment","controllerGroup":"fleet.cattle.io","controllerKind":"BundleDeployment","BundleDeployment":{"name":"fleet-agent-imported-0","namespace":"cluster-fleet-default-imported-0-f1158df57e01"},"namespace":"cluster-fleet-default-imported-0-f1158df57e01","name":"fleet-agent-imported-0","reconcileID":"4b512da4-0bc1-4b3e-9c10-440c88878380","deploymentID":"s-4f421ffd2d21effac6a8620b53a15eed5f18eeb0289d4ac2c0c2790e7e46a:8eaf3c183fe289136e7a2da3ceafa5c435e8f098e386324237baeb8811cf8b21","appliedDeploymentID":"","release":"cattle-fleet-system/fleet-agent-imported-0:1","DeploymentID":"s-4f421ffd2d21effac6a8620b53a15eed5f18eeb0289d4ac2c0c2790e7e46a:8eaf3c183fe289136e7a2da3ceafa5c435e8f098e386324237baeb8811cf8b21"}
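The filtering described in the note above can also be done programmatically. This Go sketch assumes one JSON object per line, with the "msg" and "BundleDeployment" fields seen in these logs. The countDeployed helper is a hypothetical name. It counts "Deployed bundle" entries per BundleDeployment (namespace/name), so a count above one flags repetition.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"os"
)

// logEntry holds only the fields of a fleet-agent JSON log line needed here.
type logEntry struct {
	Msg              string `json:"msg"`
	BundleDeployment struct {
		Name      string `json:"name"`
		Namespace string `json:"namespace"`
	} `json:"BundleDeployment"`
}

// countDeployed counts "Deployed bundle" entries per BundleDeployment,
// keyed as namespace/name. Lines that are not valid JSON are skipped.
func countDeployed(lines []string) map[string]int {
	counts := make(map[string]int)
	for _, line := range lines {
		var e logEntry
		if err := json.Unmarshal([]byte(line), &e); err != nil {
			continue
		}
		if e.Msg != "Deployed bundle" {
			continue
		}
		counts[e.BundleDeployment.Namespace+"/"+e.BundleDeployment.Name]++
	}
	return counts
}

func main() {
	sc := bufio.NewScanner(os.Stdin)
	// Allow lines up to 1 MiB in case an entry exceeds the default limit.
	sc.Buffer(make([]byte, 0, 64*1024), 1024*1024)
	var lines []string
	for sc.Scan() {
		lines = append(lines, sc.Text())
	}
	for key, n := range countDeployed(lines) {
		fmt.Printf("%s: %d\n", key, n)
	}
}
```

Piping the fleet-agent pod logs through this program before and after the upgrade should show whether an unchanged BundleDeployment is reported more than once.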

After Upgrade

  • Upgrade the same cluster to v2.10.0-alpha5, which installs Fleet 0.11.0-beta.3.
  • Observe the fleet-agent pod logs on the imported clusters.

In the fleet-agent pod logs, you will see only a single "Deployed bundle" entry.

Note: I filtered on the string "Deployed bundle" to observe the repetition.


Logs:

Fleet agent logs after upgrade
{"level":"info","ts":"2024-10-30T15:52:57Z","logger":"bundledeployment.DeployBundle","msg":"Deployed bundle","controller":"bundledeployment","controllerGroup":"fleet.cattle.io","controllerKind":"BundleDeployment","BundleDeployment":{"name":"fleet-agent-local","namespace":"cluster-fleet-local-local-1a3d67d0a899"},"namespace":"cluster-fleet-local-local-1a3d67d0a899","name":"fleet-agent-local","reconcileID":"f00fc4f1-a2d1-4e9c-bd7f-011e20a2f36c","deploymentID":"s-ca0f84354c57b23a7bb0d88e3947dfbb6c16397c40c78e94a6589e3d974c5:32eb9f72e71f28d6a3ee815ddf77e461aa78ec0b6870afe4765c423d31bf6557","appliedDeploymentID":"s-caf1a1a3ddc0a15edb34c257394e886c52ffbabd2060c478472d9b7f9cf08:32eb9f72e71f28d6a3ee815ddf77e461aa78ec0b6870afe4765c423d31bf6557","release":"cattle-fleet-local-system/fleet-agent-local:3","DeploymentID":"s-ca0f84354c57b23a7bb0d88e3947dfbb6c16397c40c78e94a6589e3d974c5:32eb9f72e71f28d6a3ee815ddf77e461aa78ec0b6870afe4765c423d31bf6557"}

sbulage, Oct 30 '24 16:10