The hasConfigChanged check is flawed and causes unnecessary re-deployment of fleet-agent on the local cluster

Open w13915984028 opened this issue 1 year ago • 2 comments

Is there an existing issue for this?

  • [X] I have searched the existing issues

Current Behavior

https://github.com/rancher/fleet/blob/1cddbbff1c2cd71c9b9011c3738754e5b4c8fa89/internal/cmd/controller/agentmanagement/controllers/cluster/import.go#L98

fleet-controller considers the config changed if any of the following conditions is met:

		hasConfigChanged := config.APIServerURL != cluster.Status.APIServerURL ||
			hashStatusField(config.APIServerCA) != cluster.Status.APIServerCAHash ||
			config.AgentTLSMode != cluster.Status.AgentTLSMode ||
			hasGarbageCollectionIntervalChanged(config, cluster)

However, the status values are read from a related secret, and fall back to the configmap only if the secret values are empty:

https://github.com/rancher/fleet/blob/1cddbbff1c2cd71c9b9011c3738754e5b4c8fa89/internal/cmd/controller/agentmanagement/controllers/cluster/import.go#L232

...
	logrus.Debugf("Cluster import for '%s/%s'. Setting up agent with kubeconfig from secret '%s/%s'", cluster.Namespace, cluster.Name, kubeConfigSecretNamespace, cluster.Spec.KubeConfigSecret)
	var (
		cfg          = config.Get()
		apiServerURL = string(secret.Data[config.APIServerURLKey])
		apiServerCA  = secret.Data[config.APIServerCAKey]
	)

	if apiServerURL == "" {
		if len(cfg.APIServerURL) == 0 {
			return status, fmt.Errorf("missing apiServerURL in fleet config for cluster auto registration")
		}
		logrus.Debugf("Cluster import for '%s/%s'. Using apiServerURL from fleet-controller config", cluster.Namespace, cluster.Name)
		apiServerURL = cfg.APIServerURL
	}

	if len(apiServerCA) == 0 {
		apiServerCA = cfg.APIServerCA
	}


The cluster.fleet status is then updated as follows:

	status.AgentDeployedGeneration = &cluster.Spec.RedeployAgentGeneration
	status.AgentMigrated = true
	status.CattleNamespaceMigrated = true
	status.Agent = fleet.AgentStatus{
		Namespace: cluster.Spec.AgentNamespace,
	}
	status.AgentNamespaceMigrated = true
	status.AgentConfigChanged = false
	status.APIServerURL = apiServerURL
	status.APIServerCAHash = hashStatusField(apiServerCA)
	status.AgentTLSMode = cfg.AgentTLSMode
	status.GarbageCollectionInterval = &cfg.GarbageCollectionInterval
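
The mismatch between the two snippets above can be reproduced in isolation. The following is a minimal sketch, not Fleet's actual code: the `Config` and `Status` types, `importStatus`, `naiveChanged`, and `hashField` are hypothetical stand-ins, and the CA byte strings are placeholders. It assumes only what the snippets show: the status is written from the secret when the secret has values, while the change check compares the status against the configmap.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Config mirrors only the fleet-controller configmap fields used here (hypothetical type).
type Config struct {
	APIServerURL string
	APIServerCA  []byte
}

// Status mirrors only the cluster status fields used here (hypothetical type).
type Status struct {
	APIServerURL    string
	APIServerCAHash string
}

// hashField is a hypothetical stand-in for hashStatusField.
func hashField(b []byte) string {
	sum := sha256.Sum224(b)
	return hex.EncodeToString(sum[:])
}

// importStatus prefers the secret values and falls back to the config,
// like the import code quoted above.
func importStatus(cfg Config, secretURL string, secretCA []byte) Status {
	url, ca := secretURL, secretCA
	if url == "" {
		url = cfg.APIServerURL
	}
	if len(ca) == 0 {
		ca = cfg.APIServerCA
	}
	return Status{APIServerURL: url, APIServerCAHash: hashField(ca)}
}

// naiveChanged compares the config with the status directly,
// like the hasConfigChanged expression quoted above.
func naiveChanged(cfg Config, st Status) bool {
	return cfg.APIServerURL != st.APIServerURL ||
		hashField(cfg.APIServerCA) != st.APIServerCAHash
}

func main() {
	cfg := Config{APIServerURL: "https://10.53.47.173", APIServerCA: []byte("configmap-ca")}

	// Secret carries its own values: status never matches the config.
	fromSecret := importStatus(cfg, "https://10.53.0.1:443", []byte("secret-ca"))
	fmt.Println(naiveChanged(cfg, fromSecret)) // true, although nothing changed

	// Secret is empty: status is written from the config and the check is stable.
	fromConfig := importStatus(cfg, "", nil)
	fmt.Println(naiveChanged(cfg, fromConfig)) // false
}
```

In the first case the check reports a change on every evaluation, which is consistent with the behavior described in this issue.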

On a Harvester cluster, Rancher is embedded to provision the local cluster; as a consequence, fleet-controller and fleet-agent are deployed as well.

The fleet-controller configmap:

configmap -n cattle-fleet-system fleet-controller -oyaml
apiVersion: v1
data:
  config: |
    agentCheckinInterval: 15m
    agentImage: rancher/fleet-agent:v0.10.2
    agentImagePullPolicy: IfNotPresent
    agentTLSMode: strict
    apiServerCA: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUJ2VENDQVdPZ0F3SUJBZ0lCQURBS0JnZ3Foa2pPUFFRREFqQkdNUnd3R2dZRFZRUUtFeE5rZVc1aGJXbGoKYkdsemRHVnVaWEl0YjNKbk1TWXdKQVlEVlFRRERCMWtlVzVoYldsamJHbHpkR1Z1WlhJdFkyRkFNVGN5T1RVNApPREl5TURBZUZ3MHlOREV3TWpJd09URXdNakJhRncwek5ERXdNakF3T1RFd01qQmFNRVl4SERBYUJnTlZCQW9UCkUyUjVibUZ0YVdOc2FYTjBaVzVsY2kxdmNtY3hKakFrQmdOVkJBTU1IV1I1Ym1GdGFXTnNhWE4wWlc1bGNpMWoKWVVBeE56STVOVGc0TWpJd01Ga3dFd1lIS29aSXpqMENBUVlJS29aSXpqMERBUWNEUWdBRXpQdmFKY01CY3RtcgovTTdVdFZIOVlScmVMM0Z2dFhFWnZXOG9TUS9EVHdvNDZ1WmxnSW5wRThCbWM5b3BOaW95ZjhFa21ScGFlWFI3CnVud1VmLzJMRGFOQ01FQXdEZ1lEVlIwUEFRSC9CQVFEQWdLa01BOEdBMVVkRXdFQi93UUZNQU1CQWY4d0hRWUQKVlIwT0JCWUVGRDRFMWwrKzdWVWVOMEdqSCs1WVpaUzR2aFcrTUFvR0NDcUdTTTQ5QkFNQ0EwZ0FNRVVDSVFEQgpFdlNybGZUL2k2VGdIWHhWYXhyQUpGMGxuaW9pSUk3N2VFcUFCUVJTNEFJZ1oyRmpRZCtSQitrWmpXeFVOZG0vCmwzUWpveStDZXlNYkJLcnVuTHg1TjBNPQotLS0tLUVORCBDRVJUSUZJQ0FURS0tLS0tCg==
    apiServerURL: https://10.53.47.173
    bootstrap:
      agentNamespace: cattle-fleet-local-system
      branch: master
      namespace: fleet-local
      paths: ""
      repo: ""
      secret: ""
    githubURLPrefix: ""
    ignoreClusterRegistrationLabels: false
    systemDefaultRegistry: ""
    webhookReceiverURL: ""
kind: ConfigMap
metadata:
  annotations:
    meta.helm.sh/release-name: fleet
    meta.helm.sh/release-namespace: cattle-fleet-system
  creationTimestamp: "2024-10-22T09:10:42Z"
  labels:
    app.kubernetes.io/managed-by: Helm
  name: fleet-controller
  namespace: cattle-fleet-system
  resourceVersion: "1728677"
  uid: e903b5dd-c645-4fb0-8880-f0b8eb28ab69

The local-kubeconfig secret:

secret -n fleet-local local-kubeconfig -oyaml
apiVersion: v1
data:
  apiServerCA: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUJlRENDQVIrZ0F3SUJBZ0lCQURBS0JnZ3Foa2pPUFFRREFqQWtNU0l3SUFZRFZRUUREQmx5YTJVeUxYTmwKY25abGNpMWpZVUF4TnpJNU5UZzRNRGcyTUI0WERUSTBNVEF5TWpBNU1EZ3dObG9YRFRNME1UQXlNREE1TURndwpObG93SkRFaU1DQUdBMVVFQXd3WmNtdGxNaTF6WlhKMlpYSXRZMkZBTVRjeU9UVTRPREE0TmpCWk1CTUdCeXFHClNNNDlBZ0VHQ0NxR1NNNDlBd0VIQTBJQUJORk5oN2ZKWGdwY2trK1d2QnBGT01UVjJrRlZmVjRVdXRkZnF3dk0Kb2JFdHcvK0RaQ1NJU3Jsc1ZxeHQ0di82S2lOSDZkcnVYbDNqVDdkV1BwdlIrUDJqUWpCQU1BNEdBMVVkRHdFQgovd1FFQXdJQ3BEQVBCZ05WSFJNQkFmOEVCVEFEQVFIL01CMEdBMVVkRGdRV0JCUVQ0a3c2YU1sVUVnWmpiUmpuCmJZc2Z4bnJCMXpBS0JnZ3Foa2pPUFFRREFnTkhBREJFQWlBS3FvNmtvdGlST3dvUTk0aVRTSnRWYnpIS0xFMHkKZFUwRFRGa2RFZkVreGdJZ1pDc1MwSWkvZmtLNHFiZFJrUVk5RU93QkFVYjVUMVR4dm5pMlNOYXVDU0E9Ci0tLS0tRU5EIENFUlRJRklDQVRFLS0tLS0K
  apiServerURL: aHR0cHM6Ly8xMC41My4wLjE6NDQz
  token: dS1tbzc3M3l0dHQ0Om12OHY3bmdieDVtcjVkOWpnMnNoc3Ridzk1a3cyNm1ud2hxNW5xY21uNzJobGxkZjY2dHg0eg==
  value: YXBpVmVyc2lvbjogdjEKY2x1c3RlcnM6Ci0gY2x1c3RlcjoKICAgIGNlcnRpZmljYXRlLWF1dGhvcml0eS1kYXRhOiBMUzB0TFMxQ1JVZEpUaUJEUlZKVVNVWkpRMEZVUlMwdExTMHRDazFKU1VKMlZFTkRRVmRQWjBGM1NVSkJaMGxDUVVSQlMwSm5aM0ZvYTJwUFVGRlJSRUZxUWtkTlVuZDNSMmRaUkZaUlVVdEZlRTVyWlZjMWFHSlhiR29LWWtkc2VtUkhWblZhV0VsMFlqTktiazFUV1hkS1FWbEVWbEZSUkVSQ01XdGxWelZvWWxkc2FtSkhiSHBrUjFaMVdsaEpkRmt5UmtGTlZHTjVUMVJWTkFwUFJFbDVUVVJCWlVaM01IbE9SRVYzVFdwSmQwOVVSWGROYWtKaFJuY3dlazVFUlhkTmFrRjNUMVJGZDAxcVFtRk5SVmw0U0VSQllVSm5UbFpDUVc5VUNrVXlValZpYlVaMFlWZE9jMkZZVGpCYVZ6VnNZMmt4ZG1OdFkzaEtha0ZyUW1kT1ZrSkJUVTFJVjFJMVltMUdkR0ZYVG5OaFdFNHdXbGMxYkdOcE1Xb0tXVlZCZUU1NlNUVk9WR2MwVFdwSmQwMUdhM2RGZDFsSVMyOWFTWHBxTUVOQlVWbEpTMjlhU1hwcU1FUkJVV05FVVdkQlJYcFFkbUZLWTAxQ1kzUnRjZ292VFRkVmRGWklPVmxTY21WTU0wWjJkRmhGV25aWE9HOVRVUzlFVkhkdk5EWjFXbXhuU1c1d1JUaENiV001YjNCT2FXOTVaamhGYTIxU2NHRmxXRkkzQ25WdWQxVm1MekpNUkdGT1EwMUZRWGRFWjFsRVZsSXdVRUZSU0M5Q1FWRkVRV2RMYTAxQk9FZEJNVlZrUlhkRlFpOTNVVVpOUVUxQ1FXWTRkMGhSV1VRS1ZsSXdUMEpDV1VWR1JEUkZNV3dyS3pkV1ZXVk9NRWRxU0NzMVdWcGFVelIyYUZjclRVRnZSME5EY1VkVFRUUTVRa0ZOUTBFd1owRk5SVlZEU1ZGRVFncEZkbE55YkdaVUwyazJWR2RJV0hoV1lYaHlRVXBHTUd4dWFXOXBTVWszTjJWRmNVRkNVVkpUTkVGSloxb3lSbXBSWkN0U1FpdHJXbXBYZUZWT1pHMHZDbXd6VVdwdmVTdERaWGxOWWtKTGNuVnVUSGcxVGpCTlBRb3RMUzB0TFVWT1JDQkRSVkpVU1VaSlEwRlVSUzB0TFMwdAogICAgc2VydmVyOiBodHRwczovLzEwLjUzLjQ3LjE3My9rOHMvY2x1c3RlcnMvbG9jYWwKICBuYW1lOiBjbHVzdGVyCmNvbnRleHRzOgotIGNvbnRleHQ6CiAgICBjbHVzdGVyOiBjbHVzdGVyCiAgICB1c2VyOiB1c2VyCiAgbmFtZTogZGVmYXVsdApjdXJyZW50LWNvbnRleHQ6IGRlZmF1bHQKa2luZDogQ29uZmlnCnByZWZlcmVuY2VzOiB7fQp1c2VyczoKLSBuYW1lOiB1c2VyCiAgdXNlcjoKICAgIHRva2VuOiB1LW1vNzczeXR0dDQ6bXY4djduZ2J4NW1yNWQ5amcyc2hzdGJ3OTVrdzI2bW53aHE1bnFjbW43MmhsbGRmNjZ0eDR6Cg==
kind: Secret
metadata:
  creationTimestamp: "2024-10-22T09:10:21Z"
  labels:
    cluster.x-k8s.io/cluster-name: local
  name: local-kubeconfig
  namespace: fleet-local
  ownerReferences:
  - apiVersion: provisioning.cattle.io/v1
    kind: Cluster
    name: local
    uid: f03854eb-90fe-4fac-8ccc-cb292dc9a583
  resourceVersion: "2547"
  uid: 422a9004-ab22-4013-811b-2ab94d2c37fd
type: Opaque

The cluster.fleet object:

get cluster.fleet -n fleet-local local -oyaml
apiVersion: fleet.cattle.io/v1alpha1
kind: Cluster
metadata:
  annotations:
    objectset.rio.cattle.io/applied: H4sIAAAAAAAA/4xSTW/bMAz9KwPPTresX4mBHYquGIoBPbS7FT0wEm1rkSlBopIagf/7ILtJjbYrcpPE9x75nriDlgQ1CkK5A2R2gmIcx3x1q7+kJJKcBONOFIpYOjHuq9FQQmWJZKZsikIBiv+C3ZYpzOrNGkrwwW1MNI4N1xPIZl58+W1Y/7g+Uo2xJSjBOoX2KHD0qOgw9MjrC1CBBrd/TEtRsPVQcrK2AIsrskMGLTLW1BLLRPjF9Uyb6C12b+f5lPMGe6SVBmOT5/+OC1wul0tcnNN8XlWL04vzM7qYK1KneLmYn10uq1V1CcWYtabwKgIlNBg2NETcv2v9SU7Rkxr2oyaWq6oybKTLD+w0Te8+UEUhkP6ZguH6QTWkkzVc39bsDs83z6RSzh3Kxz2HODfOeYtqbp59oBjHPXzcwZq6/VCTTIZpcmaeAooLUMItQwEbtIkyESQkgqf+qS9gS6ZuBMp5/9T3xejkbuJ4lJ1NjM9iF4VaKGCdVnTtuDL1A6lAsg9tlgtqKHyAuvskT0FJ8ZBoPliM8kDE+w2c/kamaIomkL4n1N0vI/fkXYTyWwEvOwjlri8gvCsHii4FRdcusQwtp0oDpDUxGq7Hs9OmMqSHCzt5RbngG+SXSjg8J16z2/Jw3qKRK+/tyM8+U9ti6D7supfo+/5fAAAA//+aI8sBhQQAAA
    objectset.rio.cattle.io/id: fleet-cluster
    objectset.rio.cattle.io/owner-gvk: provisioning.cattle.io/v1, Kind=Cluster
    objectset.rio.cattle.io/owner-name: local
    objectset.rio.cattle.io/owner-namespace: fleet-local
  creationTimestamp: "2024-10-22T09:10:21Z"
  generation: 9
  labels:
    management.cattle.io/cluster-display-name: local
    management.cattle.io/cluster-name: local
    name: local
    objectset.rio.cattle.io/hash: f2a8a9999a85e11ff83654e61cec3a781479fbf7
    provider.cattle.io: harvester
  name: local
  namespace: fleet-local
  resourceVersion: "2289565"
  uid: 299cf0b3-2b6b-433f-b790-4b4754d3fb31
spec:
  agentAffinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - preference:
          matchExpressions:
          - key: fleet.cattle.io/agent
            operator: In
            values:
            - "true"
        weight: 1
  agentNamespace: cattle-fleet-local-system
  clientID: qqgxq26f8nzkc5xnvdxtt64x8jfjjh9545vwv4t7wnpb99vkjz89ft
  kubeConfigSecret: local-kubeconfig
  kubeConfigSecretNamespace: fleet-local
  redeployAgentGeneration: 7
status:
  agent:
    lastSeen: "2024-11-08T11:23:50Z"
    namespace: cattle-fleet-local-system
  agentAffinityHash: f50425c0999a8e18c2d104cdb8cb063762763f232f538b5a7c8bdb61
  agentDeployedGeneration: 7
  agentMigrated: true
  agentNamespaceMigrated: true
  agentTLSMode: strict
  apiServerCAHash: 302980e70d0e2817c3f94bfafa6e419be2249fc11d128da50054b390
  apiServerURL: https://10.53.0.1:443
  cattleNamespaceMigrated: true
  conditions:
  - lastUpdateTime: "2024-10-22T09:11:23Z"
    status: "True"
    type: Processed
  - lastUpdateTime: "2024-11-08T11:23:29Z"
    status: "True"
    type: Imported
  - lastUpdateTime: "2024-10-22T09:11:23Z"
    status: "True"
    type: Reconciled
  - lastUpdateTime: "2024-11-08T10:52:30Z"
    status: "True"
    type: Ready
  desiredReadyGitRepos: 0
  display:
    readyBundles: 7/7
  garbageCollectionInterval: 0s
  namespace: cluster-fleet-local-local-1a3d67d0a899
  readyGitRepos: 0
  resourceCounts:
    desiredReady: 0
    missing: 0
    modified: 0
    notReady: 0
    orphaned: 0
    ready: 0
    unknown: 0
    waitApplied: 0
  summary:
    desiredReady: 7
    ready: 7

If we kill the fleet-controller pod, it always re-deploys the fleet-agent, with the debug output below:

kk logs -n cattle-fleet-system fleet-controller-78f8b6677c-hvftb -c fleet-agentmanagement
I1108 11:23:28.926600       1 leaderelection.go:250] attempting to acquire leader lease cattle-fleet-system/fleet-agentmanagement-lock...
I1108 11:23:28.934031       1 leaderelection.go:260] successfully acquired lease cattle-fleet-system/fleet-agentmanagement-lock

// debug via https://github.com/rancher/fleet/blob/1cddbbff1c2cd71c9b9011c3738754e5b4c8fa89/internal/cmd/controller/agentmanagement/controllers/config/controller.go#L24

time="2024-11-08T11:23:28Z" level=info msg="When Register, cattle-fleet-system/fleet-controller the Lookup result: APIServerURL:https://10.53.47.173 AgentTLSMode:strict"

time="2024-11-08T11:23:29Z" level=info msg="Starting fleet.cattle.io/v1alpha1, Kind=Bundle controller"
time="2024-11-08T11:23:29Z" level=info msg="Starting fleet.cattle.io/v1alpha1, Kind=ClusterRegistrationToken controller"
time="2024-11-08T11:23:29Z" level=info msg="Starting fleet.cattle.io/v1alpha1, Kind=ClusterRegistration controller"
time="2024-11-08T11:23:29Z" level=info msg="Starting rbac.authorization.k8s.io/v1, Kind=Role controller"
time="2024-11-08T11:23:29Z" level=info msg="Starting fleet.cattle.io/v1alpha1, Kind=GitRepo controller"
time="2024-11-08T11:23:29Z" level=info msg="Starting /v1, Kind=ConfigMap controller"
time="2024-11-08T11:23:29Z" level=info msg="Starting rbac.authorization.k8s.io/v1, Kind=ClusterRole controller"
time="2024-11-08T11:23:29Z" level=info msg="Starting /v1, Kind=Namespace controller"
time="2024-11-08T11:23:29Z" level=info msg="Starting /v1, Kind=ServiceAccount controller"
time="2024-11-08T11:23:29Z" level=info msg="Starting rbac.authorization.k8s.io/v1, Kind=ClusterRoleBinding controller"

// debug via https://github.com/rancher/fleet/blob/1cddbbff1c2cd71c9b9011c3738754e5b4c8fa89/internal/cmd/controller/agentmanagement/controllers/config/controller.go#L42

time="2024-11-08T11:23:29Z" level=info msg="When onchange, reloadConfig, cattle-fleet-system/fleet-controller the ReadConfig result: APIServerURL:https://10.53.47.173 AgentTLSMode:strict"

time="2024-11-08T11:23:29Z" level=info msg="Starting fleet.cattle.io/v1alpha1, Kind=ClusterGroup controller"
time="2024-11-08T11:23:29Z" level=info msg="Starting fleet.cattle.io/v1alpha1, Kind=BundleDeployment controller"
time="2024-11-08T11:23:29Z" level=info msg="Update agent bundle for cluster fleet-local/local"
time="2024-11-08T11:23:29Z" level=info msg="Starting fleet.cattle.io/v1alpha1, Kind=Cluster controller"
time="2024-11-08T11:23:29Z" level=info msg="Starting rbac.authorization.k8s.io/v1, Kind=RoleBinding controller"
time="2024-11-08T11:23:29Z" level=info msg="Starting /v1, Kind=Secret controller"

time="2024-11-08T11:23:29Z" level=info msg="API server config changed, trigger cluster import for cluster fleet-local/local"

// debug via hasConfigChanged

time="2024-11-08T11:23:29Z" level=info msg="detected change: APIServerURL: https://10.53.47.173 https://10.53.0.1:443 equal:false, CAHash: d372882253c64e2862a02eb9022b9ed346b9ca21c13d06969e887c14 302980e70d0e2817c3f94bfafa6e419be2249fc11d128da50054b390 equal:false, AgentTLSMode:strict strict equal:true, garbagechanged: false"

time="2024-11-08T11:23:29Z" level=info msg="Deleted old agent for cluster (fleet-local/local) in namespace cattle-fleet-local-system"
time="2024-11-08T11:23:29Z" level=info msg="Cluster import for 'fleet-local/local'. Deployed new agent"

time="2024-11-08T11:23:31Z" level=info msg="Waiting for service account token key to be populated for secret cluster-fleet-local-local-1a3d67d0a899/request-66zfj-3f1e0ebd-af49-4c31-9f9a-382208c6777a-token"
time="2024-11-08T11:23:31Z" level=info msg="Namespace assigned to cluster 'fleet-local/local' enqueues cluster registration 'fleet-local/request-66zfj'"
time="2024-11-08T11:23:33Z" level=info msg="Cluster registration request 'fleet-local/request-66zfj' granted, creating cluster, request service account, registration secret"
time="2024-11-08T11:23:33Z" level=info msg="Cluster registration request 'fleet-local/request-66zfj' granted, creating cluster, request service account, registration secret"

Expected Behavior

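Because the fleet-agent may be deploying or updating a ManagedChart at any time, it should be re-deployed only when necessary.

The config change check (onConfig) needs to handle the non-fallback case, i.e. when the values come from the secret rather than from the configmap.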

https://github.com/rancher/fleet/blob/1cddbbff1c2cd71c9b9011c3738754e5b4c8fa89/internal/cmd/controller/agentmanagement/controllers/cluster/import.go#L98

func (i *importHandler) onConfig(config *config.Config) error {
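
One possible direction, sketched under assumptions and not necessarily what an eventual fix does: compute the effective values the same way the import code does (secret first, configmap as fallback) and compare those against what was recorded. The `Source` type, `effective`, and `changed` are hypothetical names, and raw CA bytes are compared here for brevity, whereas the real status stores a hash.

```go
package main

import (
	"bytes"
	"fmt"
)

// Source holds an API server URL and CA from one origin (hypothetical type).
type Source struct {
	URL string
	CA  []byte
}

// effective returns the values an agent would be deployed with:
// the secret's values when present, otherwise the config's.
func effective(secret, cfg Source) Source {
	out := secret
	if out.URL == "" {
		out.URL = cfg.URL
	}
	if len(out.CA) == 0 {
		out.CA = cfg.CA
	}
	return out
}

// changed reports whether the effective values differ from the recorded ones.
func changed(secret, cfg, recorded Source) bool {
	eff := effective(secret, cfg)
	return eff.URL != recorded.URL || !bytes.Equal(eff.CA, recorded.CA)
}

func main() {
	secret := Source{URL: "https://10.53.0.1:443", CA: []byte("secret-ca")}
	cfg := Source{URL: "https://10.53.47.173", CA: []byte("configmap-ca")}
	recorded := effective(secret, cfg)

	// Secret overrides the config: config edits do not trigger a re-deployment.
	fmt.Println(changed(secret, cfg, recorded)) // false
	cfg.URL = "https://10.53.244.156"
	fmt.Println(changed(secret, cfg, recorded)) // false

	// No secret values: a config edit is a real change.
	old := Source{URL: "https://10.53.47.173", CA: []byte("configmap-ca")}
	fmt.Println(changed(Source{}, cfg, effective(Source{}, old))) // true
}
```

With this comparison, a configmap change only matters for clusters whose secret leaves the corresponding field empty.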

Steps To Reproduce

This was observed in the Harvester upgrade test:

https://github.com/harvester/harvester/issues/6851

After the embedded Rancher is upgraded and a number of conditions have been checked, Harvester starts to upgrade the ManagedCharts. At seemingly random times during this phase the fleet-agent is re-deployed, which may leave some ManagedCharts in an intermediate state. The new fleet-agent then rolls them back, which causes further issues. For more details, please refer to: https://github.com/harvester/harvester/issues/6851#issuecomment-2464827784

Environment

- Architecture: 
- Fleet Version: Rancher v2.9.2 + fleet v0.10.2;  Harvester v1.4.0;  The `local` cluster is managed by `Rancher` and `Fleet`.
- Cluster:
  - Provider:
  - Options:
  - Kubernetes Version:

Logs

No response

Anything else?

No response

w13915984028 avatar Nov 08 '24 13:11 w13915984028

Note: in the configmap, the apiServerURL is https://10.53.47.173, which is the Rancher service IP in this cluster. We also observed that during the upgrade process this value first becomes empty and then reverts to https://10.53.47.173.

configmap:

    name: fleet-controller
    namespace: cattle-fleet-system
    apiServerURL: https://10.53.244.156

In the secret, the apiServerURL is https://10.53.0.1:443, the default kubernetes service IP.

secret:
  name: local-kubeconfig
  namespace: fleet-local
apiServerURL: aHR0cHM6Ly8xMC41My4wLjE6NDQz


echo aHR0cHM6Ly8xMC41My4wLjE6NDQz | base64 -d
https://10.53.0.1:443

default                           kubernetes                                    ClusterIP      10.53.0.1
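
The same decoding can be done in Go, for example when comparing the two sources programmatically. This is a small sketch; `decodeSecretValue` is a hypothetical helper, and the two input values are copied from the secret and configmap shown in this issue.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// decodeSecretValue decodes one base64-encoded value as it appears
// in the YAML output of a Secret (hypothetical helper).
func decodeSecretValue(encoded string) (string, error) {
	raw, err := base64.StdEncoding.DecodeString(encoded)
	if err != nil {
		return "", err
	}
	return string(raw), nil
}

func main() {
	secretURL, err := decodeSecretValue("aHR0cHM6Ly8xMC41My4wLjE6NDQz")
	if err != nil {
		panic(err)
	}
	configURL := "https://10.53.47.173" // value from the fleet-controller configmap

	fmt.Println(secretURL)              // https://10.53.0.1:443
	fmt.Println(secretURL == configURL) // false
}
```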

Our debug log also shows that the apiServerCA is always different.

This means that any change in the fleet-controller configmap causes fleet-controller to re-deploy the fleet-agent, because hasConfigChanged is always true.

cc @manno

w13915984028 avatar Nov 08 '24 13:11 w13915984028

Pending validation from OP's team (Harvester).

weyfonk avatar Apr 16 '25 06:04 weyfonk

Validation report: PASS

Tested on a Harvester v1.6.0-master-head cluster with the following runtime, which does not include this PR:

runtimeversion: v1.32.4+rke2r1
rancherversion: v2.11.2

fleet-version                                 106.1.1+up0.12.3

Each time the fleet-controller pod is killed, the fleet-agent is replaced automatically:

$kubectl get pods -A | grep fleet
cattle-fleet-local-system         fleet-agent-77c65c9d9d-bnprs                             1/1     Running     0              119s
cattle-fleet-system               fleet-controller-d6b688c66-4kdlc                         3/3     Running     0              2m

After manually setting the fleet image tag to https://github.com/rancher/fleet/releases/tag/v0.13.0-alpha.5 (which includes this PR):

$kubectl get pods -A | grep fleet
cattle-fleet-local-system         fleet-agent-f58cc6fd4-sp6pf                              1/1     Running     0              4m53s
cattle-fleet-system               fleet-controller-5ff7ffd664-4p7qq                        3/3     Running     0              3m42s

$kubectl get pods -A | grep fleet
cattle-fleet-local-system         fleet-agent-f58cc6fd4-sp6pf                              1/1     Running     0              5m24s
cattle-fleet-system               fleet-controller-5ff7ffd664-wg8lp                        3/3     Running     0              7s

Killing the fleet-controller pod no longer affects the fleet-agent pod.

Thanks for the fix @weyfonk

w13915984028 avatar Jun 16 '25 08:06 w13915984028

Solved.

Note: the PR targets Rancher v2.12.* and Fleet v0.13.*

w13915984028 avatar Jun 16 '25 08:06 w13915984028