
Fleet-Agent fails to transition from 'fleet-agent-bootstrap' to 'fleet-agent' Secret after Rancher update

Open LucEast opened this issue 11 months ago • 2 comments

Is there an existing issue for this?

  • [x] I have searched the existing issues

Current Behavior

After updating Rancher from 2.7.5 to 2.8.5, some imported RKE2 clusters are displayed as offline in the Rancher UI. Upon investigation, the issue seems related to the Fleet-Agent remaining stuck in a "bootstrap" state. Specifically, the Fleet-Agent continues to use the fleet-agent-bootstrap secret and fails to generate the fleet-agent secret. This issue only occurs for certain clusters, while others work as expected.

Expected Behavior

The Fleet-Agent should transition from using the fleet-agent-bootstrap secret to creating and using the fleet-agent secret, completing the registration process.

Steps To Reproduce

  1. Update Rancher Management Server from version 2.7.5 to 2.8.5.

  2. Ensure there are imported downstream clusters (e.g., v1.26.16+rke2r1).

  3. Check the logs of the fleet-agent on an affected cluster:

    kubectl -n cattle-fleet-system logs -l app=fleet-agent
    

    Example error logs:

    time="2025-01-27T07:48:46Z" level=error msg="Failed to register agent: registration failed: cannot create clusterregistration on management cluster for cluster id 'some_random_id': Unauthorized"
    time="2025-01-27T07:49:46Z" level=warning msg="Cannot find fleet-agent secret, running registration"
    time="2025-01-27T07:49:46Z" level=info msg="Creating clusterregistration with id 'some_random_id' for new token"
    
  4. Compare the cattle-fleet-system namespace secrets:

    • Working clusters have a fleet-agent secret.
    • Affected clusters only have a fleet-agent-bootstrap secret.
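
The comparison in step 4 can be scripted when many clusters need checking. This is a minimal sketch, not part of Fleet: `fleet_registration_state` is a hypothetical helper name, and it only looks at the two secret names mentioned in this report.

```shell
# Hypothetical helper: reads secret names (one per line) on stdin and
# reports which registration state the namespace appears to be in.
fleet_registration_state() {
  names=$(cat)
  if printf '%s\n' "$names" | grep -qx 'fleet-agent'; then
    echo registered
  elif printf '%s\n' "$names" | grep -qx 'fleet-agent-bootstrap'; then
    echo bootstrap-only
  else
    echo no-fleet-secrets
  fi
}

# Usage against a live cluster (assumes kubectl points at the cluster to check):
#   kubectl -n cattle-fleet-system get secrets -o name \
#     | sed 's|^secret/||' | fleet_registration_state
```

A cluster reporting `bootstrap-only` matches the affected clusters described above.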

Environment

- Architecture: x86
- Fleet Version: v0.9.5 
- Cluster:
  - Provider: RKE2
  - Options: 
  - Kubernetes Version: v1.26.16

Logs

<details> <summary>cattle-fleet-system logs</summary>
time="2025-01-27T09:12:49Z" level=error msg="Failed to register agent: registration failed: cannot create clusterregistration on management cluster for cluster id '66t7lwf9r6gpwljrk5swl9hdxgt5bkxc756nvl7hbtd75r2zfrgm9v': Unauthorized"
time="2025-01-27T09:13:49Z" level=warning msg="Cannot find fleet-agent secret, running registration"
time="2025-01-27T09:13:49Z" level=info msg="Creating clusterregistration with id '66t7lwf9r6gpwljrk5swl9hdxgt5bkxc756nvl7hbtd75r2zfrgm9v' for new token"
</details>

<details> <summary>cattle-cluster-agent logs from a cluster that lost connection to rancher</summary>
kubectl -n cattle-system logs deployments/cattle-cluster-agent
Found 2 pods, using pod/cattle-cluster-agent-984568b5-cpsh7
Error: --namespace or env NAMESPACE is required to be set
Usage:
  fleet-agent [flags]

Flags:
      --agent-scope string        An identifier used to scope the agent bundleID names, typically the same as namespace
      --checkin-interval string   How often to post cluster status
      --debug                     Turn on debug logging
      --debug-level int           If debugging is enabled, set klog -v=X
  -h, --help                      help for fleet-agent
      --kubeconfig string         kubeconfig file
      --namespace string          namespace to watch
  -v, --version                   version for fleet-agent

time="2025-01-27T07:12:36Z" level=fatal msg="--namespace or env NAMESPACE is required to be set"
</details>

Anything else?

No response

LucEast avatar Jan 27 '25 09:01 LucEast

Same here: the init container is stuck in the registration phase.

time="2025-03-01T07:35:01Z" level=warning msg="Cannot find fleet-agent secret, running registration"
time="2025-03-01T07:35:01Z" level=info msg="Creating clusterregistration with id '589kvb5962bmls2zgrbfkfd7jjmw2tb9fnbvb7vbxnr88s9bqtdb2c' for new token"                                                               
time="2025-03-01T07:35:01Z" level=error msg="Failed to register agent: registration failed: cannot create clusterregistration on management cluster for cluster id '589kvb5962bmls2zgrbfkfd7jjmw2tb9fnbvb7vbxnr88s9bqtdb2c': Unauthorized"

Here's a bit of debug information when running it manually:

$ fleetagent --debug --debug-level 9 register
I0301 07:33:23.769769     158 merged_client_builder.go:121] Using in-cluster configuration
2025-03-01T07:33:23Z    INFO    setup   starting registration on upstream cluster       {"namespace": "cattle-fleet-local-system"}
I0301 07:33:23.770730     158 round_trippers.go:466] curl -v -XGET  -H "User-Agent: fleetagent/v0.0.0 (linux/amd64) kubernetes/$Format" -H "Authorization: Bearer <masked>" -H "Accept: application/json, */*" 'https://10.43.0.1:443/api/v1/namespaces/cattle-fleet-local-system/secrets/fleet-agent'
I0301 07:33:23.771490     158 round_trippers.go:510] HTTP Trace: Dial to tcp:10.43.0.1:443 succeed
I0301 07:33:23.779418     158 round_trippers.go:553] GET https://10.43.0.1:443/api/v1/namespaces/cattle-fleet-local-system/secrets/fleet-agent 404 Not Found in 8 milliseconds
I0301 07:33:23.779476     158 round_trippers.go:570] HTTP Statistics: DNSLookup 0 ms Dial 0 ms TLSHandshake 3 ms ServerProcessing 4 ms Duration 8 ms
I0301 07:33:23.779496     158 round_trippers.go:577] Response Headers:
I0301 07:33:23.779520     158 round_trippers.go:580]     Audit-Id: 81920e58-4acb-4484-8fb1-14fd212f8e1d
I0301 07:33:23.779535     158 round_trippers.go:580]     Cache-Control: no-cache, private
I0301 07:33:23.779545     158 round_trippers.go:580]     Content-Type: application/json
I0301 07:33:23.779578     158 round_trippers.go:580]     X-Kubernetes-Pf-Flowschema-Uid: 761e5877-c905-4b0f-bef4-fa21351f0054
I0301 07:33:23.779590     158 round_trippers.go:580]     X-Kubernetes-Pf-Prioritylevel-Uid: af6a8b8a-b244-402d-ba08-54fb0cfcf0d0
I0301 07:33:23.779599     158 round_trippers.go:580]     Content-Length: 196
I0301 07:33:23.779606     158 round_trippers.go:580]     Date: Sat, 01 Mar 2025 07:33:23 GMT
I0301 07:33:23.779656     158 request.go:1351] Response Body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"secrets \"fleet-agent\" not found","reason":"NotFound","details":{"name":"fleet-agent","kind":"secrets"},"code":404}
WARN[0000] Cannot find fleet-agent secret, running registration
I0301 07:33:23.780034     158 round_trippers.go:466] curl -v -XGET  -H "Accept: application/json, */*" -H "User-Agent: fleetagent/v0.0.0 (linux/amd64) kubernetes/$Format" -H "Authorization: Bearer <masked>" 'https://10.43.0.1:443/api/v1/namespaces/cattle-fleet-local-system/secrets/fleet-agent-bootstrap'
I0301 07:33:23.783016     158 round_trippers.go:553] GET https://10.43.0.1:443/api/v1/namespaces/cattle-fleet-local-system/secrets/fleet-agent-bootstrap 200 OK in 2 milliseconds
I0301 07:33:23.783069     158 round_trippers.go:570] HTTP Statistics: GetConnection 0 ms ServerProcessing 2 ms Duration 2 ms
I0301 07:33:23.783084     158 round_trippers.go:577] Response Headers:
I0301 07:33:23.783095     158 round_trippers.go:580]     Content-Length: 3915
I0301 07:33:23.783106     158 round_trippers.go:580]     Date: Sat, 01 Mar 2025 07:33:23 GMT
I0301 07:33:23.783112     158 round_trippers.go:580]     Audit-Id: 870f9c29-62c2-435d-8be5-58e712ea19cc
I0301 07:33:23.783117     158 round_trippers.go:580]     Cache-Control: no-cache, private
I0301 07:33:23.783123     158 round_trippers.go:580]     Content-Type: application/json
I0301 07:33:23.783132     158 round_trippers.go:580]     X-Kubernetes-Pf-Flowschema-Uid: 761e5877-c905-4b0f-bef4-fa21351f0054
I0301 07:33:23.783138     158 round_trippers.go:580]     X-Kubernetes-Pf-Prioritylevel-Uid: af6a8b8a-b244-402d-ba08-54fb0cfcf0d0
I0301 07:33:23.783258     158 request.go:1351] Response Body: {"kind":"Secret","apiVersion":"v1","metadata":{"name":"fleet-agent-bootstrap","namespace":"cattle-fleet-local-system","uid":"7795d125-ff8f-4f53-8fa7-3c5622832ad7","resourceVersion":"6250553","creationTimestamp":"2025-02-25T16:34:52Z","labels":{"objectset.rio.cattle.io/hash":"362023f752e7f1989d8b652e029bd2c658ae7c44"},"annotations":{"objectset.rio.cattle.io/applied":"H4sIAAAAAAAA/3yQS3PaMBSF/8tdAxXm7ZksGihQ1fYUGz/QTpYvibH8GOumQDL57x3D9LFJlpqje75zzhtkkiTYbyCbPMD2F7bLr2CDEzByguHSDzO+zx9XfsTDIBR8x9ahf9MYLYshD0JeYOGtdtH3q2DrURDyR8H0Enr/DEPfARvk1mdq606d6/ziLsfMvY7Pzunb1FvtXqEHSr8YwtaTJZpGKgQbRHnRItlRulmcDvH5AXpgroaw9PEpN9RKyuvq/4ODtWbZ5qKdWBgRR8yJPZMlHhMJf3USb6JGvk53D50R1QVWHSPRP+SGbw+W/inDZh8lDXfDSRMkHk8T/xwXHvcKbdRWn9zNZR2FfuBqQW7Cg2goWnjvQYkk/65YVTXdgpnuWacnVGSQBm1eD5Qk0jjI6y95BjYcNSL15RNW1E/rmrpKTf/+qX8Xda2k7t9bdyjV4s18n5doSJYN2NWL1j3QMkX9KfJZmmewYTS1mDU6ziYWzo7DxXyRzdPpxEJmLdLMUtPJXOJMjccdrZIlfpQT7vKf6T9J/f47AAD//xPZ5xRkAgAA","objectset.rio.cattle.io/id":"fleet-agent-bootstrap-cattle-fleet-local-system"},"managedFields":[{"manager":"fleetcontroller","operation":"Update","apiVersion":"v1","time":"2025-02-25T16:34:52Z","fieldsType":"FieldsV1","fieldsV1":{"f:data":{".":{},"f:apiServerCA":{},"f:apiServerURL":{},"f:clusterNamespace":{},"f:systemRegistrationNamespace":{},"f:token":{}},"f:metadata":{"f:annotations":{".":{},"f:objectset.rio.cattle.io/applied":{},"f:objectset.rio.cattle.io/id":{}},"f:labels":{".":{},"f:objectset.rio.cattle.io/hash":{}}},"f:type":{}}}]},"data":{"apiServerCA":"LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUJkekNDQVIyZ0F3SUJBZ0lCQURBS0JnZ3Foa2pPUFFRREFqQWpNU0V3SHdZRFZRUUREQmhyTTNNdGMyVnkKZG1WeUxXTmhRREUzTXprek9EWXdNakF3SGhjTk1qVXdNakV5TVRnME56QXdXaGNOTXpVd01qRXdNVGcwTnpBdwpXakFqTVNFd0h3WURWUVFEREJock0zTXRjMlZ5ZG1WeUxXTmhRREUzTXprek9EWXdNakF3V1RBVEJnY3Foa2pPClBRSUJCZ2dxaGtqT1BRTUJCd05DQUFRVThqQ3ErVlA2eHVITERsQ2hvZFdwQ3grODVMaUxCS083L0NJMDh6SE0KMGgzbHRSYXNZeHNIUGFqUElzTVJaUHRRbG43bW4xeExFVjB
zc2p1NkwwSVBvMEl3UURBT0JnTlZIUThCQWY4RQpCQU1DQXFRd0R3WURWUjBUQVFIL0JBVXdBd0VCL3pBZEJnTlZIUTRFRmdRVUEyNDlYN0hJcFE2ZlUwYUdmTmQ4CmpkNTd5UzB3Q2dZSUtvWkl6ajBFQXdJRFNBQXdSUUloQU91UjNvZzVIdndCKzF0WWZBSm9jU2hKTmhHaWRpbUMKNGk4dStFQXZRU1VEQWlCQUlBWkMvVVJSb2VDeFN0SVNWVUV4bWpBWERER095ZVBNa2orTUwxMkFGZz09Ci0tLS0tRU5EIENFUlRJRklDQVRFLS0tLS0K","apiServerURL":"aHR0cHM6Ly8xMC40My4wLjE6NDQz","clusterNamespace":"ZmxlZXQtbG9jYWw=","systemRegistrationNamespace":"Y2F0dGxlLWZsZWV0LWNsdXN0ZXJzLXN5c3RlbQ==","token":"ZXlKaGJHY2lPaUpTVXpJMU5pSXNJbXRwWkNJNklscHljMGxFVURSMlZtMXJSV1ZrYW1kbmFXSnJlbXRQVVY4Mll6aFRjVGhaWTA1allVSlZkVFkyUzBVaWZRLmV5SnBjM01pT2lKcmRXSmxjbTVsZEdWekwzTmxjblpwWTJWaFkyTnZkVzUwSWl3aWEzVmlaWEp1WlhSbGN5NXBieTl6WlhKMmFXTmxZV05qYjNWdWRDOXVZVzFsYzNCaFkyVWlPaUptYkdWbGRDMXNiMk5oYkNJc0ltdDFZbVZ5Ym1WMFpYTXVhVzh2YzJWeWRtbGpaV0ZqWTI5MWJuUXZjMlZqY21WMExtNWhiV1VpT2lKcGJYQnZjblF0ZEc5clpXNHRiRzlqWVd3dE9HWXhaalV6TXpRdFpUTTNNQzAwTnpBekxXSXlNakF0WVdFd01UUTVaRFk0TVRSbExYUnZhMlZ1SWl3aWEzVmlaWEp1WlhSbGN5NXBieTl6WlhKMmFXTmxZV05qYjNWdWRDOXpaWEoyYVdObExXRmpZMjkxYm5RdWJtRnRaU0k2SW1sdGNHOXlkQzEwYjJ0bGJpMXNiMk5oYkMwNFpqRm1OVE16TkMxbE16Y3dMVFEzTURNdFlqSXlNQzFoWVRBeE5EbGtOamd4TkdVaUxDSnJkV0psY201bGRHVnpMbWx2TDNObGNuWnBZMlZoWTJOdmRXNTBMM05sY25acFkyVXRZV05qYjNWdWRDNTFhV1FpT2lKbU0yVXdaalU0Tmkxak16RXhMVFE1TURZdE9UVXpZUzFtWlRJMk5tVmtaVGMyWVRRaUxDSnpkV0lpT2lKemVYTjBaVzA2YzJWeWRtbGpaV0ZqWTI5MWJuUTZabXhsWlhRdGJHOWpZV3c2YVcxd2IzSjBMWFJ2YTJWdUxXeHZZMkZzTFRobU1XWTFNek0wTFdVek56QXRORGN3TXkxaU1qSXdMV0ZoTURFME9XUTJPREUwWlNKOS5DWkx5NWFtRE8tdnpFRU5FVldRMXJTdXV2ajB4UExPUms1WS1rX1ZoSEtzY0FEMUJOcmV0LXZGQkhzYzl2ZU1KV3ZsbTZDZnU4ZTdEb3FJQjJkU0RhV2tnc0FSMjhydkRVMW9WbUo3cEtQS25YaDhxM0lTNDlLalBDSXowazF1dGhVdktlVmZBZFRNOFJfeGRNT2lGck5YU1FHMXVFZ3lNMzdjTy02VVBSVF9vVGQ1RG54UnQ5dFdSY2NMdHFSZ1p0YkZFeTNMOXZFaXFfM1J0NEpoU2RHSFNxUHFWbVdEQ1JYLUJvYW44Mk45OG9DOUFYb21DaVY4X3k4YlVraHgxV3h5c2JkOWxFM3FuQkRpcmJtQ1hzSE5scG1iRTlETDQyaHBmS3B3NWs1cTc2NUpSM2FiOHpZSVN6eXplVWhnT3g2SDNVSm1LTFRwamxwcWFVV3dyX1E="},"type":"Opaque"}
I0301 07:33:23.783801     158 round_trippers.go:466] curl -v -XGET  -H "Authorization: Bearer <masked>" -H "Accept: application/json, */*" -H "User-Agent: fleetagent/v0.0.0 (linux/amd64) kubernetes/$Format" 'https://10.43.0.1:443/api/v1/namespaces/cattle-fleet-local-system/configmaps/fleet-agent'
I0301 07:33:23.787477     158 round_trippers.go:553] GET https://10.43.0.1:443/api/v1/namespaces/cattle-fleet-local-system/configmaps/fleet-agent 200 OK in 3 milliseconds
I0301 07:33:23.787541     158 round_trippers.go:570] HTTP Statistics: GetConnection 0 ms ServerProcessing 3 ms Duration 3 ms
I0301 07:33:23.787565     158 round_trippers.go:577] Response Headers:
I0301 07:33:23.787589     158 round_trippers.go:580]     X-Kubernetes-Pf-Prioritylevel-Uid: af6a8b8a-b244-402d-ba08-54fb0cfcf0d0
I0301 07:33:23.787608     158 round_trippers.go:580]     Content-Length: 1450
I0301 07:33:23.787630     158 round_trippers.go:580]     Date: Sat, 01 Mar 2025 07:33:23 GMT
I0301 07:33:23.787646     158 round_trippers.go:580]     Audit-Id: 10418836-3e6a-42cd-8002-9dda32d22697
I0301 07:33:23.787665     158 round_trippers.go:580]     Cache-Control: no-cache, private
I0301 07:33:23.787682     158 round_trippers.go:580]     Content-Type: application/json
I0301 07:33:23.787702     158 round_trippers.go:580]     X-Kubernetes-Pf-Flowschema-Uid: 761e5877-c905-4b0f-bef4-fa21351f0054
I0301 07:33:23.787794     158 request.go:1351] Response Body: {"kind":"ConfigMap","apiVersion":"v1","metadata":{"name":"fleet-agent","namespace":"cattle-fleet-local-system","uid":"bf3dffe9-71fb-4ed1-8a5e-4eec87e1bf56","resourceVersion":"214168","creationTimestamp":"2025-02-13T08:40:10Z","labels":{"objectset.rio.cattle.io/hash":"362023f752e7f1989d8b652e029bd2c658ae7c44"},"annotations":{"objectset.rio.cattle.io/applied":"H4sIAAAAAAAA/3yPvXKsMAyF30U1cLne5c9tqvQp3QgjFifGZixlZzLMvnvGkBRptpTOGX2fdphQEPQONobZ3UDDbgBvFORlIfvhwmsQSnf0BrSBmg0UBjyO5NmA3g0EXOnIfLS5VRjYUry7iVIFjwJWEvxlYAhRUFwMnMc4vpMVJqmSi5VFEU+Vi//cBBpmTyTlYVKOMQpLwq08S+UZHsSSv1hozSib6Dj+5lZiwXUDHT69L358nyEX5AU0XFpVq8vcNYq6+f/QD1M/to2iWg3jpGzb9EidvV4zLT/+1xPOJW9oc/LE9fEdAAD//6hbMZF4AQAA","objectset.rio.cattle.io/id":"fleet-agent-bootstrap-cattle-fleet-local-system"},"managedFields":[{"manager":"fleetcontroller","operation":"Update","apiVersion":"v1","time":"2025-02-13T08:40:10Z","fieldsType":"FieldsV1","fieldsV1":{"f:data":{".":{},"f:config":{}},"f:metadata":{"f:annotations":{".":{},"f:objectset.rio.cattle.io/applied":{},"f:objectset.rio.cattle.io/id":{}},"f:labels":{".":{},"f:objectset.rio.cattle.io/hash":{}}}}}]},"data":{"config":"{\"agentCheckinInterval\":\"0s\",\"labels\":{\"name\":\"local\",\"provider.cattle.io\":\"k3s\"},\"clientID\":\"589kvb5962bmls2zgrbfkfd7jjmw2tb9fnbvb7vbxnr88s9bqtdb2c\",\"bootstrap\":{},\"agentTLSMode\":\"strict\",\"gitClientTimeout\":\"0s\",\"garbageCollectionInterval\":\"15m0s\",\"agentWorkers\":{}}"}}
INFO[0000] Creating clusterregistration with id '589kvb5962bmls2zgrbfkfd7jjmw2tb9fnbvb7vbxnr88s9bqtdb2c' for new token
I0301 07:33:23.790161     158 request.go:1351] Request Body: {"kind":"ClusterRegistration","apiVersion":"fleet.cattle.io/v1alpha1","metadata":{"generateName":"request-","namespace":"fleet-local","creationTimestamp":null},"spec":{"clientID":"589kvb5962bmls2zgrbfkfd7jjmw2tb9fnbvb7vbxnr88s9bqtdb2c","clientRandom":"hmkbgq9gfkztn9m4jzfwnh9whq6zj4k4fnqbgv5x2xm2bf8vvrkng7","clusterLabels":{"fleet.cattle.io/created-by-agent-pod":"fleet-agent-0","name":"local","provider.cattle.io":"k3s"}},"status":{}}
I0301 07:33:23.790283     158 round_trippers.go:466] curl -v -XPOST  -H "Content-Type: application/json" -H "User-Agent: fleetagent/v0.0.0 (linux/amd64) kubernetes/$Format" -H "Authorization: Bearer <masked>" -H "Accept: application/json, */*" 'https://10.43.0.1:443/apis/fleet.cattle.io/v1alpha1/namespaces/fleet-local/clusterregistrations'
I0301 07:33:23.794412     158 round_trippers.go:553] POST https://10.43.0.1:443/apis/fleet.cattle.io/v1alpha1/namespaces/fleet-local/clusterregistrations 401 Unauthorized in 4 milliseconds
I0301 07:33:23.794444     158 round_trippers.go:570] HTTP Statistics: GetConnection 0 ms ServerProcessing 3 ms Duration 4 ms
I0301 07:33:23.794461     158 round_trippers.go:577] Response Headers:
I0301 07:33:23.794476     158 round_trippers.go:580]     Date: Sat, 01 Mar 2025 07:33:23 GMT
I0301 07:33:23.794492     158 round_trippers.go:580]     Audit-Id: a21f3f98-a89a-4ba1-add1-2c710d777f21
I0301 07:33:23.794505     158 round_trippers.go:580]     Cache-Control: no-cache, private
I0301 07:33:23.794513     158 round_trippers.go:580]     Content-Type: application/json
I0301 07:33:23.794525     158 round_trippers.go:580]     Content-Length: 129
I0301 07:33:23.794566     158 request.go:1351] Response Body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Unauthorized","reason":"Unauthorized","code":401}
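
For anyone reading the dump above: the `data` fields of the `fleet-agent-bootstrap` secret are base64-encoded. Decoding the three short ones, copied verbatim from the response body, shows where the agent is trying to register. `decode_field` is just a name made up for this sketch, and `base64 -d` assumes GNU coreutils or a compatible `base64`.

```shell
# Hypothetical helper: decodes one base64-encoded Secret data value.
decode_field() {
  printf '%s' "$1" | base64 -d
}

decode_field 'aHR0cHM6Ly8xMC40My4wLjE6NDQz'; echo              # apiServerURL
# → https://10.43.0.1:443
decode_field 'ZmxlZXQtbG9jYWw='; echo                           # clusterNamespace
# → fleet-local
decode_field 'Y2F0dGxlLWZsZWV0LWNsdXN0ZXJzLXN5c3RlbQ=='; echo   # systemRegistrationNamespace
# → cattle-fleet-clusters-system

# The same value can be read straight from a cluster, for example:
#   kubectl -n cattle-fleet-local-system get secret fleet-agent-bootstrap \
#     -o jsonpath='{.data.apiServerURL}' | base64 -d
```

The decoded URL and cluster namespace match the POST to `/apis/fleet.cattle.io/v1alpha1/namespaces/fleet-local/clusterregistrations` that returns 401 in the log.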

rafamiga avatar Mar 01 '25 07:03 rafamiga

It might be related to https://github.com/rancher/rancher/issues/36117, but there's no real solution there, and the description of how it was resolved is very vague. Anyway, for clarification:

kubectl auth can-i get secret --as=system:serviceaccount:cattle-fleet-local-system:fleet-agent -n cattle-fleet-local-system
yes
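
Note that `kubectl auth can-i` tests RBAC authorization, while the failing POST is rejected with 401 Unauthorized, which the API server returns when the bearer token itself is not accepted. One way to dig further, offered as a sketch and not a confirmed diagnosis, is to look at the claims inside the token stored in `fleet-agent-bootstrap` (presumably the one used for the registration request) and see which service account it was issued for. `jwt_payload` is a hypothetical helper; it assumes the token is a JWT and that `base64 -d` is available.

```shell
# Hypothetical helper: prints the claims (payload) of a JWT passed as $1.
jwt_payload() {
  # The second dot-separated segment is the payload, base64url-encoded.
  seg=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
  # Restore the '=' padding that base64url encoding strips.
  case $(( ${#seg} % 4 )) in
    2) seg="$seg==" ;;
    3) seg="$seg=" ;;
  esac
  printf '%s' "$seg" | base64 -d
}

# Usage against a live cluster (the secret value is base64-encoded once more):
#   token=$(kubectl -n cattle-fleet-local-system get secret fleet-agent-bootstrap \
#     -o jsonpath='{.data.token}' | base64 -d)
#   jwt_payload "$token"; echo
```

If the service account or token secret named in the claims no longer exists on the management cluster, that could explain the 401, but this is an assumption to verify.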

rafamiga avatar Mar 01 '25 08:03 rafamiga

Can you reproduce this with a newer Rancher version, e.g. 2.11.x? This may help understand which resources need to be created and when.

weyfonk avatar Jul 16 '25 13:07 weyfonk

Closing this issue in the absence of feedback.

weyfonk avatar Jul 30 '25 07:07 weyfonk