Agent Pods not reloading renamed or new PolicyID
While working on https://github.com/elastic/cloud-on-k8s/issues/7290 I noticed that renaming a policy ID does not trigger a "restart" of the Agent Pods. The FLEET_ENROLLMENT_TOKEN is provided as an environment variable, and therefore requires a new Pod to be created to read the new value.
apiVersion: v1
kind: Secret
metadata:
  creationTimestamp: "2024-10-18T12:42:54Z"
  labels:
    agent.k8s.elastic.co/name: elastic-agent
    common.k8s.elastic.co/type: agent
    eck.k8s.elastic.co/credentials: "true"
  name: elastic-agent-agent-envvars
  namespace: elastic
stringData:
  FLEET_ENROLLMENT_TOKEN: REDACTED # This is updated correctly by the Agent controller
type: Opaque
---
apiVersion: v1
kind: Pod
metadata:
  annotations:
    agent.k8s.elastic.co/config-hash: "803455129"
    openshift.io/scc: privileged
  creationTimestamp: "2024-10-18T12:42:54Z"
  generateName: elastic-agent-agent-
  labels:
    agent.k8s.elastic.co/name: elastic-agent
    agent.k8s.elastic.co/version: 8.15.0
    common.k8s.elastic.co/type: agent
    controller-revision-hash: 6ccd744885
    pod-template-generation: "1"
  name: elastic-agent-agent-hjn5b
  namespace: elastic
spec:
  containers:
  - env:
    - name: FLEET_ENROLLMENT_TOKEN
      valueFrom:
        secretKeyRef:
          key: FLEET_ENROLLMENT_TOKEN # Not reloaded without a restart
          name: elastic-agent-agent-envvars
          optional: false
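One common way to make Pods pick up a changed Secret that is consumed through environment variables is to fold a digest of the Secret's content into an annotation on the pod template, so that any change to the token also changes the template and triggers a rollout. The sketch below only illustrates that idea. It is not the ECK controller's implementation: the helper names (`config_hash`, `needs_rollout`, `stamp_pod_template`) are hypothetical, and reusing the `agent.k8s.elastic.co/config-hash` annotation for this purpose is an assumption made for the example.

```python
import hashlib
import json

# Annotation name taken from the Pod manifest above; reusing it to cover the
# envvars Secret is an assumption of this sketch, not confirmed ECK behaviour.
CONFIG_HASH_ANNOTATION = "agent.k8s.elastic.co/config-hash"


def config_hash(secret_data: dict) -> str:
    """Return a stable digest of the Secret's key/value pairs."""
    # Sort keys so the digest does not depend on dict ordering.
    canonical = json.dumps(secret_data, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


def needs_rollout(pod_annotations: dict, secret_data: dict) -> bool:
    """True when the Pod was created from different Secret content."""
    return pod_annotations.get(CONFIG_HASH_ANNOTATION) != config_hash(secret_data)


def stamp_pod_template(pod_template: dict, secret_data: dict) -> dict:
    """Return a copy of the pod template with the hash annotation set."""
    metadata = dict(pod_template.get("metadata", {}))
    annotations = dict(metadata.get("annotations", {}))
    annotations[CONFIG_HASH_ANNOTATION] = config_hash(secret_data)
    metadata["annotations"] = annotations
    return {**pod_template, "metadata": metadata}
```

With this approach, a new enrollment token produces a different annotation value, the DaemonSet's pod template changes, and the Pods are replaced through the normal rollout mechanism.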
Another problem is that the "old" policies are never deleted, and their enrollment tokens remain valid.
EDIT: The Agent itself is now (8.18.0+) fixed and does reenroll if the token changes: https://github.com/elastic/elastic-agent/pull/6568
I think this issue is more complicated than it first seems: recreating the Pod is not sufficient if the Agent runs in a stateful manner (for example, as part of a DaemonSet that uses hostPath mounts). As far as I can tell, the Agent prefers its persisted state and old policy, and ignores the changed token value in the environment.
This may very well be a separate issue, but from a "UX" perspective the two are closely related: in both cases, changing the PolicyID has no effect.
I could be missing some crucial Agent knowledge, but I think it might be a limitation with the Agent itself. Looking at the docs and the undocumented help of the container command I see no clear way to force a re-enrollment via the CLI.
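To make the stateful case concrete, here is a minimal sketch of the kind of check that would be needed: remember a fingerprint of the token used at enrollment in the persisted state directory, and compare it with the token currently provided in the environment on startup. This is purely illustrative and not the Agent's actual logic. The marker file name and all function names are hypothetical.

```python
import hashlib
from pathlib import Path

# Hypothetical marker file kept next to the persisted Agent state.
FINGERPRINT_FILE = "enrollment-token.sha256"


def token_fingerprint(token: str) -> str:
    """Hash the token so the raw value is never written to disk."""
    return hashlib.sha256(token.encode("utf-8")).hexdigest()


def record_enrollment(state_dir: Path, token: str) -> None:
    """Persist the fingerprint of the token used for a successful enrollment."""
    (state_dir / FINGERPRINT_FILE).write_text(token_fingerprint(token))


def should_reenroll(state_dir: Path, current_token: str) -> bool:
    """True if no enrollment is recorded or the token has changed since."""
    marker = state_dir / FINGERPRINT_FILE
    if not marker.exists():
        return True
    return marker.read_text().strip() != token_fingerprint(current_token)
```

With a hostPath mount, the state directory survives Pod recreation, so a check like this is what would let a new Pod notice that the token in its environment no longer matches the one it originally enrolled with.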