Agent Pods not reloading renamed or new PolicyID

Open barkbay opened this issue 1 year ago • 1 comments

While working on https://github.com/elastic/cloud-on-k8s/issues/7290 I noticed that renaming a policy ID does not trigger a "restart" of the Agent Pods. The FLEET_ENROLLMENT_TOKEN is provided as an environment variable, and therefore requires a new Pod to be created to read the new value.

apiVersion: v1
kind: Secret
metadata:
  creationTimestamp: "2024-10-18T12:42:54Z"
  labels:
    agent.k8s.elastic.co/name: elastic-agent
    common.k8s.elastic.co/type: agent
    eck.k8s.elastic.co/credentials: "true"
  name: elastic-agent-agent-envvars
  namespace: elastic
stringData:
  FLEET_ENROLLMENT_TOKEN: REDACTED # This is updated correctly by the Agent controller
type: Opaque
---
apiVersion: v1
kind: Pod
metadata:
  annotations:
    agent.k8s.elastic.co/config-hash: "803455129"
    openshift.io/scc: privileged
  creationTimestamp: "2024-10-18T12:42:54Z"
  generateName: elastic-agent-agent-
  labels:
    agent.k8s.elastic.co/name: elastic-agent
    agent.k8s.elastic.co/version: 8.15.0
    common.k8s.elastic.co/type: agent
    controller-revision-hash: 6ccd744885
    pod-template-generation: "1"
  name: elastic-agent-agent-hjn5b
  namespace: elastic
spec:
  containers:
    - env:
        - name: FLEET_ENROLLMENT_TOKEN
          valueFrom:
            secretKeyRef:
              key: FLEET_ENROLLMENT_TOKEN # Not reloaded without a restart
              name: elastic-agent-agent-envvars
              optional: false
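
A common way to make Pods pick up a changed Secret that is consumed through environment variables is to fold a hash of the Secret's data into a Pod template annotation, so that any change to the data also changes the template and triggers a rolling update. The sketch below only illustrates that idea; it is not how ECK computes `agent.k8s.elastic.co/config-hash`, and the annotation key `example.com/envvars-hash` and both helper names are hypothetical.

```python
import hashlib


def secret_data_hash(data: dict[str, str]) -> str:
    """Return a stable hex digest of a Secret's key/value pairs."""
    h = hashlib.sha256()
    for key in sorted(data):  # sort so that key order does not affect the digest
        for field in (key, data[key]):
            raw = field.encode("utf-8")
            # Length-prefix each field so ("ab", "c") and ("a", "bc") differ.
            h.update(len(raw).to_bytes(8, "big"))
            h.update(raw)
    return h.hexdigest()


def with_envvars_hash(annotations: dict[str, str], data: dict[str, str]) -> dict[str, str]:
    """Return a copy of the Pod template annotations with the hash added.

    "example.com/envvars-hash" is a hypothetical annotation key.
    """
    updated = dict(annotations)
    updated["example.com/envvars-hash"] = secret_data_hash(data)
    return updated


base = {"agent.k8s.elastic.co/config-hash": "803455129"}
before = with_envvars_hash(base, {"FLEET_ENROLLMENT_TOKEN": "token-a"})
after = with_envvars_hash(base, {"FLEET_ENROLLMENT_TOKEN": "token-b"})
print(before != after)
```

Because the annotation lives in the Pod template, a new token value yields a different template, and a DaemonSet using the RollingUpdate strategy would then replace its Pods.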

Note that one other problem is that the "old" policies are never deleted and the related tokens remain valid.

barkbay avatar Oct 21 '24 06:10 barkbay

EDIT: The Agent itself is now fixed (8.18.0+) and re-enrolls if the token changes: https://github.com/elastic/elastic-agent/pull/6568


I think this issue is more complicated than it first seems: recreating the Pod is not sufficient if the Agent runs in a stateful manner (for example, as part of a DaemonSet that uses hostPath mounts). As far as I can tell, the Agent prefers its persisted state and old policy, and ignores the changed token value in the environment.

This may well be a separate issue, but from a "UX" perspective the two are closely related: changing the PolicyID has no effect.

I could be missing some crucial Agent knowledge, but I think it might be a limitation with the Agent itself. Looking at the docs and the undocumented help of the container command I see no clear way to force a re-enrollment via the CLI.

lpeter91 avatar Jan 16 '25 16:01 lpeter91