Orphaned ACL tokens
Community Note
- Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request. Searching for pre-existing feature requests helps us consolidate datapoints for identical requirements into a single place, thank you!
- Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request.
- If you are interested in working on this issue or have submitted a pull request, please leave a comment.
Overview of the Issue
Consul does not clean up ACL tokens when a k8s node is suddenly shut down without being drained or receiving a SIGTERM signal.
Reproduction Steps
Using manageSystemACLs is required.
Force-destroy any k8s node that is hosting a consul client pod. The pod is unable to run the consul logout command in its preStop hook, and in that case it leaves its ACL token behind in Consul. You will also see an unhealthy node in the Consul catalog, but that is less of a problem, since it is cleaned up after 72h.
The same is true for injected pods: they also leave ACL entries behind when the node hosting them (and the local Consul agent) is shut down unexpectedly. The injector logs show no information about revoking their ACL tokens.
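For illustration, one way to force this kind of ungraceful shutdown on GKE is to delete the underlying Compute Engine instance directly instead of draining the node first; the instance name and zone below are placeholders for your environment:

# Deleting the VM underneath the node skips kubectl drain entirely, so the
# consul client pod never gets to run `consul logout` in its preStop hook.
gcloud compute instances delete <node-instance-name> --zone <zone> --quiet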
After running like that on our very unstable test cluster, we observed ~4000 orphaned ACL entries after a few weeks of testing.
In our case we use external servers, but it should behave the same in a local-server setup. Simplified values.yaml:
---
global:
  image: "hashicorp/consul:1.12.2"
  logLevel: "info"
  datacenter: qa
  imageEnvoy: "envoyproxy/envoy-alpine:v1.20.3"
  logJSON: true
  tls:
    enabled: true
    enableAutoEncrypt: true
    httpsOnly: false
    caCert:
      secretName: consul-ca-certs
    caKey:
      secretName: consul-ca-certs
  acls:
    manageSystemACLs: true
    bootstrapToken:
      secretName: consul-acl-bootstrap-token
      secretKey: acl-bootstrap-token
  metrics:
    enabled: true
    enableAgentMetrics: true
controller:
  enabled: true
server:
  enabled: false
externalServers:
  enabled: true
  hosts: &servers ["consul01", "consul02", "consul03"]
  httpsPort: 8501
  k8sAuthMethodHost: <k8s_address>
connectInject:
  priorityClassName: "core-critical"
  enabled: true
client:
  join: *servers
  priorityClassName: "daemonset-critical"
In this particular cluster we have ~10-20 nodes and only a few injected applications (fewer than 30 pods). After a few weeks we can see that the number of ACL tokens is constantly growing. As they have no TTL assigned, they will never expire.
>consul acl token list | grep 'token created via login: {"component":"client"}' | wc -l
3130
>consul acl token list | grep 'token created via login: {"pod":"' | wc -l
1434
Expected behavior
Pods should not leave any ACL tokens that are not needed anymore.
Environment details
- Kubernetes version: 1.22.10
- Cloud Provider: GKE
- Networking CNI plugin in use: Calico
Hi @Mlaczkowski,
Thank you for creating this issue. We have observed this behavior before when Kubernetes nodes are not shut down gracefully. As you said in your issue, the problem stems from the Pod not having time to perform a Consul logout, which is what removes the ACL tokens.
The only workaround for this at the moment would be for you to manage ACLs yourself. However, I would recommend against that.
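For illustration only, a manual cleanup along the lines below is what I mean by managing ACLs yourself. This is a rough sketch, not something we ship: it assumes jq is available and that CONSUL_HTTP_ADDR/CONSUL_HTTP_TOKEN are already set, it matches tokens purely by the description pattern you grepped above, and it does not distinguish orphaned tokens from tokens still in use, so you would need to cross-check against live nodes and pods before actually deleting anything.

# Sketch: list accessor IDs of login tokens created for consul clients.
# The delete is left commented out on purpose; verify each token first,
# because this selection also matches tokens belonging to healthy clients.
consul acl token list -format=json \
  | jq -r '.[] | select(.Description | startswith("token created via login: {\"component\":\"client\"}")) | .AccessorID' \
  | while read -r accessor; do
      echo "candidate token: ${accessor}"
      # consul acl token delete -id "${accessor}"
    done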
I'm interested in why your environment has so many nodes shutting down ungracefully. If this is an expected pattern for you, we might want to look at processes that would clean up the ACL tokens in another way as future work.
For now, the only help I can offer is to recommend against shutting down nodes ungracefully if you can avoid it.
Thanks for your response. This cluster is our test cluster; we use it to test various new features, configurations, etc. Often the whole cluster is unstable for longer periods because of that. We cannot guarantee that it will always work perfectly, as it is somewhat of a playground for new things. The problem that results in nodes shutting down ungracefully comes from another test and should be resolved soon.
Nevertheless, this can happen even in the most stable environments: a small problem with the VM or bare metal underneath can crash a whole k8s node. The scale will be much lower than on an unstable test cluster, but the same problem will still be present. Achieving a 100% graceful shutdown on GKE preemptible nodes or AWS spot instances can also be tricky and does not always work perfectly.
My point is: you cannot guarantee that every node will shut down gracefully, and because of that ACL tokens will accumulate over time.
Please also note that a similar problem is present for the ACLs of injected pods. Those should be revoked by the consul-connect-injector pods, but for some reason, when the consul container is killed along with them (or before them), the token is not revoked. From my observation, it only takes the local Consul agent being unavailable during the graceful shutdown of the injected pod. A rough way to reproduce that case is sketched below.
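Purely as an illustration of what I observed (pod and namespace names are placeholders, and the daemonset will recreate the client pod shortly afterwards, so timing matters):

# Make the local Consul agent unavailable on the node, then gracefully
# delete an injected pod scheduled there and check whether its login
# token is still listed afterwards.
kubectl -n consul delete pod <consul-client-pod-on-that-node>
kubectl -n <app-namespace> delete pod <injected-app-pod>
consul acl token list | grep '<injected-app-pod>'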