consul-esm

Support Rotating ACL Tokens

Open lornasong opened this issue 5 years ago • 0 comments

When a consul-esm instance's token is revoked, for example while rotating ACL tokens, consul-esm behaves in several unexpected ways:

  • The instance's status remains passing/healthy and is never marked critical. This can be seen at /v1/health/node/:node.
  • The instance's assigned external health checks are no longer executed successfully. Because the instance stays "passing"/"healthy", those checks are not reassigned to other, actually healthy instances with valid tokens.
  • The instance is not able to deregister successfully.
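
To check the first symptom without reading raw JSON, the health endpoint can be queried with a small program. The following is only a sketch: the agent address `http://127.0.0.1:8500` is an assumption, the helper names (`fetchNodeChecks`, `decodeChecks`, `nonPassing`) are made up for illustration, and the token is read from `CONSUL_HTTP_TOKEN` and must still be valid (not the revoked one).

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/url"
	"os"
)

// consulAddr is an assumed local agent address; adjust for your setup.
const consulAddr = "http://127.0.0.1:8500"

// healthCheck holds the subset of fields returned by
// /v1/health/node/:node that matter for this issue.
type healthCheck struct {
	Node    string
	CheckID string
	Name    string
	Status  string
}

// decodeChecks parses the JSON array returned by the health endpoint.
func decodeChecks(body []byte) ([]healthCheck, error) {
	var checks []healthCheck
	if err := json.Unmarshal(body, &checks); err != nil {
		return nil, err
	}
	return checks, nil
}

// nonPassing returns the checks whose status is anything but "passing".
func nonPassing(checks []healthCheck) []healthCheck {
	var out []healthCheck
	for _, c := range checks {
		if c.Status != "passing" {
			out = append(out, c)
		}
	}
	return out
}

// fetchNodeChecks queries the Consul HTTP API for all checks on a node.
func fetchNodeChecks(addr, node, token string) ([]healthCheck, error) {
	req, err := http.NewRequest(http.MethodGet, addr+"/v1/health/node/"+url.PathEscape(node), nil)
	if err != nil {
		return nil, err
	}
	if token != "" {
		req.Header.Set("X-Consul-Token", token)
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, err
	}
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status %s: %s", resp.Status, body)
	}
	return decodeChecks(body)
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: nodehealth <node>")
		return
	}
	checks, err := fetchNodeChecks(consulAddr, os.Args[1], os.Getenv("CONSUL_HTTP_TOKEN"))
	if err != nil {
		fmt.Fprintln(os.Stderr, "query failed:", err)
		os.Exit(1)
	}
	for _, c := range checks {
		fmt.Printf("%s\t%s\t%s\n", c.Node, c.CheckID, c.Status)
	}
	if len(nonPassing(checks)) == 0 {
		fmt.Println("all checks passing")
	}
}
```

If the behavior described here is present, running this against the node of the revoked-token instance would still report every check as passing.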

The revoked token is needed to update the health check and deregister. This is expected as a result of anti-entropy.

The larger issue around supporting rotating ACL tokens is already captured in https://github.com/hashicorp/consul/issues/4372. The recommendation there is to re-register the application (consul-esm in this case) with the new token.

Currently, consul-esm doesn't have a way to re-register itself. When consul-esm is stopped and restarted, the stopped instance fails to deregister while the newly started instance obtains a new ID. This leaves 'dead', floating consul-esm instances in the cluster. A serious consequence is that these dead consul-esm instances retain responsibility for their external health checks, since they remain marked as healthy/passing in the catalog.

This issue arises from the comment at https://github.com/hashicorp/consul-esm/issues/39#issuecomment-567750936

Steps to reproduce

  1. Start Consul (I used v1.6.2) with ACLs enabled
  2. Register two external health checks
  3. Start consul-esm (I used v0.3.3) with the token it needs to operate and log_level=DEBUG
  4. Start another consul-esm with a different token that it needs to operate and log_level=DEBUG
  5. Observe that each consul-esm is executing one of the external health checks
  6. Delete the token for one of the consul-esm instances
  7. Observe in the Consul logs that the revoked-token consul-esm has failed its TTL check
  8. Query /v1/health/node/<revoked-token-consul-esm-id> and see that the status is still passing
  9. Stop the revoked-token consul-esm instance (Control+C)
  10. Observe in the Consul logs that consul-esm was not able to deregister successfully
  11. Observe that the remaining healthy consul-esm instance is executing only one external health check - the one it was originally assigned - and did not inherit the other external health check

lornasong, Jan 08 '20 18:01