Consul tokens not cleaned up if clients restart
Nomad version
Nomad v1.7.6
BuildDate 2024-03-12T07:27:36Z
Revision 594fedbfbc4f0e532b65e8a69b28ff9403eb822e
Issue
When using workload identities with Consul, the Consul ACL tokens for services are derived in an alloc runner Prerun hook.
https://github.com/hashicorp/nomad/blob/23e4b7c9d23350f9d3bd2707b0d79f413767c438/client/allocrunner/consul_hook.go#L75-L103
However, SetConsulTokens() only stores them in memory:
https://github.com/hashicorp/nomad/blob/23e4b7c9d23350f9d3bd2707b0d79f413767c438/client/structs/allochook.go#L61-L71
Since the tokens are not persisted to any durable storage, a client restart causes a new token to be generated. The old token is left behind and never cleaned up.
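The leak can be modeled with a small, self-contained simulation. This is a sketch under stated assumptions, not Nomad's code: fakeConsul, hookResources, prerun, and stop are hypothetical stand-ins for Consul's token store, the alloc runner hook resources, the Prerun token derivation, and token cleanup when the allocation stops.

```go
package main

import "fmt"

// fakeConsul is a hypothetical stand-in for Consul's ACL token store.
type fakeConsul struct {
	next   int
	tokens map[string]bool // accessor ID -> token exists
}

// login simulates a JWT login, which always mints a new token.
func (c *fakeConsul) login() string {
	c.next++
	id := fmt.Sprintf("token-%d", c.next)
	c.tokens[id] = true
	return id
}

// logout simulates deleting a token when the allocation stops.
func (c *fakeConsul) logout(id string) {
	delete(c.tokens, id)
}

// hookResources is a hypothetical in-memory cache: like the behavior
// described above, nothing stored here survives a client restart.
type hookResources struct {
	consulTokens map[string]string // service name -> accessor ID
}

// prerun derives a token for the service only if none is cached.
func (h *hookResources) prerun(c *fakeConsul, service string) {
	if _, ok := h.consulTokens[service]; !ok {
		h.consulTokens[service] = c.login()
	}
}

// stop logs out every token the hook currently knows about.
func (h *hookResources) stop(c *fakeConsul) {
	for _, id := range h.consulTokens {
		c.logout(id)
	}
}

func main() {
	consul := &fakeConsul{tokens: map[string]bool{}}

	h := &hookResources{consulTokens: map[string]string{}}
	h.prerun(consul, "redis") // derives token-1

	// Simulated client restart: the in-memory cache starts empty again.
	h = &hookResources{consulTokens: map[string]string{}}
	h.prerun(consul, "redis") // derives token-2

	// Only token-2 is known, so only token-2 is deleted.
	h.stop(consul)

	fmt.Println(len(consul.tokens), consul.tokens["token-1"]) // 1 true
}
```

The leftover token-1 corresponds to the first redis token that remains in Consul in the reproduction below.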
Reproduction steps
- Start a Consul agent with ACLs enabled.

```hcl
# consul.hcl
acl = {
  enabled                  = true
  default_policy           = "deny"
  enable_token_persistence = true
}
```

```shell
consul agent -dev -config ./consul.hcl
```
- Bootstrap the Consul ACL system.

```shell
consul acl bootstrap
```
- Start a Nomad server with the following configuration.

```hcl
# server.hcl
name      = "server1"
data_dir  = "/tmp/nomad/server1"
log_level = "DEBUG"

ports {
  http = 4646
  rpc  = 4647
  serf = 4648
}

server {
  enabled          = 1
  bootstrap_expect = 1
}

consul {
  enabled = true

  service_identity {
    aud = ["consul.io"]
    ttl = "1h"
  }

  task_identity {
    aud = ["consul.io"]
    ttl = "1h"
  }
}
```

```shell
CONSUL_HTTP_TOKEN=... nomad agent -config ./server.hcl
```
- Start a Nomad client with the following configuration.

```hcl
# client.hcl
name      = "client-1"
data_dir  = "/tmp/nomad/client1"
log_level = "DEBUG"

ports {
  http = 5656
  rpc  = 5657
  serf = 5658
}

server {
  enabled = false
}

client {
  enabled = true

  server_join {
    retry_join = ["127.0.0.1"]
  }
}

consul {
  enabled = true

  service_identity {
    aud = ["consul.io"]
    ttl = "1h"
  }

  task_identity {
    aud = ["consul.io"]
    ttl = "1h"
  }
}
```

```shell
CONSUL_HTTP_TOKEN=... nomad agent -config ./client.hcl
```
- Configure the Consul JWT auth method for Nomad.

```shell
CONSUL_HTTP_TOKEN=... nomad setup consul -y
```
- Register a job with a Consul service.

```hcl
# example.nomad.hcl
job "example" {
  group "cache" {
    network {
      port "db" {
        to = 6379
      }
    }

    service {
      name = "redis"
      port = "db"
    }

    task "redis" {
      driver = "docker"

      config {
        image = "redis:7"
        ports = ["db"]
      }
    }
  }
}
```

```shell
nomad run example.nomad.hcl
```
- Verify an ACL token was created for the service.

```
$ CONSUL_HTTP_TOKEN=... consul acl token list
AccessorID: f7650ef5-4bc0-149e-2cef-c67adda4236c
SecretID: 35269d2b-9c1d-f735-21a5-0ee64fd56b9e
Description: Bootstrap Token (Global Management)
Local: false
Create Time: 2024-03-21 15:33:52.606667 -0400 EDT
Policies:
   00000000-0000-0000-0000-000000000001 - global-management

AccessorID: 00000000-0000-0000-0000-000000000002
SecretID: anonymous
Description: Anonymous Token
Local: false
Create Time: 2024-03-21 15:33:50.881215 -0400 EDT

AccessorID: 9b6faf9c-a3dc-7eb8-d4c1-a74d65d32ede
SecretID: ea0a4767-fdc0-43b8-b65a-d5ad05721e93
Description: token created via login: {"requested_by":"nomad_service_redis"}
Local: true
Auth Method: nomad-workloads (Namespace: )
Create Time: 2024-03-21 15:35:11.868371 -0400 EDT
Service Identities:
   redis (Datacenters: all)
```
- Stop the Nomad client and start it again.
- Verify a new Consul ACL token was created.

```
$ CONSUL_HTTP_TOKEN=... consul acl token list
AccessorID: f7650ef5-4bc0-149e-2cef-c67adda4236c
SecretID: 35269d2b-9c1d-f735-21a5-0ee64fd56b9e
Description: Bootstrap Token (Global Management)
Local: false
Create Time: 2024-03-21 15:33:52.606667 -0400 EDT
Policies:
   00000000-0000-0000-0000-000000000001 - global-management

AccessorID: 4d716a07-2d92-c808-82af-eba7e96b3068
SecretID: 424a9ab6-9df9-8782-9c56-8ce13dfe0867
Description: token created via login: {"requested_by":"nomad_service_redis"}
Local: true
Auth Method: nomad-workloads (Namespace: )
Create Time: 2024-03-21 15:37:50.426149 -0400 EDT
Service Identities:
   redis (Datacenters: all)

AccessorID: 00000000-0000-0000-0000-000000000002
SecretID: anonymous
Description: Anonymous Token
Local: false
Create Time: 2024-03-21 15:33:50.881215 -0400 EDT

AccessorID: 9b6faf9c-a3dc-7eb8-d4c1-a74d65d32ede
SecretID: ea0a4767-fdc0-43b8-b65a-d5ad05721e93
Description: token created via login: {"requested_by":"nomad_service_redis"}
Local: true
Auth Method: nomad-workloads (Namespace: )
Create Time: 2024-03-21 15:35:11.868371 -0400 EDT
Service Identities:
   redis (Datacenters: all)
```
- Stop the example job.

```shell
nomad job stop example
```
- Verify the first ACL token was left behind. The token created after the restart (4d716a07-2d92-c808-82af-eba7e96b3068) is deleted when the job stops, but the first token (9b6faf9c-a3dc-7eb8-d4c1-a74d65d32ede) remains.

```
$ CONSUL_HTTP_TOKEN=... consul acl token list
AccessorID: f7650ef5-4bc0-149e-2cef-c67adda4236c
SecretID: 35269d2b-9c1d-f735-21a5-0ee64fd56b9e
Description: Bootstrap Token (Global Management)
Local: false
Create Time: 2024-03-21 15:33:52.606667 -0400 EDT
Policies:
   00000000-0000-0000-0000-000000000001 - global-management

AccessorID: 00000000-0000-0000-0000-000000000002
SecretID: anonymous
Description: Anonymous Token
Local: false
Create Time: 2024-03-21 15:33:50.881215 -0400 EDT

AccessorID: 9b6faf9c-a3dc-7eb8-d4c1-a74d65d32ede
SecretID: ea0a4767-fdc0-43b8-b65a-d5ad05721e93
Description: token created via login: {"requested_by":"nomad_service_redis"}
Local: true
Auth Method: nomad-workloads (Namespace: )
Create Time: 2024-03-21 15:35:11.868371 -0400 EDT
Service Identities:
   redis (Datacenters: all)
```
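To spot leftover tokens after the job is stopped, one option is to scan the text output of consul acl token list for tokens whose description says they were created via login by a given requester (nomad_service_redis here). This is a minimal sketch that assumes the one-field-per-line "Key: value" layout shown above and that each token block begins with an AccessorID line; findLoginTokens is a hypothetical helper, not part of Nomad or Consul.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// findLoginTokens is a hypothetical helper that scans the text output of
// "consul acl token list" and returns the AccessorIDs of tokens created
// via login by the given requester. It assumes every token block starts
// with an "AccessorID:" line that precedes its "Description:" line.
func findLoginTokens(output, requestedBy string) []string {
	var ids []string
	var current string
	marker := `"requested_by":"` + requestedBy + `"`

	sc := bufio.NewScanner(strings.NewReader(output))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		switch {
		case strings.HasPrefix(line, "AccessorID:"):
			current = strings.TrimSpace(strings.TrimPrefix(line, "AccessorID:"))
		case strings.HasPrefix(line, "Description:") &&
			strings.Contains(line, "token created via login") &&
			strings.Contains(line, marker):
			ids = append(ids, current)
		}
	}
	return ids
}

func main() {
	// Trimmed copy of the output after "nomad job stop example".
	out := `AccessorID: 00000000-0000-0000-0000-000000000002
SecretID: anonymous
Description: Anonymous Token
AccessorID: 9b6faf9c-a3dc-7eb8-d4c1-a74d65d32ede
SecretID: ea0a4767-fdc0-43b8-b65a-d5ad05721e93
Description: token created via login: {"requested_by":"nomad_service_redis"}
`
	// With the job stopped, any ID printed here is a leftover token.
	fmt.Println(findLoginTokens(out, "nomad_service_redis"))
}
```

Parsing CLI text is brittle; for anything beyond a quick manual check, querying Consul's ACL API directly would be more robust.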
Expected Result
The first ACL token created is recovered when the client restarts.
Actual Result
A new ACL token is created, leaving the old one behind.