
Consul tokens not cleaned up if clients restart

Open lgfa29 opened this issue 1 year ago • 0 comments

Nomad version

Nomad v1.7.6
BuildDate 2024-03-12T07:27:36Z
Revision 594fedbfbc4f0e532b65e8a69b28ff9403eb822e

Issue

When using workload identities with Consul, the Consul ACL tokens for services are derived in an alloc runner Prerun hook. https://github.com/hashicorp/nomad/blob/23e4b7c9d23350f9d3bd2707b0d79f413767c438/client/allocrunner/consul_hook.go#L75-L103

But SetConsulTokens() only stores them in memory. https://github.com/hashicorp/nomad/blob/23e4b7c9d23350f9d3bd2707b0d79f413767c438/client/structs/allochook.go#L61-L71

Because these tokens are not persisted to any durable storage, a client restart causes a new token to be derived, and the old one is left behind and never cleaned up.
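The failure mode can be illustrated with a small, self-contained Go simulation. Everything in it is hypothetical: `fakeConsul`, `allocRunner`, `prerun` and `postrun` are simplified stand-ins, not Nomad's or Consul's real types or APIs, and a token is reduced to an accessor ID string. Because the runner only keeps tokens in a map, a restart loses the first token, the second `prerun` logs in again, and `postrun` can only revoke the token it still knows about.

```go
package main

import "fmt"

// fakeConsul is a hypothetical stand-in for Consul's ACL token store.
// It only tracks which token accessor IDs currently exist.
type fakeConsul struct {
	next   int
	tokens map[string]bool
}

func newFakeConsul() *fakeConsul {
	return &fakeConsul{tokens: map[string]bool{}}
}

// login mints a new token, as an auth method login would.
func (c *fakeConsul) login() string {
	c.next++
	id := fmt.Sprintf("token-%d", c.next)
	c.tokens[id] = true
	return id
}

// logout deletes a token.
func (c *fakeConsul) logout(id string) {
	delete(c.tokens, id)
}

// allocRunner keeps service tokens only in memory (hypothetical
// simplification of the behavior described above).
type allocRunner struct {
	consul *fakeConsul
	tokens map[string]string // service name -> token accessor ID
}

// prerun derives one token per service and stores it in memory only.
func (ar *allocRunner) prerun(services []string) {
	ar.tokens = map[string]string{}
	for _, svc := range services {
		ar.tokens[svc] = ar.consul.login()
	}
}

// postrun revokes only the tokens this runner still knows about.
func (ar *allocRunner) postrun() {
	for _, id := range ar.tokens {
		ar.consul.logout(id)
	}
}

func main() {
	consul := newFakeConsul()
	services := []string{"redis"}

	ar := &allocRunner{consul: consul}
	ar.prerun(services) // job starts: token-1 is derived

	// Client restart: the in-memory map is gone, so prerun logs in again.
	ar = &allocRunner{consul: consul}
	ar.prerun(services) // token-2 is derived

	ar.postrun() // job stops: only token-2 is revoked

	fmt.Println("leaked tokens:", len(consul.tokens))
}
```

Running it prints `leaked tokens: 1`, mirroring the leftover token in the reproduction below.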

Reproduction steps

  1. Start a Consul agent with ACLs enabled.

    # consul.hcl
    
    acl = {
      enabled                  = true
      default_policy           = "deny"
      enable_token_persistence = true
    }
    
    consul agent -dev -config ./consul.hcl
    
  2. Bootstrap the Consul ACL system.

    consul acl bootstrap
    
  3. Start a Nomad server with the following configuration.

    # server.hcl
    
    name      = "server1"
    data_dir  = "/tmp/nomad/server1"
    log_level = "DEBUG"
    
    ports {
      http = 4646
      rpc  = 4647
      serf = 4648
    }
    
    server {
      enabled          = 1
      bootstrap_expect = 1
    }
    
    consul {
      enabled = true
    
      service_identity {
        aud = ["consul.io"]
        ttl = "1h"
      }
    
      task_identity {
        aud = ["consul.io"]
        ttl = "1h"
      }
    }
    
    CONSUL_HTTP_TOKEN=... nomad agent -config ./server.hcl
    
  4. Start a Nomad client with the following configuration.

    # client.hcl
    
    name     = "client-1"
    data_dir = "/tmp/nomad/client1"
    log_level = "DEBUG"
    
    ports {
      http = 5656
      rpc  = 5657
      serf = 5658
    }
    
    server {
      enabled = false
    }
    
    client {
      enabled = true
    
      server_join {
        retry_join = ["127.0.0.1"]
      }
    }
    
    consul {
      enabled = true
    
      service_identity {
        aud = ["consul.io"]
        ttl = "1h"
      }
    
      task_identity {
        aud = ["consul.io"]
        ttl = "1h"
      }
    }
    
    CONSUL_HTTP_TOKEN=... nomad agent -config ./client.hcl
    
  5. Configure the Consul JWT auth method for Nomad.

    CONSUL_HTTP_TOKEN=... nomad setup consul -y
    
  6. Register a job with a Consul service.

    # example.nomad.hcl
    
    job "example" {
      group "cache" {
        network {
          port "db" {
            to = 6379
          }
        }
    
        service {
          name = "redis"
          port = "db"
        }
    
        task "redis" {
          driver = "docker"
    
          config {
            image = "redis:7"
            ports = ["db"]
          }
        }
      }
    }
    
    nomad run example.nomad.hcl
    
  7. Verify an ACL token for the service was created.

    $ CONSUL_HTTP_TOKEN=... consul acl token list
    AccessorID:       f7650ef5-4bc0-149e-2cef-c67adda4236c
    SecretID:         35269d2b-9c1d-f735-21a5-0ee64fd56b9e
    Description:      Bootstrap Token (Global Management)
    Local:            false
    Create Time:      2024-03-21 15:33:52.606667 -0400 EDT
    Policies:
        00000000-0000-0000-0000-000000000001 - global-management
    
    AccessorID:       00000000-0000-0000-0000-000000000002
    SecretID:         anonymous
    Description:      Anonymous Token
    Local:            false
    Create Time:      2024-03-21 15:33:50.881215 -0400 EDT
    
    AccessorID:       9b6faf9c-a3dc-7eb8-d4c1-a74d65d32ede
    SecretID:         ea0a4767-fdc0-43b8-b65a-d5ad05721e93
    Description:      token created via login: {"requested_by":"nomad_service_redis"}
    Local:            true
    Auth Method:      nomad-workloads (Namespace: )
    Create Time:      2024-03-21 15:35:11.868371 -0400 EDT
    Service Identities:
        redis (Datacenters: all)
    
  8. Stop the Nomad client and start it again.

  9. Verify a new Consul ACL token was created.

    $ CONSUL_HTTP_TOKEN=... consul acl token list
    AccessorID:       f7650ef5-4bc0-149e-2cef-c67adda4236c
    SecretID:         35269d2b-9c1d-f735-21a5-0ee64fd56b9e
    Description:      Bootstrap Token (Global Management)
    Local:            false
    Create Time:      2024-03-21 15:33:52.606667 -0400 EDT
    Policies:
      00000000-0000-0000-0000-000000000001 - global-management
    
    AccessorID:       4d716a07-2d92-c808-82af-eba7e96b3068
    SecretID:         424a9ab6-9df9-8782-9c56-8ce13dfe0867
    Description:      token created via login: {"requested_by":"nomad_service_redis"}
    Local:            true
    Auth Method:      nomad-workloads (Namespace: )
    Create Time:      2024-03-21 15:37:50.426149 -0400 EDT
    Service Identities:
      redis (Datacenters: all)
    
    AccessorID:       00000000-0000-0000-0000-000000000002
    SecretID:         anonymous
    Description:      Anonymous Token
    Local:            false
    Create Time:      2024-03-21 15:33:50.881215 -0400 EDT
    
    AccessorID:       9b6faf9c-a3dc-7eb8-d4c1-a74d65d32ede
    SecretID:         ea0a4767-fdc0-43b8-b65a-d5ad05721e93
    Description:      token created via login: {"requested_by":"nomad_service_redis"}
    Local:            true
    Auth Method:      nomad-workloads (Namespace: )
    Create Time:      2024-03-21 15:35:11.868371 -0400 EDT
    Service Identities:
      redis (Datacenters: all)
    
  10. Stop the example job.

    nomad job stop example
    
  11. Verify the first ACL token is left behind.

    $ CONSUL_HTTP_TOKEN=... consul acl token list
    AccessorID:       f7650ef5-4bc0-149e-2cef-c67adda4236c
    SecretID:         35269d2b-9c1d-f735-21a5-0ee64fd56b9e
    Description:      Bootstrap Token (Global Management)
    Local:            false
    Create Time:      2024-03-21 15:33:52.606667 -0400 EDT
    Policies:
      00000000-0000-0000-0000-000000000001 - global-management
    
    AccessorID:       00000000-0000-0000-0000-000000000002
    SecretID:         anonymous
    Description:      Anonymous Token
    Local:            false
    Create Time:      2024-03-21 15:33:50.881215 -0400 EDT
    
    AccessorID:       9b6faf9c-a3dc-7eb8-d4c1-a74d65d32ede
    SecretID:         ea0a4767-fdc0-43b8-b65a-d5ad05721e93
    Description:      token created via login: {"requested_by":"nomad_service_redis"}
    Local:            true
    Auth Method:      nomad-workloads (Namespace: )
    Create Time:      2024-03-21 15:35:11.868371 -0400 EDT
    Service Identities:
      redis (Datacenters: all)
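Spotting the leftover token by eye works for a single job; with more allocations it may help to filter the `consul acl token list` output programmatically. The sketch below assumes the human-readable `Key:   value` layout shown above (not necessarily a stable format) and treats any token still present after the job stops, whose auth method is `nomad-workloads` and whose description names the given `requested_by` value, as a leak candidate. `parseTokenList` and `loginTokensFor` are hypothetical helpers.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// aclToken holds the fields of `consul acl token list` output needed to
// spot leftover workload tokens.
type aclToken struct {
	AccessorID  string
	Description string
	AuthMethod  string
}

// parseTokenList splits the human-readable output into tokens. It assumes each
// token starts with an "AccessorID:" line and other fields use "Key: value".
func parseTokenList(out string) []aclToken {
	var tokens []aclToken
	sc := bufio.NewScanner(strings.NewReader(out))
	for sc.Scan() {
		// Cut at the first colon only, so values containing ':' stay intact.
		key, val, ok := strings.Cut(strings.TrimSpace(sc.Text()), ":")
		if !ok {
			continue
		}
		val = strings.TrimSpace(val)
		if key == "AccessorID" {
			tokens = append(tokens, aclToken{AccessorID: val})
			continue
		}
		if len(tokens) == 0 {
			continue
		}
		switch key {
		case "Description":
			tokens[len(tokens)-1].Description = val
		case "Auth Method":
			tokens[len(tokens)-1].AuthMethod = val
		}
	}
	return tokens
}

// loginTokensFor returns the accessor IDs of tokens created through authMethod
// on behalf of requestedBy (e.g. "nomad_service_redis").
func loginTokensFor(tokens []aclToken, authMethod, requestedBy string) []string {
	var ids []string
	marker := `"requested_by":"` + requestedBy + `"`
	for _, t := range tokens {
		// "nomad-workloads (Namespace: )" -> "nomad-workloads"
		method, _, _ := strings.Cut(t.AuthMethod, " ")
		if method == authMethod && strings.Contains(t.Description, marker) {
			ids = append(ids, t.AccessorID)
		}
	}
	return ids
}

func main() {
	// Trimmed output from step 11, after the job was stopped.
	out := `AccessorID:       00000000-0000-0000-0000-000000000002
Description:      Anonymous Token

AccessorID:       9b6faf9c-a3dc-7eb8-d4c1-a74d65d32ede
Description:      token created via login: {"requested_by":"nomad_service_redis"}
Auth Method:      nomad-workloads (Namespace: )
`
	leaked := loginTokensFor(parseTokenList(out), "nomad-workloads", "nomad_service_redis")
	fmt.Println(leaked)
}
```

Given the trimmed step 11 output, it reports only the first token's accessor ID, `9b6faf9c-a3dc-7eb8-d4c1-a74d65d32ede`.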
    

Expected Result

The first ACL token created is recovered when the client restarts.

Actual Result

A new ACL token is created, leaving the old one behind.

lgfa29 · Mar 21 '24 19:03