consul-template

Nomad provisions dynamic database (PostgreSQL) credentials, immediately renews them and provides the old credentials to the container

Open teodorkostov opened this issue 6 months ago • 3 comments

Nomad version

Nomad v1.10.2 Vault v1.19.5

Operating system and Environment details

Arch Linux

Issue

The dynamically provisioned database credentials from Vault for PostgreSQL are not correct (the password differs). However, provisioning dynamic credentials with the Vault CLI results in working credentials that provide the correct access to PostgreSQL.

Reproduction steps

PostgreSQL container ghcr.io/immich-app/postgres:17-vectorchord0.4.3-pgvector0.8.0-pgvectors0.3.0.

Vault resources

resource "vault_mount" "postgresql" {
  path = "postgresql"
  type = "database"
  description = "PostgreSQL database credentials backend."
}

resource "vault_database_secret_backend_connection" "postgresql" {
  backend       = vault_mount.postgresql.path
  name          = "connection-postgresql"
  plugin_name   = "postgresql-database-plugin"
  allowed_roles = ["role-postgresql"]

  postgresql {
    connection_url = "postgres://{{username}}:{{password}}@postgresql.service.consul:5432/postgres"
    username = var.username
    password = var.password
    password_authentication = "scram-sha-256"
    username_template = "v_immich_{{unix_time}}"
  }
}

resource "vault_database_secret_backend_role" "postgresql" {
  backend             = vault_mount.postgresql.path
  name                = "role-postgresql"
  db_name             = vault_database_secret_backend_connection.postgresql.name

  creation_statements = [
    "CREATE ROLE {{username}} WITH LOGIN PASSWORD '{{password}}' VALID UNTIL '{{expiration}}';",
    "GRANT SELECT ON ALL TABLES IN SCHEMA public TO {{username}};",
    "GRANT ALL PRIVILEGES ON DATABASE immich TO {{username}};",
  ]
  rollback_statements = [
    "DROP ROLE IF EXISTS {{username}};",
  ]
  renew_statements = [
    "ALTER ROLE {{username}} WITH PASSWORD '{{password}}';",
  ]
  revocation_statements = [
    "REVOKE ALL PRIVILEGES ON DATABASE immich FROM {{username}};",
    "DROP ROLE IF EXISTS {{username}};",
  ]
}

resource "vault_policy" "postgresql" {
  name   = "policy-postgresql"

  policy = <<-EOT
    path "${vault_mount.postgresql.path}/creds/${vault_database_secret_backend_role.postgresql.name}" {
      capabilities = ["read"]
    }
  EOT
}

When we run the photos workload (the job file is provided below), we can see that dynamic credentials are provisioned. However, when we look into the lease, we can see that it was renewed immediately.

$ curl --header "X-Vault-Token: $VAULT_TOKEN" --request LIST https://vault:8200/v1/sys/leases/lookup/postgresql/creds/role-postgresql/ | jq .
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   217  100   217    0     0   7052      0 --:--:-- --:--:-- --:--:--  8037
{
  "request_id": "719abd84-eeb2-6390-873e-61da8df70a9f",
  "lease_id": "",
  "renewable": false,
  "lease_duration": 0,
  "data": {
    "keys": [
      "u8YKcQi6KalPEA07aclHPNyK"
    ]
  },
  "wrap_info": null,
  "warnings": null,
  "auth": null,
  "mount_type": "system"
}
$ vault lease lookup postgresql/creds/role-postgresql/u8YKcQi6KalPEA07aclHPNyK
Key             Value
---             -----
expire_time     2025-07-26T10:09:28.686861628Z
id              postgresql/creds/role-postgresql/u8YKcQi6KalPEA07aclHPNyK
issue_time      2025-06-24T10:09:28.667139714Z
last_renewal    2025-06-24T10:09:28.686861918Z
renewable       true
ttl             767h59m32s
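For anyone who wants to script the same check, here is a minimal Python sketch of the two lookups (list the leases under the role, then look up one lease by ID) using only the standard library. The helper names are hypothetical, the address and token are placeholders, and it assumes the token is allowed to call the sys/leases/lookup endpoints. The example only builds and prints the requests; send() is defined but not called.

```python
import json
import urllib.request


def list_leases_request(vault_addr: str, token: str, prefix: str) -> urllib.request.Request:
    """Build the LIST request that enumerates lease IDs under a prefix."""
    url = f"{vault_addr.rstrip('/')}/v1/sys/leases/lookup/{prefix.strip('/')}/"
    return urllib.request.Request(url, headers={"X-Vault-Token": token}, method="LIST")


def lookup_lease_request(vault_addr: str, token: str, lease_id: str) -> urllib.request.Request:
    """Build the request that returns metadata (issue_time, last_renewal, ...) for one lease."""
    body = json.dumps({"lease_id": lease_id}).encode("utf-8")
    return urllib.request.Request(
        f"{vault_addr.rstrip('/')}/v1/sys/leases/lookup",
        data=body,
        headers={"X-Vault-Token": token, "Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send a prepared request and decode the JSON response (needs a reachable Vault)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Placeholder address and token; only the request objects are built here.
ROLE_PREFIX = "postgresql/creds/role-postgresql"
list_req = list_leases_request("https://vault:8200", "placeholder-token", ROLE_PREFIX)
lookup_req = lookup_lease_request(
    "https://vault:8200", "placeholder-token", f"{ROLE_PREFIX}/u8YKcQi6KalPEA07aclHPNyK"
)
print(list_req.get_method(), list_req.full_url)
print(lookup_req.get_method(), lookup_req.full_url)
```

Against a real Vault, send(list_req)["data"]["keys"] would give the lease IDs, and send(lookup_req)["data"] the fields shown by vault lease lookup.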

If we manually provision dynamic database credentials from Vault with vault read postgresql/creds/role-postgresql and look up the lease, we can see that the credentials have not been renewed.

Key             Value
---             -----
expire_time     2025-07-26T10:53:04.216999787Z
id              postgresql/creds/role-postgresql/I6c3pTgTZY6BwUxSaBXmGfIW
issue_time      2025-06-24T10:53:04.216999507Z
last_renewal    <nil>
renewable       true
ttl             767h59m46s
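To put a number on "renewed immediately", the sketch below computes the gap between issue_time and last_renewal for both leases. The function names are hypothetical. It assumes UTC timestamps with a trailing Z, and it truncates Vault's nanosecond fractions to microseconds because Python's datetime does not go finer than that.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def parse_vault_time(value: str) -> datetime:
    """Parse a 'Z'-suffixed timestamp, truncating nanoseconds to microseconds."""
    body = value.rstrip("Z")
    if "." in body:
        seconds, fraction = body.split(".", 1)
        fraction = fraction[:6].ljust(6, "0")
    else:
        seconds, fraction = body, "000000"
    parsed = datetime.strptime(f"{seconds}.{fraction}", "%Y-%m-%dT%H:%M:%S.%f")
    return parsed.replace(tzinfo=timezone.utc)


def renewal_gap(issue_time: str, last_renewal: Optional[str]) -> Optional[timedelta]:
    """Return the delay between issue and last renewal, or None if never renewed."""
    if last_renewal is None or last_renewal == "<nil>":
        return None
    return parse_vault_time(last_renewal) - parse_vault_time(issue_time)


# Values copied from the two lease lookups in this report.
nomad_gap = renewal_gap("2025-06-24T10:09:28.667139714Z", "2025-06-24T10:09:28.686861918Z")
cli_gap = renewal_gap("2025-06-24T10:53:04.216999507Z", "<nil>")

print(nomad_gap)  # 0:00:00.019722
print(cli_gap)    # None
```

The lease created through Nomad was renewed about 20 ms after it was issued, while the lease created with the Vault CLI was never renewed.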

Expected Result

PostgreSQL authentication success.

Actual Result

PostgreSQL authentication error.

Job file (if appropriate)

job "photos" {
  datacenters = ["*"]
  type        = "service"

  group "photos" {
    count = 1

    network {
      mode = "host"

      port "web" {
        to = 2283
      }
    }

    restart {
      attempts = 10
      interval = "3h"
      delay = "15m"
      mode = "delay"
    }

    task "photos" {
      driver = "docker"

      config {
        image = "ghcr.io/immich-app/immich-server:v1.135.3"
        entrypoint = ["tail", "-f", "/dev/null"]

        network_mode = "nginx"
        ports        = []

        mount {
          type     = "volume"
          source   = "photos"
          target   = "/usr/src/app/upload"
        }

        mount {
          type     = "bind"
          source   = "/etc/localtime"
          target   = "/etc/localtime"
          readonly = true
        }
      }

      service {
        name = "photos"
        port = "web"
      }

      template {
        data        = <<-EOT
          {{ with secret "postgresql/creds/role-postgresql" }}
          DB_USERNAME="{{ .Data.username }}"
          DB_PASSWORD="{{ .Data.password }}"
          {{ end }}
          DB_HOSTNAME = "postgresql.service.consul"
          REDIS_HOSTNAME = "redis.service.consul"
          REDIS_USERNAME="immich"
          REDIS_PASSWORD="..."
        EOT

        destination = "secrets/env"
        env = true
      }

      vault {}
    }
  }
}

Nomad Server logs (if appropriate)

Nomad Client logs (if appropriate)

Reverse log

[INFO] client.alloc_runner.task_runner: Task event: alloc_id=e65b2dd2-3fe7-8415-c967-aa50a5c5b074 task=photos type=Started msg="Task started by cli>
[INFO] client.driver_mgr.docker: started container: driver=docker container_id=c49c19a100410307ec8218dac07675e0fac444f0bf9aa0ab6a4142375b37f959
[INFO] client.driver_mgr.docker: created container: driver=docker container_id=c49c19a100410307ec8218dac07675e0fac444f0bf9aa0ab6a4142375b37f959
[INFO] agent: (runner) rendered "(dynamic)" => "/storage/data/nomad/alloc/e65b2dd2-3fe7-8415-c967-aa50a5c5b074/photos/secrets/env"
[INFO] agent: (runner) starting
[INFO] agent: (runner) creating watcher
[INFO] agent: (runner) creating new runner (dry: false, once: false)
[INFO] client.alloc_runner.task_runner: Task event: alloc_id=e65b2dd2-3fe7-8415-c967-aa50a5c5b074 task=photos type="Task Setup" msg="Building Task >
[INFO] client.alloc_runner.task_runner: Task event: alloc_id=e65b2dd2-3fe7-8415-c967-aa50a5c5b074 task=photos type=Received msg="Task received by c>
[INFO] client.gc: marking allocation for GC: alloc_id=dd6785d4-96ea-5091-9503-82a28987b475
[INFO] agent: (runner) received finish
[INFO] agent: (runner) stopping

teodorkostov, Jun 24 '25 10:06

Possibly related to hashicorp/nomad#15057.

teodorkostov, Jun 24 '25 10:06

Hey @teodorkostov, thanks for reporting this issue. It appears that this is a problem with consul-template; Nomad doesn't fetch the dynamic secrets itself, it just spins up a consul-template (CT) instance to do it.

I'll transfer the issue over to the consul-template repository.

pkazmierczak, Jun 27 '25 07:06

Hey @pkazmierczak, thank you for the feedback. I hope this gets solved. It was very time-consuming to investigate.

teodorkostov, Jun 27 '25 17:06