
failed to submit plan for evaluation: ... no such key \"<snip>\" in keyring error after moving cluster to 1.4.1

Open bfqrst opened this issue 3 years ago • 20 comments

Nomad version

Nomad v1.4.1 (2aa7e66bdb526e25f59883952d74dad7ea9a014e)

Operating system and Environment details

Ubuntu 22.04, Nomad 1.4.1

Issue

After moving the Nomad servers and clients to v1.4.1, I noticed that sometimes (unfortunately not consistently), after cycling the Nomad server ASGs and Nomad client ASGs with new AMIs, jobs scheduled on the workers can't be allocated. To be precise:

  1. Pipeline creates new Nomad AMIs via Packer
  2. Pipeline terraforms Nomad server ASG with server config
  3. Pipeline terraforms client ASG or dedicated instances with updated AMI
  4. Lost jobs on the workers (for instance the Traefik ingress job) can't be allocated (a rough sketch of this flow follows below)
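
Roughly, in shell terms (the Packer template and Terraform module names here are placeholders, not the actual setup):

# 1. bake a new Nomad AMI (placeholder Packer template name)
packer build nomad.pkr.hcl
# 2. roll the Nomad server ASG onto the new AMI (placeholder Terraform module name)
terraform apply -target=module.nomad_server_asg
# 3. roll the client ASG / dedicated instances onto the new AMI
terraform apply -target=module.nomad_client_asg
# 4. afterwards, jobs lost during the roll stay pending
nomad eval list
nomad status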

This literally never happened before 1.4.x.

Client output looks like this:

nomad eval list

ID        Priority  Triggered By      Job ID                Namespace  Node ID  Status   Placement Failures
427e9905  50        failed-follow-up  plugin-aws-ebs-nodes  default             pending  false
35f4fdfb  50        failed-follow-up  plugin-aws-efs-nodes  default             pending  false
46152dcd  50        failed-follow-up  spot-drainer          default             pending  false
71e3e58a  50        failed-follow-up  plugin-aws-ebs-nodes  default             pending  false
e86177a6  50        failed-follow-up  plugin-aws-efs-nodes  default             pending  false
2289ba5f  50        failed-follow-up  spot-drainer          default             pending  false
da3fdad6  50        failed-follow-up  plugin-aws-ebs-nodes  default             pending  false
b445b976  50        failed-follow-up  plugin-aws-efs-nodes  default             pending  false
48a6771e  50        failed-follow-up  ingress               default             pending  false

Reproduction steps

Unclear at this point. I seem to be able to somewhat force the issue when I cycle the Nomad server ASG with updated AMIs.

Expected Result

Client work that was lost should be rescheduled once the Nomad client comes up and reports readiness.

Actual Result

Lost jobs can't be allocated on the worker with an updated AMI.

nomad status

ID                         Type     Priority  Status   Submit Date
auth-service               service  50        pending  2022-10-09T11:32:57+02:00
ingress                    service  50        pending  2022-10-17T14:57:26+02:00
plugin-aws-ebs-controller  service  50        running  2022-10-09T14:48:11+02:00
plugin-aws-ebs-nodes       system   50        running  2022-10-09T14:48:11+02:00
plugin-aws-efs-nodes       system   50        running  2022-10-09T11:37:04+02:00
prometheus                 service  50        pending  2022-10-18T21:19:24+02:00
spot-drainer               system   50        running  2022-10-11T18:04:49+02:00

Job file (if appropriate)

variable "stage" {
  type        = string
  description = "The stage for this jobfile."
}

variable "domain_suffix" {
  type        = string
  description = "The HDI stage specific domain suffix."
}

variable "acme_route" {
  type = string
}

variables {
  step_cli_version = "0.22.0"
  traefik_version  = "2.9.1"
}

job "ingress" {

  datacenters = [join("-", ["pd0011", var.stage])]

  type = "service"

  group "ingress" {

    constraint {
      attribute = "${meta.instance_type}"
      value     = "ingress"
    }

    count = 1

    service {
      name = "traefik"
      tags = [
        "traefik.enable=true",

        "traefik.http.routers.api.rule=Host(`ingress.dsp.${var.domain_suffix}`)",
        "traefik.http.routers.api.entrypoints=secure",
        "traefik.http.routers.api.service=api@internal",
        "traefik.http.routers.api.tls.certresolver=hdi_acme_resolver",
        "traefik.http.routers.api.tls.options=tls13@file",
        "traefik.http.routers.api.middlewares=dspDefaultPlusAdmin@file",

        "traefik.http.routers.ping.rule=Host(`ingress.dsp.${var.domain_suffix}`) && Path(`/ping`)",
        "traefik.http.routers.ping.entrypoints=secure",
        "traefik.http.routers.ping.service=ping@internal",
        "traefik.http.routers.ping.tls.certresolver=hdi_acme_resolver",
        "traefik.http.routers.ping.tls.options=tls13@file",
        "traefik.http.routers.ping.middlewares=dspDefault@file"
      ]

      port = "https"

      check {
        name     = "Traefik Ping Endpoint"
        type     = "http"
        protocol = "http"
        port     = "http"
        path     = "/ping"
        interval = "10s"
        timeout  = "2s"
      }
    }

    network {

      port "http" {
        static = 80
        to     = 80
      }
      port "https" {
        static = 443
        to     = 443
      }
    }

    ephemeral_disk {
      size    = "300"
      sticky  = true
      migrate = true
    }

    task "generate_consul_cert" {
<snip>
    }

    task "generate_nomad_cert" {
<snip>
    }


    task "traefik" {

      driver = "docker"

      env {
        LEGO_CA_CERTIFICATES = join(":", ["${NOMAD_SECRETS_DIR}/cacert.pem", "${NOMAD_SECRETS_DIR}/root_ca_${var.stage}.crt"])
        # LEGO_CA_SYSTEM_CERT_POOL = true
      }

      config {
        image = "traefik:${var.traefik_version}"
        volumes = [
          # Use absolute paths to mount arbitrary paths on the host
          "local/:/etc/traefik/",
          "/etc/timezone:/etc/timezone:ro",
          "/etc/localtime:/etc/localtime:ro",
        ]
        network_mode = "host"
        ports        = ["http", "https"]
      }

      resources {
        cpu    = 800
        memory = 128
      }
      # Controls the timeout between signalling a task it will be killed
      # and killing the task. If not set a default is used.
      kill_timeout = "60s"

      template {
        data        = <<EOH
<snip>
EOH
        # (destination and the rest of the template stanza snipped)
      }
    }
  }
}

Nomad Server logs (if appropriate)

Oct 20 15:00:30 uat-nomad-95I nomad[485]:     2022-10-20T15:00:30.571+0200 [ERROR] worker: error invoking scheduler: worker_id=c4d91fc3-5e23-dbec-a85d-8fc830f375ab error="failed to process evaluation: rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 15:00:42 uat-nomad-95I nomad[485]:     2022-10-20T15:00:42.948+0200 [ERROR] worker: failed to submit plan for evaluation: worker_id=c4d91fc3-5e23-dbec-a85d-8fc830f375ab eval_id=827f0dfe-0584-b44a-92e2-9a92ab649c48 error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"

Nomad Client logs (if appropriate)

Oct 20 11:55:00 uat-worker-wZz nomad[464]:              Log Level: INFO
Oct 20 11:55:00 uat-worker-wZz nomad[464]:                 Region: europe (DC: pd0011-uat)
Oct 20 11:55:00 uat-worker-wZz nomad[464]:                 Server: false
Oct 20 11:55:00 uat-worker-wZz nomad[464]:                Version: 1.4.1
Oct 20 11:55:00 uat-worker-wZz nomad[464]: ==> Nomad agent started! Log data will stream in below:
Oct 20 11:55:00 uat-worker-wZz nomad[464]:     2022-10-20T11:54:58.798+0200 [INFO]  agent: detected plugin: name=exec type=driver plugin_version=0.1.0
Oct 20 11:55:00 uat-worker-wZz nomad[464]:     2022-10-20T11:54:58.798+0200 [INFO]  agent: detected plugin: name=qemu type=driver plugin_version=0.1.0
Oct 20 11:55:00 uat-worker-wZz nomad[464]:     2022-10-20T11:54:58.798+0200 [INFO]  agent: detected plugin: name=java type=driver plugin_version=0.1.0
Oct 20 11:55:00 uat-worker-wZz nomad[464]:     2022-10-20T11:54:58.798+0200 [INFO]  agent: detected plugin: name=docker type=driver plugin_version=0.1.0
Oct 20 11:55:00 uat-worker-wZz nomad[464]:     2022-10-20T11:54:58.798+0200 [INFO]  agent: detected plugin: name=raw_exec type=driver plugin_version=0.1.0
Oct 20 11:55:00 uat-worker-wZz nomad[464]:     2022-10-20T11:54:58.817+0200 [INFO]  client: using state directory: state_dir=/opt/hsy/nomad/data/client
Oct 20 11:55:00 uat-worker-wZz nomad[464]:     2022-10-20T11:54:58.826+0200 [INFO]  client: using alloc directory: alloc_dir=/opt/hsy/nomad/data/alloc
Oct 20 11:55:00 uat-worker-wZz nomad[464]:     2022-10-20T11:54:58.826+0200 [INFO]  client: using dynamic ports: min=20000 max=32000 reserved=""
Oct 20 11:55:00 uat-worker-wZz nomad[464]:     2022-10-20T11:54:58.831+0200 [INFO]  client.fingerprint_mgr.cgroup: cgroups are available
Oct 20 11:55:00 uat-worker-wZz nomad[464]:     2022-10-20T11:54:58.852+0200 [WARN]  client.fingerprint_mgr.network: unable to parse speed: path=/usr/sbin/ethtool device=ens5
Oct 20 11:55:00 uat-worker-wZz nomad[464]:     2022-10-20T11:54:58.856+0200 [WARN]  client.fingerprint_mgr.network: unable to parse speed: path=/usr/sbin/ethtool device=lo
Oct 20 11:55:00 uat-worker-wZz nomad[464]:     2022-10-20T11:54:58.870+0200 [WARN]  client.fingerprint_mgr.network: unable to parse speed: path=/usr/sbin/ethtool device=ens5
Oct 20 11:55:00 uat-worker-wZz nomad[464]:     2022-10-20T11:54:58.897+0200 [INFO]  client.plugin: starting plugin manager: plugin-type=csi
Oct 20 11:55:00 uat-worker-wZz nomad[464]:     2022-10-20T11:54:58.900+0200 [INFO]  client.plugin: starting plugin manager: plugin-type=driver
Oct 20 11:55:00 uat-worker-wZz nomad[464]:     2022-10-20T11:54:58.900+0200 [INFO]  client.plugin: starting plugin manager: plugin-type=device
Oct 20 11:55:00 uat-worker-wZz nomad[464]:     2022-10-20T11:54:58.906+0200 [ERROR] client: error discovering nomad servers: error="client.consul: unable to query Consul datacenters: Get \"https://127.0.0.1:8501/v1/catalog/datacenters\": dial tcp 127.0.0.1:8501: connect: connection refused"
Oct 20 11:55:00 uat-worker-wZz nomad[464]:     2022-10-20T11:55:00.437+0200 [INFO]  client: started client: node_id=5f21ebef-e0a9-8bd2-775a-61b3e32cac6e
Oct 20 11:55:00 uat-worker-wZz nomad[464]:     2022-10-20T11:55:00.437+0200 [WARN]  agent: not registering Nomad HTTPS Health Check because verify_https_client enabled
Oct 20 11:55:00 uat-worker-wZz nomad[464]:     2022-10-20T11:55:00.438+0200 [WARN]  client.server_mgr: no servers available
Oct 20 11:55:00 uat-worker-wZz nomad[464]:     2022-10-20T11:55:00.439+0200 [WARN]  client.server_mgr: no servers available
Oct 20 11:55:00 uat-worker-wZz nomad[464]:     2022-10-20T11:55:00.453+0200 [INFO]  client.consul: discovered following servers: servers=[10.194.73.146:4647, 10.194.74.253:4647, 10.194.75.103:4647]
Oct 20 11:55:00 uat-worker-wZz nomad[464]:     2022-10-20T11:55:00.501+0200 [INFO]  client: node registration complete
Oct 20 11:55:06 uat-worker-wZz nomad[464]:     2022-10-20T11:55:06.856+0200 [INFO]  client: node registration complete
Oct 20 11:55:14 uat-worker-wZz nomad[464]:     2022-10-20T11:55:14.893+0200 [INFO]  client.fingerprint_mgr.consul: consul agent is available
Oct 20 11:55:21 uat-worker-wZz nomad[464]:     2022-10-20T11:55:21.417+0200 [INFO]  client: node registration complete

bfqrst · Oct 20 '22 13:10

Hi @bfqrst! Thanks for opening this issue. The keyring being referred to here is new in 1.4.x and supports the new Workload Identity feature. When a new server joins the cluster, it streams the raft snapshot from the old cluster and also starts up keyring replication from the old cluster. The keyring replication loop on the server reads key metadata from raft, sees a key it doesn't have in its local keyring, and then sends an RPC to the leader to get that key (falling back to polling all the other peers for the key if the leader doesn't have it, so that the key can still be fetched even if a leader election happened immediately after the key was created).

What seems to be happening in your case is that the new servers aren't replicating the keyring, which means the leader can't sign the workload identity for the replacement allocations. Do you have any server logs containing the word keyring around the time the new servers were started up? That might help track this down.
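
If it helps, something along these lines should pull those log lines and also show which key IDs each server currently has (a sketch only: it assumes the agents run under a systemd unit named nomad, as the logs suggest; the addresses come from the server list in the client log above; TLS flags/env omitted):

# keyring-related server log lines around the time the new servers joined
journalctl -u nomad --since "2022-10-19 09:50" | grep -i keyring

# compare the key IDs each server reports
for addr in https://10.194.73.146:4646 https://10.194.74.253:4646 https://10.194.75.103:4646; do
  echo "== $addr"
  nomad operator root keyring list -address="$addr"
done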

We've had a similar report from an internal user in their staging environment as well. Their workflow seems somewhat similar to what you're reporting here, so I want to double-check that:

  • When you bring up the new AMI, are you bringing them all up at once?
  • When you bring up the new AMI, are you ensuring that all servers are current on their raft index before shutting down the old servers? (One way to check this is sketched right after this list.)
  • What is bootstrap_expect set to?
  • Is your cluster federated with another cluster?
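
On the raft-index question, a rough way to check before terminating an old server is to compare indexes across the fleet (addresses again taken from the client log above; TLS flags omitted):

# current peer set as raft sees it
nomad operator raft list-peers

# compare commit / last-log indexes across old and new servers
for addr in https://10.194.73.146:4646 https://10.194.74.253:4646 https://10.194.75.103:4646; do
  echo "== $addr"
  nomad agent-info -address="$addr" | grep -E "commit_index|last_log_index"
done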

Also, it might help if we could get a stack trace from the servers. You can trigger this via SIGQUIT, which will dump it to stderr. If it's really long, you can email it to [email protected] with a subject line pointing to this issue and I'll see it.
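
In case it saves a step, roughly (again assuming a systemd unit named nomad; the output file name is just an example):

# SIGQUIT makes the Go runtime dump all goroutine stacks to stderr (and the process exits)
sudo pkill -QUIT -x nomad

# the dump lands in the journal with the rest of the agent output
journalctl -u nomad --since "5 minutes ago" > nomad-goroutines.txt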

tgross · Oct 20 '22 13:10

Thanks for looking into it @tgross!

So bootstrap_expect is set to 3 with no federation whatsoever. No ACLs but full TLS and gossip encryption. I really don't know if the logs I caught are from the cluster formation...

Oct 19 09:58:50 uat-nomad-95I systemd[1]: Started "HashiCorp Nomad - A workload orchestration solution".
Oct 19 09:58:51 uat-nomad-95I nomad[485]: ==> Loaded configuration from /opt/hsy/nomad/config/nomad_server.hcl
Oct 19 09:58:51 uat-nomad-95I nomad[485]: ==> Starting Nomad agent...
Oct 19 09:58:51 uat-nomad-95I nomad[485]: ==> Nomad agent configuration:
Oct 19 09:58:51 uat-nomad-95I nomad[485]:        Advertise Addrs: HTTP: 10.194.73.146:4646; RPC: 10.194.73.146:4647; Serf: 10.194.73.146:4648
Oct 19 09:58:51 uat-nomad-95I nomad[485]:             Bind Addrs: HTTP: [127.0.0.1:4646 10.194.73.146:4646]; RPC: 10.194.73.146:4647; Serf: 10.194.73.146:4648
Oct 19 09:58:51 uat-nomad-95I nomad[485]:                 Client: false
Oct 19 09:58:51 uat-nomad-95I nomad[485]:              Log Level: INFO
Oct 19 09:58:51 uat-nomad-95I nomad[485]:                 Region: europe (DC: pd0011-uat)
Oct 19 09:58:51 uat-nomad-95I nomad[485]:                 Server: true
Oct 19 09:58:51 uat-nomad-95I nomad[485]:                Version: 1.4.1
Oct 19 09:58:51 uat-nomad-95I nomad[485]: ==> Nomad agent started! Log data will stream in below:
Oct 19 09:58:51 uat-nomad-95I nomad[485]:     2022-10-19T09:58:51.878+0200 [INFO]  agent: detected plugin: name=docker type=driver plugin_version=0.1.0
Oct 19 09:58:51 uat-nomad-95I nomad[485]:     2022-10-19T09:58:51.878+0200 [INFO]  agent: detected plugin: name=raw_exec type=driver plugin_version=0.1.0
Oct 19 09:58:51 uat-nomad-95I nomad[485]:     2022-10-19T09:58:51.878+0200 [INFO]  agent: detected plugin: name=exec type=driver plugin_version=0.1.0
Oct 19 09:58:51 uat-nomad-95I nomad[485]:     2022-10-19T09:58:51.878+0200 [INFO]  agent: detected plugin: name=qemu type=driver plugin_version=0.1.0
Oct 19 09:58:51 uat-nomad-95I nomad[485]:     2022-10-19T09:58:51.878+0200 [INFO]  agent: detected plugin: name=java type=driver plugin_version=0.1.0
Oct 19 09:58:51 uat-nomad-95I nomad[485]:     2022-10-19T09:58:51.917+0200 [INFO]  nomad: setting up raft bolt store: no_freelist_sync=false
Oct 19 09:58:51 uat-nomad-95I nomad[485]:     2022-10-19T09:58:51.920+0200 [INFO]  nomad.raft: initial configuration: index=0 servers=[]
Oct 19 09:58:51 uat-nomad-95I nomad[485]:     2022-10-19T09:58:51.920+0200 [INFO]  nomad.raft: entering follower state: follower="Node at 10.194.73.146:4647 [Follower]" leader-address= leader-id=
Oct 19 09:58:51 uat-nomad-95I nomad[485]:     2022-10-19T09:58:51.923+0200 [INFO]  nomad: serf: EventMemberJoin: uat-nomad-95I.europe 10.194.73.146
Oct 19 09:58:51 uat-nomad-95I nomad[485]:     2022-10-19T09:58:51.923+0200 [INFO]  nomad: starting scheduling worker(s): num_workers=2 schedulers=["sysbatch", "service", "batch", "system", "_core"]
Oct 19 09:58:51 uat-nomad-95I nomad[485]:     2022-10-19T09:58:51.923+0200 [INFO]  nomad: started scheduling worker(s): num_workers=2 schedulers=["sysbatch", "service", "batch", "system", "_core"]
Oct 19 09:58:51 uat-nomad-95I nomad[485]:     2022-10-19T09:58:51.923+0200 [WARN]  agent: not registering Nomad HTTPS Health Check because verify_https_client enabled
Oct 19 09:58:51 uat-nomad-95I nomad[485]:     2022-10-19T09:58:51.926+0200 [INFO]  nomad: adding server: server="uat-nomad-95I.europe (Addr: 10.194.73.146:4647) (DC: pd0011-uat)"
Oct 19 09:58:51 uat-nomad-95I nomad[485]:     2022-10-19T09:58:51.926+0200 [ERROR] nomad: error looking up Nomad servers in Consul: error="server.nomad: unable to query Consul datacenters: Get \"https://127.0.0.1:8501/v1/catalog/datacenters\": dial tcp 127.0.0.1:8501: connect: connection refused"
Oct 19 09:58:53 uat-nomad-95I nomad[485]:     2022-10-19T09:58:53.180+0200 [WARN]  nomad.raft: no known peers, aborting election
Oct 19 09:59:02 uat-nomad-95I nomad[485]:     2022-10-19T09:59:02.001+0200 [ERROR] worker: failed to dequeue evaluation: worker_id=4375ac27-f83a-7f0c-710e-2c5a221e438d error="No cluster leader"
Oct 19 09:59:02 uat-nomad-95I nomad[485]:     2022-10-19T09:59:02.268+0200 [ERROR] worker: failed to dequeue evaluation: worker_id=c4d91fc3-5e23-dbec-a85d-8fc830f375ab error="No cluster leader"
Oct 19 09:59:07 uat-nomad-95I nomad[485]:     2022-10-19T09:59:07.186+0200 [ERROR] worker: failed to dequeue evaluation: worker_id=4375ac27-f83a-7f0c-710e-2c5a221e438d error="No cluster leader"
Oct 19 09:59:07 uat-nomad-95I nomad[485]:     2022-10-19T09:59:07.441+0200 [ERROR] worker: failed to dequeue evaluation: worker_id=c4d91fc3-5e23-dbec-a85d-8fc830f375ab error="No cluster leader"
Oct 19 09:59:12 uat-nomad-95I nomad[485]:     2022-10-19T09:59:12.601+0200 [ERROR] worker: failed to dequeue evaluation: worker_id=4375ac27-f83a-7f0c-710e-2c5a221e438d error="No cluster leader"
Oct 19 09:59:13 uat-nomad-95I nomad[485]:     2022-10-19T09:59:13.057+0200 [ERROR] worker: failed to dequeue evaluation: worker_id=c4d91fc3-5e23-dbec-a85d-8fc830f375ab error="No cluster leader"
Oct 19 09:59:19 uat-nomad-95I nomad[485]:     2022-10-19T09:59:19.070+0200 [ERROR] worker: failed to dequeue evaluation: worker_id=4375ac27-f83a-7f0c-710e-2c5a221e438d error="No cluster leader"
Oct 19 09:59:19 uat-nomad-95I nomad[485]:     2022-10-19T09:59:19.350+0200 [ERROR] worker: failed to dequeue evaluation: worker_id=c4d91fc3-5e23-dbec-a85d-8fc830f375ab error="No cluster leader"
Oct 19 09:59:29 uat-nomad-95I nomad[485]:     2022-10-19T09:59:29.259+0200 [ERROR] worker: failed to dequeue evaluation: worker_id=4375ac27-f83a-7f0c-710e-2c5a221e438d error="No cluster leader"
Oct 19 09:59:29 uat-nomad-95I nomad[485]:     2022-10-19T09:59:29.548+0200 [ERROR] worker: failed to dequeue evaluation: worker_id=c4d91fc3-5e23-dbec-a85d-8fc830f375ab error="No cluster leader"
Oct 19 09:59:44 uat-nomad-95I nomad[485]:     2022-10-19T09:59:44.354+0200 [ERROR] worker: failed to dequeue evaluation: worker_id=4375ac27-f83a-7f0c-710e-2c5a221e438d error="No cluster leader"
Oct 19 09:59:44 uat-nomad-95I nomad[485]:     2022-10-19T09:59:44.787+0200 [ERROR] worker: failed to dequeue evaluation: worker_id=c4d91fc3-5e23-dbec-a85d-8fc830f375ab error="No cluster leader"
Oct 19 09:59:48 uat-nomad-95I nomad[485]:     2022-10-19T09:59:48.942+0200 [INFO]  nomad: serf: EventMemberJoin: uat-nomad-ggK.europe 10.194.74.112
Oct 19 09:59:48 uat-nomad-95I nomad[485]:     2022-10-19T09:59:48.942+0200 [INFO]  nomad: serf: EventMemberJoin: uat-nomad-072.europe 10.194.75.246
Oct 19 09:59:48 uat-nomad-95I nomad[485]:     2022-10-19T09:59:48.942+0200 [INFO]  nomad: adding server: server="uat-nomad-ggK.europe (Addr: 10.194.74.112:4647) (DC: pd0011-uat)"
Oct 19 09:59:48 uat-nomad-95I nomad[485]:     2022-10-19T09:59:48.946+0200 [INFO]  nomad: successfully contacted Nomad servers: num_servers=2
Oct 19 09:59:48 uat-nomad-95I nomad[485]:     2022-10-19T09:59:48.950+0200 [WARN]  nomad.raft: failed to get previous log: previous-index=24648 last-index=0 error="log not found"
Oct 19 09:59:48 uat-nomad-95I nomad[485]:     2022-10-19T09:59:48.952+0200 [INFO]  nomad: disabling bootstrap mode because existing Raft peers being reported by peer: peer_name=uat-nomad-ggK.europe peer_address=10.194.74.112:4647
Oct 19 09:59:48 uat-nomad-95I nomad[485]:     2022-10-19T09:59:48.952+0200 [INFO]  nomad: adding server: server="uat-nomad-072.europe (Addr: 10.194.75.246:4647) (DC: pd0011-uat)"
Oct 19 09:59:48 uat-nomad-95I nomad[485]:     2022-10-19T09:59:48.956+0200 [INFO]  snapshot: creating new snapshot: path=/opt/hsy/nomad/data/server/raft/snapshots/4-24584-1666166388956.tmp
Oct 19 09:59:48 uat-nomad-95I nomad[485]:     2022-10-19T09:59:48.962+0200 [INFO]  nomad.raft: snapshot network transfer progress: read-bytes=394042 percent-complete="100.00%"
Oct 19 09:59:48 uat-nomad-95I nomad[485]:     2022-10-19T09:59:48.971+0200 [INFO]  nomad.raft: copied to local snapshot: bytes=394042
Oct 19 09:59:49 uat-nomad-95I nomad[485]:     2022-10-19T09:59:49.004+0200 [INFO]  nomad.raft: snapshot restore progress: id=4-24584-1666166388956 last-index=24584 last-term=4 size-in-bytes=394042 read-bytes=394042 percent-complete="100.00%"
Oct 19 09:59:49 uat-nomad-95I nomad[485]:     2022-10-19T09:59:49.004+0200 [INFO]  nomad.raft: Installed remote snapshot
Oct 19 10:03:22 uat-nomad-95I nomad[485]:     2022-10-19T10:03:22.569+0200 [INFO]  nomad: serf: EventMemberLeave: uat-nomad-ggK.europe 10.194.74.112
Oct 19 10:03:22 uat-nomad-95I nomad[485]:     2022-10-19T10:03:22.569+0200 [INFO]  nomad: removing server: server="uat-nomad-ggK.europe (Addr: 10.194.74.112:4647) (DC: pd0011-uat)"
Oct 19 10:04:25 uat-nomad-95I nomad[485]:     2022-10-19T10:04:25.357+0200 [INFO]  nomad: serf: EventMemberJoin: uat-nomad-Kxk.europe 10.194.74.253
Oct 19 10:04:25 uat-nomad-95I nomad[485]:     2022-10-19T10:04:25.357+0200 [INFO]  nomad: adding server: server="uat-nomad-Kxk.europe (Addr: 10.194.74.253:4647) (DC: pd0011-uat)"
Oct 19 10:08:45 uat-nomad-95I nomad[485]:     2022-10-19T10:08:45.357+0200 [ERROR] worker: failed to dequeue evaluation: worker_id=c4d91fc3-5e23-dbec-a85d-8fc830f375ab error="rpc error: eval broker disabled"
Oct 19 10:08:45 uat-nomad-95I nomad[485]:     2022-10-19T10:08:45.357+0200 [ERROR] worker: failed to dequeue evaluation: worker_id=4375ac27-f83a-7f0c-710e-2c5a221e438d error="rpc error: eval broker disabled"
Oct 19 10:08:46 uat-nomad-95I nomad[485]:     2022-10-19T10:08:46.369+0200 [WARN]  nomad.raft: rejecting vote request since we have a leader: from=10.194.74.253:4647 leader=10.194.75.246:4647 leader-id=4d46ab0f-aaae-002a-f916-ee8670d327bc
Oct 19 10:08:46 uat-nomad-95I nomad[485]:     2022-10-19T10:08:46.544+0200 [INFO]  nomad: serf: EventMemberLeave: uat-nomad-072.europe 10.194.75.246
Oct 19 10:08:46 uat-nomad-95I nomad[485]:     2022-10-19T10:08:46.544+0200 [INFO]  nomad: removing server: server="uat-nomad-072.europe (Addr: 10.194.75.246:4647) (DC: pd0011-uat)"
Oct 19 10:08:46 uat-nomad-95I nomad[485]:     2022-10-19T10:08:46.918+0200 [WARN]  nomad.raft: heartbeat timeout reached, starting election: last-leader-addr=10.194.75.246:4647 last-leader-id=4d46ab0f-aaae-002a-f916-ee8670d327bc
Oct 19 10:08:46 uat-nomad-95I nomad[485]:     2022-10-19T10:08:46.918+0200 [INFO]  nomad.raft: entering candidate state: node="Node at 10.194.73.146:4647 [Candidate]" term=7
Oct 19 10:08:48 uat-nomad-95I nomad[485]:     2022-10-19T10:08:48.046+0200 [ERROR] worker: failed to dequeue evaluation: worker_id=c4d91fc3-5e23-dbec-a85d-8fc830f375ab error="rpc error: No cluster leader"
Oct 19 10:08:48 uat-nomad-95I nomad[485]:     2022-10-19T10:08:48.046+0200 [ERROR] worker: failed to dequeue evaluation: worker_id=4375ac27-f83a-7f0c-710e-2c5a221e438d error="rpc error: No cluster leader"
Oct 19 10:08:48 uat-nomad-95I nomad[485]:     2022-10-19T10:08:48.374+0200 [INFO]  nomad.raft: entering follower state: follower="Node at 10.194.73.146:4647 [Follower]" leader-address= leader-id=
Oct 19 10:09:54 uat-nomad-95I nomad[485]:     2022-10-19T10:09:54.850+0200 [INFO]  nomad: serf: EventMemberJoin: uat-nomad-tMd.europe 10.194.75.103
Oct 19 10:09:54 uat-nomad-95I nomad[485]:     2022-10-19T10:09:54.851+0200 [INFO]  nomad: adding server: server="uat-nomad-tMd.europe (Addr: 10.194.75.103:4647) (DC: pd0011-uat)"
Oct 20 09:16:38 uat-nomad-95I nomad[485]:     2022-10-20T09:16:38.333+0200 [ERROR] worker: failed to submit plan for evaluation: worker_id=4375ac27-f83a-7f0c-710e-2c5a221e438d eval_id=03b77c02-1121-9a63-0d64-48868310227c error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 09:16:38 uat-nomad-95I nomad[485]:     2022-10-20T09:16:38.335+0200 [ERROR] worker: error invoking scheduler: worker_id=4375ac27-f83a-7f0c-710e-2c5a221e438d error="failed to process evaluation: rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 09:16:39 uat-nomad-95I nomad[485]:     2022-10-20T09:16:39.336+0200 [ERROR] worker: failed to submit plan for evaluation: worker_id=4375ac27-f83a-7f0c-710e-2c5a221e438d eval_id=9b8451a5-b344-6295-0556-1e863005b076 error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 09:16:39 uat-nomad-95I nomad[485]:     2022-10-20T09:16:39.336+0200 [ERROR] worker: error invoking scheduler: worker_id=4375ac27-f83a-7f0c-710e-2c5a221e438d error="failed to process evaluation: rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 09:16:39 uat-nomad-95I nomad[485]:     2022-10-20T09:16:39.339+0200 [ERROR] worker: failed to submit plan for evaluation: worker_id=c4d91fc3-5e23-dbec-a85d-8fc830f375ab eval_id=03b77c02-1121-9a63-0d64-48868310227c error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 09:16:39 uat-nomad-95I nomad[485]:     2022-10-20T09:16:39.339+0200 [ERROR] worker: error invoking scheduler: worker_id=c4d91fc3-5e23-dbec-a85d-8fc830f375ab error="failed to process evaluation: rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 09:16:59 uat-nomad-95I nomad[485]:     2022-10-20T09:16:59.339+0200 [ERROR] worker: failed to submit plan for evaluation: worker_id=4375ac27-f83a-7f0c-710e-2c5a221e438d eval_id=9b8451a5-b344-6295-0556-1e863005b076 error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 09:16:59 uat-nomad-95I nomad[485]:     2022-10-20T09:16:59.339+0200 [ERROR] worker: error invoking scheduler: worker_id=4375ac27-f83a-7f0c-710e-2c5a221e438d error="failed to process evaluation: rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 09:16:59 uat-nomad-95I nomad[485]:     2022-10-20T09:16:59.342+0200 [ERROR] worker: failed to submit plan for evaluation: worker_id=c4d91fc3-5e23-dbec-a85d-8fc830f375ab eval_id=03b77c02-1121-9a63-0d64-48868310227c error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 09:16:59 uat-nomad-95I nomad[485]:     2022-10-20T09:16:59.342+0200 [ERROR] worker: error invoking scheduler: worker_id=c4d91fc3-5e23-dbec-a85d-8fc830f375ab error="failed to process evaluation: rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 09:18:52 uat-nomad-95I nomad[485]:     2022-10-20T09:18:52.789+0200 [ERROR] worker: failed to submit plan for evaluation: worker_id=c4d91fc3-5e23-dbec-a85d-8fc830f375ab eval_id=11e43b96-6f71-51d7-5c1f-37118b07ea5e error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 09:18:52 uat-nomad-95I nomad[485]:     2022-10-20T09:18:52.789+0200 [ERROR] worker: error invoking scheduler: worker_id=c4d91fc3-5e23-dbec-a85d-8fc830f375ab error="failed to process evaluation: rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 09:18:53 uat-nomad-95I nomad[485]:     2022-10-20T09:18:53.792+0200 [ERROR] worker: failed to submit plan for evaluation: worker_id=4375ac27-f83a-7f0c-710e-2c5a221e438d eval_id=11e43b96-6f71-51d7-5c1f-37118b07ea5e error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 09:18:53 uat-nomad-95I nomad[485]:     2022-10-20T09:18:53.792+0200 [ERROR] worker: error invoking scheduler: worker_id=4375ac27-f83a-7f0c-710e-2c5a221e438d error="failed to process evaluation: rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 09:21:14 uat-nomad-95I nomad[485]:     2022-10-20T09:21:14.762+0200 [ERROR] worker: failed to submit plan for evaluation: worker_id=4375ac27-f83a-7f0c-710e-2c5a221e438d eval_id=bfe00218-ccb1-cd80-f4da-a3e8f66002c7 error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 09:21:14 uat-nomad-95I nomad[485]:     2022-10-20T09:21:14.762+0200 [ERROR] worker: error invoking scheduler: worker_id=4375ac27-f83a-7f0c-710e-2c5a221e438d error="failed to process evaluation: rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 09:22:01 uat-nomad-95I nomad[485]:     2022-10-20T09:22:01.891+0200 [ERROR] worker: failed to submit plan for evaluation: worker_id=c4d91fc3-5e23-dbec-a85d-8fc830f375ab eval_id=f4324e9c-59f7-fed9-ef3a-139720072cb0 error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 09:22:01 uat-nomad-95I nomad[485]:     2022-10-20T09:22:01.891+0200 [ERROR] worker: error invoking scheduler: worker_id=c4d91fc3-5e23-dbec-a85d-8fc830f375ab error="failed to process evaluation: rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 09:22:02 uat-nomad-95I nomad[485]:     2022-10-20T09:22:02.894+0200 [ERROR] worker: failed to submit plan for evaluation: worker_id=c4d91fc3-5e23-dbec-a85d-8fc830f375ab eval_id=f4324e9c-59f7-fed9-ef3a-139720072cb0 error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 09:22:02 uat-nomad-95I nomad[485]:     2022-10-20T09:22:02.894+0200 [ERROR] worker: error invoking scheduler: worker_id=c4d91fc3-5e23-dbec-a85d-8fc830f375ab error="failed to process evaluation: rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 09:22:22 uat-nomad-95I nomad[485]:     2022-10-20T09:22:22.899+0200 [ERROR] worker: failed to submit plan for evaluation: worker_id=c4d91fc3-5e23-dbec-a85d-8fc830f375ab eval_id=f4324e9c-59f7-fed9-ef3a-139720072cb0 error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 09:22:22 uat-nomad-95I nomad[485]:     2022-10-20T09:22:22.899+0200 [ERROR] worker: error invoking scheduler: worker_id=c4d91fc3-5e23-dbec-a85d-8fc830f375ab error="failed to process evaluation: rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 09:22:37 uat-nomad-95I nomad[485]:     2022-10-20T09:22:37.904+0200 [ERROR] worker: failed to submit plan for evaluation: worker_id=c4d91fc3-5e23-dbec-a85d-8fc830f375ab eval_id=023e409c-f644-7e0f-33d8-813623ce622e error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 09:22:37 uat-nomad-95I nomad[485]:     2022-10-20T09:22:37.904+0200 [ERROR] worker: error invoking scheduler: worker_id=c4d91fc3-5e23-dbec-a85d-8fc830f375ab error="failed to process evaluation: rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 09:22:38 uat-nomad-95I nomad[485]:     2022-10-20T09:22:38.907+0200 [ERROR] worker: failed to submit plan for evaluation: worker_id=c4d91fc3-5e23-dbec-a85d-8fc830f375ab eval_id=023e409c-f644-7e0f-33d8-813623ce622e error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 09:22:38 uat-nomad-95I nomad[485]:     2022-10-20T09:22:38.907+0200 [ERROR] worker: error invoking scheduler: worker_id=c4d91fc3-5e23-dbec-a85d-8fc830f375ab error="failed to process evaluation: rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"

After that it's basically failed to submit plan for evaluation repeating forever.

When you bring up the new AMI, are you bringing them all up at once?

Nope, that would be one at a time.

When you bring up the new AMI, are you ensuring that all servers are current on their raft index before shutting down the old servers?

Hmm, I don't think it's actively ensured, but between server instance switches I think there's a healthy 5 minutes on a cluster with very few jobs...

bfqrst · Oct 20 '22 14:10

The total lack of keyring.replicator logs here is very suspicious to me. There's a trace log at the start of replication (which we won't see here as it looks like your logs are at info-level), but there should be a whole bunch of logs around failures if there's been a failure to fetch the key. Now I'm wondering if we might be stuck waiting on a mutex in the replication loop in encrypter.go#L419-L512. I've been trying to reproduce this bug but it may be that my rig isn't hitting the right timing to enter any lock -- so that goroutine dump would be extra useful if it's possible to get.
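
Once that dump exists, grepping it for frames in that file is a quick way to see whether the replication goroutine is parked on a lock; a rough sketch, assuming the dump was saved as nomad-goroutines.txt (a placeholder name):

# show every goroutine whose stack touches encrypter.go, with some surrounding frames
grep -n -B 5 -A 20 "encrypter.go" nomad-goroutines.txt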

tgross · Oct 20 '22 14:10

Okay @tgross I did SIGQUIT the server process. Logs are out as we speak! Meanwhile, after the restart, we see keyring.replicator business:

Oct 20 17:06:53 uat-nomad-95I systemd[1]: Started "HashiCorp Nomad - A workload orchestration solution".
Oct 20 17:06:53 uat-nomad-95I nomad[5241]: WARNING: keyring exists but -encrypt given, using keyring
Oct 20 17:06:53 uat-nomad-95I nomad[5241]: ==> Loaded configuration from /opt/hsy/nomad/config/nomad_server.hcl
Oct 20 17:06:53 uat-nomad-95I nomad[5241]: ==> Starting Nomad agent...
Oct 20 17:06:54 uat-nomad-95I nomad[5241]: ==> Nomad agent configuration:
Oct 20 17:06:54 uat-nomad-95I nomad[5241]:        Advertise Addrs: HTTP: 10.194.73.146:4646; RPC: 10.194.73.146:4647; Serf: 10.194.73.146:4648
Oct 20 17:06:54 uat-nomad-95I nomad[5241]:             Bind Addrs: HTTP: [127.0.0.1:4646 10.194.73.146:4646]; RPC: 10.194.73.146:4647; Serf: 10.194.73.146:4648
Oct 20 17:06:54 uat-nomad-95I nomad[5241]:                 Client: false
Oct 20 17:06:54 uat-nomad-95I nomad[5241]:              Log Level: INFO
Oct 20 17:06:54 uat-nomad-95I nomad[5241]:                 Region: europe (DC: pd0011-uat)
Oct 20 17:06:54 uat-nomad-95I nomad[5241]:                 Server: true
Oct 20 17:06:54 uat-nomad-95I nomad[5241]:                Version: 1.4.1
Oct 20 17:06:54 uat-nomad-95I nomad[5241]: ==> Nomad agent started! Log data will stream in below:
Oct 20 17:06:54 uat-nomad-95I nomad[5241]:     2022-10-20T17:06:53.985+0200 [INFO]  agent: detected plugin: name=raw_exec type=driver plugin_version=0.1.0
Oct 20 17:06:54 uat-nomad-95I nomad[5241]:     2022-10-20T17:06:53.985+0200 [INFO]  agent: detected plugin: name=exec type=driver plugin_version=0.1.0
Oct 20 17:06:54 uat-nomad-95I nomad[5241]:     2022-10-20T17:06:53.985+0200 [INFO]  agent: detected plugin: name=qemu type=driver plugin_version=0.1.0
Oct 20 17:06:54 uat-nomad-95I nomad[5241]:     2022-10-20T17:06:53.985+0200 [INFO]  agent: detected plugin: name=java type=driver plugin_version=0.1.0
Oct 20 17:06:54 uat-nomad-95I nomad[5241]:     2022-10-20T17:06:53.985+0200 [INFO]  agent: detected plugin: name=docker type=driver plugin_version=0.1.0
Oct 20 17:06:54 uat-nomad-95I nomad[5241]:     2022-10-20T17:06:53.994+0200 [INFO]  nomad: setting up raft bolt store: no_freelist_sync=false
Oct 20 17:06:54 uat-nomad-95I nomad[5241]:     2022-10-20T17:06:53.999+0200 [INFO]  nomad.raft: starting restore from snapshot: id=4-24584-1666166388956 last-index=24584 last-term=4 size-in-bytes=394042
Oct 20 17:06:54 uat-nomad-95I nomad[5241]:     2022-10-20T17:06:54.014+0200 [INFO]  nomad.raft: snapshot restore progress: id=4-24584-1666166388956 last-index=24584 last-term=4 size-in-bytes=394042 read-bytes=394042 percent-complete="100.00%"
Oct 20 17:06:54 uat-nomad-95I nomad[5241]:     2022-10-20T17:06:54.015+0200 [INFO]  nomad.raft: restored from snapshot: id=4-24584-1666166388956 last-index=24584 last-term=4 size-in-bytes=394042
Oct 20 17:06:54 uat-nomad-95I nomad[5241]:     2022-10-20T17:06:54.193+0200 [INFO]  nomad.raft: initial configuration: index=24750 servers="[{Suffrage:Voter ID:209b28b1-d4ba-111a-93b4-5433c76ca54d Address:10.194.73.146:4647} {Suffrage:Voter ID:29c95ced-78be-e7cc-2ca9-d24f4045cc2a Address:10.194.74.253:4647} {Suffrage:Voter ID:437dd9d3-c70a-640a-7bc3-b74c04a09bb0 Address:10.194.75.103:4647}]"
Oct 20 17:06:54 uat-nomad-95I nomad[5241]:     2022-10-20T17:06:54.193+0200 [INFO]  nomad.raft: entering follower state: follower="Node at 10.194.73.146:4647 [Follower]" leader-address= leader-id=
Oct 20 17:06:54 uat-nomad-95I nomad[5241]:     2022-10-20T17:06:54.194+0200 [INFO]  nomad: serf: EventMemberJoin: uat-nomad-95I.europe 10.194.73.146
Oct 20 17:06:54 uat-nomad-95I nomad[5241]:     2022-10-20T17:06:54.194+0200 [INFO]  nomad: starting scheduling worker(s): num_workers=2 schedulers=["service", "batch", "system", "sysbatch", "_core"]
Oct 20 17:06:54 uat-nomad-95I nomad[5241]:     2022-10-20T17:06:54.194+0200 [INFO]  nomad: started scheduling worker(s): num_workers=2 schedulers=["service", "batch", "system", "sysbatch", "_core"]
Oct 20 17:06:54 uat-nomad-95I nomad[5241]:     2022-10-20T17:06:54.194+0200 [WARN]  agent: not registering Nomad HTTPS Health Check because verify_https_client enabled
Oct 20 17:06:54 uat-nomad-95I nomad[5241]:     2022-10-20T17:06:54.194+0200 [INFO]  nomad: serf: Attempting re-join to previously known node: uat-nomad-Kxk.europe: 10.194.74.253:4648
Oct 20 17:06:54 uat-nomad-95I nomad[5241]:     2022-10-20T17:06:54.194+0200 [INFO]  nomad: adding server: server="uat-nomad-95I.europe (Addr: 10.194.73.146:4647) (DC: pd0011-uat)"
Oct 20 17:06:54 uat-nomad-95I nomad[5241]:     2022-10-20T17:06:54.196+0200 [INFO]  nomad: serf: EventMemberJoin: uat-nomad-tMd.europe 10.194.75.103
Oct 20 17:06:54 uat-nomad-95I nomad[5241]:     2022-10-20T17:06:54.196+0200 [WARN]  nomad: memberlist: Refuting an alive message for 'uat-nomad-95I.europe' (10.194.73.146:4648) meta:([255 138 164 114 111 108 101 165 110 111 109 97 100 166 114 101 103 105 111 110 166 101 117 114 111 112 101 163 118 115 110 161 49 168 114 97 102 116 95 118 115 110 161 51 162 105 100 218 0 36 50 48 57 98 50 56 98 49 45 100 52 98 97 45 49 49 49 97 45 57 51 98 52 45 53 52 51 51 99 55 54 99 97 53 52 100 168 114 112 99 95 97 100 100 114 173 49 48 46 49 57 52 46 55 51 46 49 52 54 162 100 99 170 112 100 48 48 49 49 45 117 97 116 165 98 117 105 108 100 165 49 46 52 46 49 164 112 111 114 116 164 52 54 52 55 166 101 120 112 101 99 116 161 51] VS [255 138 168 114 97 102 116 95 118 115 110 161 51 165 98 117 105 108 100 165 49 46 52 46 49 163 118 115 110 161 49 162 100 99 170 112 100 48 48 49 49 45 117 97 116 162 105 100 218 0 36 50 48 57 98 50 56 98 49 45 100 52 98 97 45 49 49 49 97 45 57 51 98 52 45 53 52 51 51 99 55 54 99 97 53 52 100 168 114 112 99 95 97 100 100 114 173 49 48 46 49 57 52 46 55 51 46 49 52 54 164 112 111 114 116 164 52 54 52 55 166 101 120 112 101 99 116 161 51 164 114 111 108 101 165 110 111 109 97 100 166 114 101 103 105 111 110 166 101 117 114 111 112 101]), vsn:([1 5 2 2 5 4] VS [1 5 2 2 5 4])
Oct 20 17:06:54 uat-nomad-95I nomad[5241]:     2022-10-20T17:06:54.197+0200 [INFO]  nomad: serf: EventMemberJoin: uat-nomad-Kxk.europe 10.194.74.253
Oct 20 17:06:54 uat-nomad-95I nomad[5241]:     2022-10-20T17:06:54.197+0200 [INFO]  nomad: adding server: server="uat-nomad-tMd.europe (Addr: 10.194.75.103:4647) (DC: pd0011-uat)"
Oct 20 17:06:54 uat-nomad-95I nomad[5241]:     2022-10-20T17:06:54.197+0200 [INFO]  nomad: adding server: server="uat-nomad-Kxk.europe (Addr: 10.194.74.253:4647) (DC: pd0011-uat)"
Oct 20 17:06:54 uat-nomad-95I nomad[5241]:     2022-10-20T17:06:54.197+0200 [INFO]  nomad: serf: Re-joined to previously known node: uat-nomad-Kxk.europe: 10.194.74.253:4648
Oct 20 17:06:54 uat-nomad-95I nomad[5241]:     2022-10-20T17:06:54.355+0200 [ERROR] nomad.keyring.replicator: failed to fetch key from any peer: key=55231009-66eb-497c-2980-d6f2e02de791 error=<nil>
Oct 20 17:06:55 uat-nomad-95I nomad[5241]:     2022-10-20T17:06:55.364+0200 [ERROR] nomad.keyring.replicator: failed to fetch key from any peer: key=7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:06:56 uat-nomad-95I nomad[5241]:     2022-10-20T17:06:56.368+0200 [ERROR] nomad.keyring.replicator: failed to fetch key from any peer: key=7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:06:57 uat-nomad-95I nomad[5241]:     2022-10-20T17:06:57.373+0200 [ERROR] nomad.keyring.replicator: failed to fetch key from any peer: key=7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:06:58 uat-nomad-95I nomad[5241]:     2022-10-20T17:06:58.377+0200 [ERROR] nomad.keyring.replicator: failed to fetch key from any peer: key=7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:06:59 uat-nomad-95I nomad[5241]:     2022-10-20T17:06:59.382+0200 [ERROR] nomad.keyring.replicator: failed to fetch key from any peer: key=7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:07:00 uat-nomad-95I nomad[5241]:     2022-10-20T17:07:00.386+0200 [ERROR] nomad.keyring.replicator: failed to fetch key from any peer: key=7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:07:01 uat-nomad-95I nomad[5241]:     2022-10-20T17:07:01.391+0200 [ERROR] nomad.keyring.replicator: failed to fetch key from any peer: key=7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:07:02 uat-nomad-95I nomad[5241]:     2022-10-20T17:07:02.395+0200 [ERROR] nomad.keyring.replicator: failed to fetch key from any peer: key=7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:07:03 uat-nomad-95I nomad[5241]:     2022-10-20T17:07:03.400+0200 [ERROR] nomad.keyring.replicator: failed to fetch key from any peer: key=7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:07:04 uat-nomad-95I nomad[5241]:     2022-10-20T17:07:04.405+0200 [ERROR] nomad.keyring.replicator: failed to fetch key from any peer: key=7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:07:05 uat-nomad-95I nomad[5241]:     2022-10-20T17:07:05.409+0200 [ERROR] nomad.keyring.replicator: failed to fetch key from any peer: key=7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:07:05 uat-nomad-95I nomad[5241]:     2022-10-20T17:07:05.864+0200 [ERROR] worker: failed to submit plan for evaluation: worker_id=c5a91435-08a4-dbef-4513-99c91519d039 eval_id=b8c5ba8b-a8c4-6339-d898-203aacac6835 error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:07:05 uat-nomad-95I nomad[5241]:     2022-10-20T17:07:05.864+0200 [ERROR] worker: error invoking scheduler: worker_id=c5a91435-08a4-dbef-4513-99c91519d039 error="failed to process evaluation: rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:07:06 uat-nomad-95I nomad[5241]:     2022-10-20T17:07:06.414+0200 [ERROR] nomad.keyring.replicator: failed to fetch key from any peer: key=7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:07:06 uat-nomad-95I nomad[5241]:     2022-10-20T17:07:06.867+0200 [ERROR] worker: failed to submit plan for evaluation: worker_id=c5a91435-08a4-dbef-4513-99c91519d039 eval_id=b8c5ba8b-a8c4-6339-d898-203aacac6835 error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:07:06 uat-nomad-95I nomad[5241]:     2022-10-20T17:07:06.867+0200 [ERROR] worker: error invoking scheduler: worker_id=c5a91435-08a4-dbef-4513-99c91519d039 error="failed to process evaluation: rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:07:07 uat-nomad-95I nomad[5241]:     2022-10-20T17:07:07.418+0200 [ERROR] nomad.keyring.replicator: failed to fetch key from any peer: key=7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:07:08 uat-nomad-95I nomad[5241]:     2022-10-20T17:07:08.422+0200 [ERROR] nomad.keyring.replicator: failed to fetch key from any peer: key=7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:07:09 uat-nomad-95I nomad[5241]:     2022-10-20T17:07:09.428+0200 [ERROR] nomad.keyring.replicator: failed to fetch key from any peer: key=7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:07:10 uat-nomad-95I nomad[5241]:     2022-10-20T17:07:10.432+0200 [ERROR] nomad.keyring.replicator: failed to fetch key from any peer: key=7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:07:11 uat-nomad-95I nomad[5241]:     2022-10-20T17:07:11.436+0200 [ERROR] nomad.keyring.replicator: failed to fetch key from any peer: key=7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:07:12 uat-nomad-95I nomad[5241]:     2022-10-20T17:07:12.441+0200 [ERROR] nomad.keyring.replicator: failed to fetch key from any peer: key=7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:07:13 uat-nomad-95I nomad[5241]:     2022-10-20T17:07:13.446+0200 [ERROR] nomad.keyring.replicator: failed to fetch key from any peer: key=7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:07:14 uat-nomad-95I nomad[5241]:     2022-10-20T17:07:14.450+0200 [ERROR] nomad.keyring.replicator: failed to fetch key from any peer: key=7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:07:15 uat-nomad-95I nomad[5241]:     2022-10-20T17:07:15.454+0200 [ERROR] nomad.keyring.replicator: failed to fetch key from any peer: key=7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:07:16 uat-nomad-95I nomad[5241]:     2022-10-20T17:07:16.459+0200 [ERROR] nomad.keyring.replicator: failed to fetch key from any peer: key=7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:07:17 uat-nomad-95I nomad[5241]:     2022-10-20T17:07:17.464+0200 [ERROR] nomad.keyring.replicator: failed to fetch key from any peer: key=7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:07:18 uat-nomad-95I nomad[5241]:     2022-10-20T17:07:18.468+0200 [ERROR] nomad.keyring.replicator: failed to fetch key from any peer: key=7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:07:18 uat-nomad-95I nomad[5241]:     2022-10-20T17:07:18.890+0200 [ERROR] worker: failed to submit plan for evaluation: worker_id=c5a91435-08a4-dbef-4513-99c91519d039 eval_id=38e612e5-c8fd-8ed7-2609-c08a04647154 error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:07:18 uat-nomad-95I nomad[5241]:     2022-10-20T17:07:18.890+0200 [ERROR] worker: error invoking scheduler: worker_id=c5a91435-08a4-dbef-4513-99c91519d039 error="failed to process evaluation: rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:07:19 uat-nomad-95I nomad[5241]:     2022-10-20T17:07:19.239+0200 [ERROR] worker: failed to submit plan for evaluation: worker_id=a2b9dd2d-8d9f-065e-b040-2d85951a7af0 eval_id=5816ec5a-9562-e659-018b-568fd995bcd5 error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:07:19 uat-nomad-95I nomad[5241]:     2022-10-20T17:07:19.239+0200 [ERROR] worker: error invoking scheduler: worker_id=a2b9dd2d-8d9f-065e-b040-2d85951a7af0 error="failed to process evaluation: rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:07:19 uat-nomad-95I nomad[5241]:     2022-10-20T17:07:19.472+0200 [ERROR] nomad.keyring.replicator: failed to fetch key from any peer: key=7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:07:20 uat-nomad-95I nomad[5241]:     2022-10-20T17:07:20.243+0200 [ERROR] worker: failed to submit plan for evaluation: worker_id=a2b9dd2d-8d9f-065e-b040-2d85951a7af0 eval_id=5816ec5a-9562-e659-018b-568fd995bcd5 error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:07:20 uat-nomad-95I nomad[5241]:     2022-10-20T17:07:20.243+0200 [ERROR] worker: error invoking scheduler: worker_id=a2b9dd2d-8d9f-065e-b040-2d85951a7af0 error="failed to process evaluation: rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:07:20 uat-nomad-95I nomad[5241]:     2022-10-20T17:07:20.477+0200 [ERROR] nomad.keyring.replicator: failed to fetch key from any peer: key=7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:07:21 uat-nomad-95I nomad[5241]:     2022-10-20T17:07:21.481+0200 [ERROR] nomad.keyring.replicator: failed to fetch key from any peer: key=7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:07:22 uat-nomad-95I nomad[5241]:     2022-10-20T17:07:22.486+0200 [ERROR] nomad.keyring.replicator: failed to fetch key from any peer: key=7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:07:23 uat-nomad-95I nomad[5241]:     2022-10-20T17:07:23.491+0200 [ERROR] nomad.keyring.replicator: failed to fetch key from any peer: key=7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:07:24 uat-nomad-95I nomad[5241]:     2022-10-20T17:07:24.496+0200 [ERROR] nomad.keyring.replicator: failed to fetch key from any peer: key=7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:07:25 uat-nomad-95I nomad[5241]:     2022-10-20T17:07:25.501+0200 [ERROR] nomad.keyring.replicator: failed to fetch key from any peer: key=7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:07:26 uat-nomad-95I nomad[5241]:     2022-10-20T17:07:26.505+0200 [ERROR] nomad.keyring.replicator: failed to fetch key from any peer: key=7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:07:26 uat-nomad-95I nomad[5241]:     2022-10-20T17:07:26.871+0200 [ERROR] worker: failed to submit plan for evaluation: worker_id=c5a91435-08a4-dbef-4513-99c91519d039 eval_id=b8c5ba8b-a8c4-6339-d898-203aacac6835 error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:07:26 uat-nomad-95I nomad[5241]:     2022-10-20T17:07:26.871+0200 [ERROR] worker: error invoking scheduler: worker_id=c5a91435-08a4-dbef-4513-99c91519d039 error="failed to process evaluation: rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:07:27 uat-nomad-95I nomad[5241]:     2022-10-20T17:07:27.510+0200 [ERROR] nomad.keyring.replicator: failed to fetch key from any peer: key=7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:07:28 uat-nomad-95I nomad[5241]:     2022-10-20T17:07:28.515+0200 [ERROR] nomad.keyring.replicator: failed to fetch key from any peer: key=7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:07:29 uat-nomad-95I nomad[5241]:     2022-10-20T17:07:29.519+0200 [ERROR] nomad.keyring.replicator: failed to fetch key from any peer: key=7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:07:30 uat-nomad-95I nomad[5241]:     2022-10-20T17:07:30.524+0200 [ERROR] nomad.keyring.replicator: failed to fetch key from any peer: key=7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:07:31 uat-nomad-95I nomad[5241]:     2022-10-20T17:07:31.528+0200 [ERROR] nomad.keyring.replicator: failed to fetch key from any peer: key=7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:07:32 uat-nomad-95I nomad[5241]:     2022-10-20T17:07:32.533+0200 [ERROR] nomad.keyring.replicator: failed to fetch key from any peer: key=7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:07:33 uat-nomad-95I nomad[5241]:     2022-10-20T17:07:33.538+0200 [ERROR] nomad.keyring.replicator: failed to fetch key from any peer: key=7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:07:34 uat-nomad-95I nomad[5241]:     2022-10-20T17:07:34.543+0200 [ERROR] nomad.keyring.replicator: failed to fetch key from any peer: key=7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:07:35 uat-nomad-95I nomad[5241]:     2022-10-20T17:07:35.547+0200 [ERROR] nomad.keyring.replicator: failed to fetch key from any peer: key=7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:07:36 uat-nomad-95I nomad[5241]:     2022-10-20T17:07:36.552+0200 [ERROR] nomad.keyring.replicator: failed to fetch key from any peer: key=7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:07:37 uat-nomad-95I nomad[5241]:     2022-10-20T17:07:37.557+0200 [ERROR] nomad.keyring.replicator: failed to fetch key from any peer: key=7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:07:38 uat-nomad-95I nomad[5241]:     2022-10-20T17:07:38.562+0200 [ERROR] nomad.keyring.replicator: failed to fetch key from any peer: key=7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:07:39 uat-nomad-95I nomad[5241]:     2022-10-20T17:07:39.567+0200 [ERROR] nomad.keyring.replicator: failed to fetch key from any peer: key=7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:07:40 uat-nomad-95I nomad[5241]:     2022-10-20T17:07:40.571+0200 [ERROR] nomad.keyring.replicator: failed to fetch key from any peer: key=7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:07:40 uat-nomad-95I nomad[5241]:     2022-10-20T17:07:40.599+0200 [ERROR] worker: failed to submit plan for evaluation: worker_id=a2b9dd2d-8d9f-065e-b040-2d85951a7af0 eval_id=db52a78a-49e4-ce40-6b92-da6aad3a0246 error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:07:40 uat-nomad-95I nomad[5241]:     2022-10-20T17:07:40.599+0200 [ERROR] worker: error invoking scheduler: worker_id=a2b9dd2d-8d9f-065e-b040-2d85951a7af0 error="failed to process evaluation: rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:07:41 uat-nomad-95I nomad[5241]:     2022-10-20T17:07:41.575+0200 [ERROR] nomad.keyring.replicator: failed to fetch key from any peer: key=7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:07:41 uat-nomad-95I nomad[5241]:     2022-10-20T17:07:41.603+0200 [ERROR] worker: failed to submit plan for evaluation: worker_id=a2b9dd2d-8d9f-065e-b040-2d85951a7af0 eval_id=db52a78a-49e4-ce40-6b92-da6aad3a0246 error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:07:41 uat-nomad-95I nomad[5241]:     2022-10-20T17:07:41.603+0200 [ERROR] worker: error invoking scheduler: worker_id=a2b9dd2d-8d9f-065e-b040-2d85951a7af0 error="failed to process evaluation: rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:07:41 uat-nomad-95I nomad[5241]:     2022-10-20T17:07:41.820+0200 [ERROR] worker: failed to submit plan for evaluation: worker_id=c5a91435-08a4-dbef-4513-99c91519d039 eval_id=0e089a81-ab92-d515-c1ae-cea06122b86a error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:07:41 uat-nomad-95I nomad[5241]:     2022-10-20T17:07:41.820+0200 [ERROR] worker: error invoking scheduler: worker_id=c5a91435-08a4-dbef-4513-99c91519d039 error="failed to process evaluation: rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:07:42 uat-nomad-95I nomad[5241]:     2022-10-20T17:07:42.580+0200 [ERROR] nomad.keyring.replicator: failed to fetch key from any peer: key=7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:07:42 uat-nomad-95I nomad[5241]:     2022-10-20T17:07:42.823+0200 [ERROR] worker: failed to submit plan for evaluation: worker_id=c5a91435-08a4-dbef-4513-99c91519d039 eval_id=0e089a81-ab92-d515-c1ae-cea06122b86a error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:07:42 uat-nomad-95I nomad[5241]:     2022-10-20T17:07:42.823+0200 [ERROR] worker: error invoking scheduler: worker_id=c5a91435-08a4-dbef-4513-99c91519d039 error="failed to process evaluation: rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:07:43 uat-nomad-95I nomad[5241]:     2022-10-20T17:07:43.584+0200 [ERROR] nomad.keyring.replicator: failed to fetch key from any peer: key=7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:07:44 uat-nomad-95I nomad[5241]:     2022-10-20T17:07:44.588+0200 [ERROR] nomad.keyring.replicator: failed to fetch key from any peer: key=7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:07:45 uat-nomad-95I nomad[5241]:     2022-10-20T17:07:45.593+0200 [ERROR] nomad.keyring.replicator: failed to fetch key from any peer: key=7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:07:46 uat-nomad-95I nomad[5241]:     2022-10-20T17:07:46.597+0200 [ERROR] nomad.keyring.replicator: failed to fetch key from any peer: key=7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:07:47 uat-nomad-95I nomad[5241]:     2022-10-20T17:07:47.603+0200 [ERROR] nomad.keyring.replicator: failed to fetch key from any peer: key=7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:07:48 uat-nomad-95I nomad[5241]:     2022-10-20T17:07:48.583+0200 [ERROR] worker: failed to submit plan for evaluation: worker_id=c5a91435-08a4-dbef-4513-99c91519d039 eval_id=5c0dbc14-26d7-2b25-2721-e0ae940768de error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:07:48 uat-nomad-95I nomad[5241]:     2022-10-20T17:07:48.583+0200 [ERROR] worker: error invoking scheduler: worker_id=c5a91435-08a4-dbef-4513-99c91519d039 error="failed to process evaluation: rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:07:48 uat-nomad-95I nomad[5241]:     2022-10-20T17:07:48.607+0200 [ERROR] nomad.keyring.replicator: failed to fetch key from any peer: key=7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:07:49 uat-nomad-95I nomad[5241]:     2022-10-20T17:07:49.587+0200 [ERROR] worker: failed to submit plan for evaluation: worker_id=c5a91435-08a4-dbef-4513-99c91519d039 eval_id=5c0dbc14-26d7-2b25-2721-e0ae940768de error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:07:49 uat-nomad-95I nomad[5241]:     2022-10-20T17:07:49.587+0200 [ERROR] worker: error invoking scheduler: worker_id=c5a91435-08a4-dbef-4513-99c91519d039 error="failed to process evaluation: rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:07:49 uat-nomad-95I nomad[5241]:     2022-10-20T17:07:49.612+0200 [ERROR] nomad.keyring.replicator: failed to fetch key from any peer: key=7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:07:50 uat-nomad-95I nomad[5241]:     2022-10-20T17:07:50.616+0200 [ERROR] nomad.keyring.replicator: failed to fetch key from any peer: key=7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:07:51 uat-nomad-95I nomad[5241]:     2022-10-20T17:07:51.621+0200 [ERROR] nomad.keyring.replicator: failed to fetch key from any peer: key=7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:07:52 uat-nomad-95I nomad[5241]:     2022-10-20T17:07:52.626+0200 [ERROR] nomad.keyring.replicator: failed to fetch key from any peer: key=7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:07:53 uat-nomad-95I nomad[5241]:     2022-10-20T17:07:53.631+0200 [ERROR] nomad.keyring.replicator: failed to fetch key from any peer: key=7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:07:54 uat-nomad-95I nomad[5241]:     2022-10-20T17:07:54.635+0200 [ERROR] nomad.keyring.replicator: failed to fetch key from any peer: key=7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:07:55 uat-nomad-95I nomad[5241]:     2022-10-20T17:07:55.639+0200 [ERROR] nomad.keyring.replicator: failed to fetch key from any peer: key=7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:07:56 uat-nomad-95I nomad[5241]:     2022-10-20T17:07:56.644+0200 [ERROR] nomad.keyring.replicator: failed to fetch key from any peer: key=7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:07:57 uat-nomad-95I nomad[5241]:     2022-10-20T17:07:57.649+0200 [ERROR] nomad.keyring.replicator: failed to fetch key from any peer: key=7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:07:58 uat-nomad-95I nomad[5241]:     2022-10-20T17:07:58.653+0200 [ERROR] nomad.keyring.replicator: failed to fetch key from any peer: key=7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:07:59 uat-nomad-95I nomad[5241]:     2022-10-20T17:07:59.657+0200 [ERROR] nomad.keyring.replicator: failed to fetch key from any peer: key=7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:08:00 uat-nomad-95I nomad[5241]:     2022-10-20T17:08:00.662+0200 [ERROR] nomad.keyring.replicator: failed to fetch key from any peer: key=7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:08:01 uat-nomad-95I nomad[5241]:     2022-10-20T17:08:01.606+0200 [ERROR] worker: failed to submit plan for evaluation: worker_id=c5a91435-08a4-dbef-4513-99c91519d039 eval_id=db52a78a-49e4-ce40-6b92-da6aad3a0246 error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:08:01 uat-nomad-95I nomad[5241]:     2022-10-20T17:08:01.607+0200 [ERROR] worker: error invoking scheduler: worker_id=c5a91435-08a4-dbef-4513-99c91519d039 error="failed to process evaluation: rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:08:01 uat-nomad-95I nomad[5241]:     2022-10-20T17:08:01.666+0200 [ERROR] nomad.keyring.replicator: failed to fetch key from any peer: key=7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:08:01 uat-nomad-95I nomad[5241]:     2022-10-20T17:08:01.962+0200 [ERROR] worker: failed to submit plan for evaluation: worker_id=a2b9dd2d-8d9f-065e-b040-2d85951a7af0 eval_id=d7d0cb60-47dd-0d97-1265-30e184e599ad error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:08:01 uat-nomad-95I nomad[5241]:     2022-10-20T17:08:01.962+0200 [ERROR] worker: error invoking scheduler: worker_id=a2b9dd2d-8d9f-065e-b040-2d85951a7af0 error="failed to process evaluation: rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:08:02 uat-nomad-95I nomad[5241]:     2022-10-20T17:08:02.671+0200 [ERROR] nomad.keyring.replicator: failed to fetch key from any peer: key=7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:08:02 uat-nomad-95I nomad[5241]:     2022-10-20T17:08:02.826+0200 [ERROR] worker: failed to submit plan for evaluation: worker_id=a2b9dd2d-8d9f-065e-b040-2d85951a7af0 eval_id=0e089a81-ab92-d515-c1ae-cea06122b86a error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:08:02 uat-nomad-95I nomad[5241]:     2022-10-20T17:08:02.826+0200 [ERROR] worker: error invoking scheduler: worker_id=a2b9dd2d-8d9f-065e-b040-2d85951a7af0 error="failed to process evaluation: rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:08:02 uat-nomad-95I nomad[5241]:     2022-10-20T17:08:02.966+0200 [ERROR] worker: failed to submit plan for evaluation: worker_id=c5a91435-08a4-dbef-4513-99c91519d039 eval_id=d7d0cb60-47dd-0d97-1265-30e184e599ad error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:08:02 uat-nomad-95I nomad[5241]:     2022-10-20T17:08:02.966+0200 [ERROR] worker: error invoking scheduler: worker_id=c5a91435-08a4-dbef-4513-99c91519d039 error="failed to process evaluation: rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:08:03 uat-nomad-95I nomad[5241]:     2022-10-20T17:08:03.676+0200 [ERROR] nomad.keyring.replicator: failed to fetch key from any peer: key=7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:08:03 uat-nomad-95I nomad[5241]:     2022-10-20T17:08:03.836+0200 [ERROR] worker: failed to submit plan for evaluation: worker_id=c5a91435-08a4-dbef-4513-99c91519d039 eval_id=2e334c81-9285-afdf-9915-accbe180b489 error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:08:03 uat-nomad-95I nomad[5241]:     2022-10-20T17:08:03.836+0200 [ERROR] worker: error invoking scheduler: worker_id=c5a91435-08a4-dbef-4513-99c91519d039 error="failed to process evaluation: rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:08:04 uat-nomad-95I nomad[5241]:     2022-10-20T17:08:04.680+0200 [ERROR] nomad.keyring.replicator: failed to fetch key from any peer: key=7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:08:05 uat-nomad-95I nomad[5241]:     2022-10-20T17:08:05.685+0200 [ERROR] nomad.keyring.replicator: failed to fetch key from any peer: key=7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:08:06 uat-nomad-95I nomad[5241]:     2022-10-20T17:08:06.689+0200 [ERROR] nomad.keyring.replicator: failed to fetch key from any peer: key=7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:08:07 uat-nomad-95I nomad[5241]:     2022-10-20T17:08:07.697+0200 [ERROR] nomad.keyring.replicator: failed to fetch key from any peer: key=7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:08:08 uat-nomad-95I nomad[5241]:     2022-10-20T17:08:08.702+0200 [ERROR] nomad.keyring.replicator: failed to fetch key from any peer: key=7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:08:09 uat-nomad-95I nomad[5241]:     2022-10-20T17:08:09.706+0200 [ERROR] nomad.keyring.replicator: failed to fetch key from any peer: key=7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:08:10 uat-nomad-95I nomad[5241]:     2022-10-20T17:08:10.711+0200 [ERROR] nomad.keyring.replicator: failed to fetch key from any peer: key=7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:08:11 uat-nomad-95I nomad[5241]:     2022-10-20T17:08:11.715+0200 [ERROR] nomad.keyring.replicator: failed to fetch key from any peer: key=7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:08:12 uat-nomad-95I nomad[5241]:     2022-10-20T17:08:12.719+0200 [ERROR] nomad.keyring.replicator: failed to fetch key from any peer: key=7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:08:13 uat-nomad-95I nomad[5241]:     2022-10-20T17:08:13.723+0200 [ERROR] nomad.keyring.replicator: failed to fetch key from any peer: key=7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:08:14 uat-nomad-95I nomad[5241]:     2022-10-20T17:08:14.728+0200 [ERROR] nomad.keyring.replicator: failed to fetch key from any peer: key=7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:08:15 uat-nomad-95I nomad[5241]:     2022-10-20T17:08:15.732+0200 [ERROR] nomad.keyring.replicator: failed to fetch key from any peer: key=7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:08:16 uat-nomad-95I nomad[5241]:     2022-10-20T17:08:16.737+0200 [ERROR] nomad.keyring.replicator: failed to fetch key from any peer: key=7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:08:17 uat-nomad-95I nomad[5241]:     2022-10-20T17:08:17.741+0200 [ERROR] nomad.keyring.replicator: failed to fetch key from any peer: key=7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:08:18 uat-nomad-95I nomad[5241]:     2022-10-20T17:08:18.745+0200 [ERROR] nomad.keyring.replicator: failed to fetch key from any peer: key=7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:08:19 uat-nomad-95I nomad[5241]:     2022-10-20T17:08:19.750+0200 [ERROR] nomad.keyring.replicator: failed to fetch key from any peer: key=7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:08:20 uat-nomad-95I nomad[5241]:     2022-10-20T17:08:20.754+0200 [ERROR] nomad.keyring.replicator: failed to fetch key from any peer: key=7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:08:21 uat-nomad-95I nomad[5241]:     2022-10-20T17:08:21.759+0200 [ERROR] nomad.keyring.replicator: failed to fetch key from any peer: key=7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:08:22 uat-nomad-95I nomad[5241]:     2022-10-20T17:08:22.764+0200 [ERROR] nomad.keyring.replicator: failed to fetch key from any peer: key=7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:08:23 uat-nomad-95I nomad[5241]:     2022-10-20T17:08:23.769+0200 [ERROR] nomad.keyring.replicator: failed to fetch key from any peer: key=7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:08:24 uat-nomad-95I nomad[5241]:     2022-10-20T17:08:24.186+0200 [ERROR] worker: failed to submit plan for evaluation: worker_id=a2b9dd2d-8d9f-065e-b040-2d85951a7af0 eval_id=5343aaa3-5def-cd73-2a25-cce84f9105f7 error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:08:24 uat-nomad-95I nomad[5241]:     2022-10-20T17:08:24.186+0200 [ERROR] worker: error invoking scheduler: worker_id=a2b9dd2d-8d9f-065e-b040-2d85951a7af0 error="failed to process evaluation: rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:08:24 uat-nomad-95I nomad[5241]:     2022-10-20T17:08:24.774+0200 [ERROR] nomad.keyring.replicator: failed to fetch key from any peer: key=7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:08:25 uat-nomad-95I nomad[5241]:     2022-10-20T17:08:25.190+0200 [ERROR] worker: failed to submit plan for evaluation: worker_id=a2b9dd2d-8d9f-065e-b040-2d85951a7af0 eval_id=5343aaa3-5def-cd73-2a25-cce84f9105f7 error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:08:25 uat-nomad-95I nomad[5241]:     2022-10-20T17:08:25.190+0200 [ERROR] worker: error invoking scheduler: worker_id=a2b9dd2d-8d9f-065e-b040-2d85951a7af0 error="failed to process evaluation: rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:08:25 uat-nomad-95I nomad[5241]:     2022-10-20T17:08:25.778+0200 [ERROR] nomad.keyring.replicator: failed to fetch key from any peer: key=7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:08:26 uat-nomad-95I nomad[5241]:     2022-10-20T17:08:26.782+0200 [ERROR] nomad.keyring.replicator: failed to fetch key from any peer: key=7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:08:27 uat-nomad-95I nomad[5241]:     2022-10-20T17:08:27.787+0200 [ERROR] nomad.keyring.replicator: failed to fetch key from any peer: key=7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:08:28 uat-nomad-95I nomad[5241]:     2022-10-20T17:08:28.792+0200 [ERROR] nomad.keyring.replicator: failed to fetch key from any peer: key=7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:08:29 uat-nomad-95I systemd[1]: Stopping "HashiCorp Nomad - A workload orchestration solution"...
Oct 20 17:08:29 uat-nomad-95I nomad[5241]: ==> Caught signal: interrupt
Oct 20 17:08:29 uat-nomad-95I nomad[5241]: ==> Gracefully shutting down agent...
Oct 20 17:08:29 uat-nomad-95I nomad[5241]:     2022-10-20T17:08:29.541+0200 [INFO]  nomad: server starting leave
Oct 20 17:08:29 uat-nomad-95I nomad[5241]:     2022-10-20T17:08:29.796+0200 [ERROR] nomad.keyring.replicator: failed to fetch key from any peer: key=7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:08:30 uat-nomad-95I nomad[5241]:     2022-10-20T17:08:30.195+0200 [INFO]  nomad: serf: EventMemberLeave: uat-nomad-95I.europe 10.194.73.146
Oct 20 17:08:30 uat-nomad-95I nomad[5241]:     2022-10-20T17:08:30.195+0200 [INFO]  nomad: removing server: server="uat-nomad-95I.europe (Addr: 10.194.73.146:4647) (DC: pd0011-uat)"
Oct 20 17:08:30 uat-nomad-95I nomad[5241]:     2022-10-20T17:08:30.801+0200 [ERROR] nomad.keyring.replicator: failed to fetch key from any peer: key=7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:08:31 uat-nomad-95I nomad[5241]:     2022-10-20T17:08:31.805+0200 [ERROR] nomad.keyring.replicator: failed to fetch key from any peer: key=7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb error="rpc error: no such key \"7d11bdf6-26f0-c4fa-5c04-b73b0f46eedb\" in keyring"
Oct 20 17:08:32 uat-nomad-95I nomad[5241]:     2022-10-20T17:08:32.245+0200 [INFO]  agent: requesting shutdown
Oct 20 17:08:32 uat-nomad-95I nomad[5241]:     2022-10-20T17:08:32.245+0200 [INFO]  nomad: shutting down server
Oct 20 17:08:32 uat-nomad-95I nomad[5241]:     2022-10-20T17:08:32.245+0200 [ERROR] worker: error waiting for Raft index: worker_id=a2b9dd2d-8d9f-065e-b040-2d85951a7af0 error="context canceled" index=31516
Oct 20 17:08:32 uat-nomad-95I nomad[5241]:     2022-10-20T17:08:32.294+0200 [INFO]  agent: shutdown complete
Oct 20 17:08:32 uat-nomad-95I systemd[1]: nomad.service: Deactivated successfully.
Oct 20 17:08:32 uat-nomad-95I systemd[1]: Stopped "HashiCorp Nomad - A workload orchestration solution".
Oct 20 17:08:32 uat-nomad-95I systemd[1]: nomad.service: Consumed 2.503s CPU time.

bfqrst avatar Oct 20 '22 15:10 bfqrst

From that goroutine dump I see a couple of important bits of information:

  • There is only one goroutine anywhere in encrypter.go, which means we're not deadlocked on any of the mutexes that the encrypter.go code owns.
  • That one goroutine is in select at encrypter.go#L438, which is where we're rate-limiting the state store queries to 10/second to make sure we don't hammer the state store too often. I'm noting that the select doesn't say something like [select, 2 minutes], which would indicate it's a long-lived select (the select is at rate.go#L256):
goroutine 91 [select]:
runtime.gopark(0x40000d0cd0?, 0x2?, 0x98?, 0xb?, 0x40000d0c8c?)
        runtime/proc.go:363 +0xe4 fp=0x40000d0b10 sp=0x40000d0af0 pc=0x43f144
runtime.selectgo(0x40000d0cd0, 0x40000d0c88, 0x449f680?, 0x0, 0x65efc40b3946?, 0x1)
        runtime/select.go:328 +0x6b4 fp=0x40000d0c20 sp=0x40000d0b10 pc=0x44ffb4
golang.org/x/time/rate.(*Limiter).WaitN(0x40008ef9f0, {0x2edd2d8, 0x4000828280}, 0x1)
        golang.org/x/[email protected]/rate/rate.go:256 +0x2f4 fp=0x40000d0d90 sp=0x40000d0c20 pc=0xc37534
golang.org/x/time/rate.(*Limiter).Wait(...)
        golang.org/x/[email protected]/rate/rate.go:216
github.com/hashicorp/nomad/nomad.(*KeyringReplicator).run(0x400082a150, {0x2edd2d8, 0x4000828280})
        github.com/hashicorp/nomad/nomad/encrypter.go:438 +0x27c fp=0x40000d0fa0 sp=0x40000d0d90 pc=0x194cafc
github.com/hashicorp/nomad/nomad.NewKeyringReplicator.func1()
        github.com/hashicorp/nomad/nomad/encrypter.go:410 +0x30 fp=0x40000d0fd0 sp=0x40000d0fa0 pc=0x194c840
runtime.goexit()
        runtime/asm_arm64.s:1172 +0x4 fp=0x40000d0fd0 sp=0x40000d0fd0 pc=0x471174
created by github.com/hashicorp/nomad/nomad.NewKeyringReplicator
        github.com/hashicorp/nomad/nomad/encrypter.go:410 +0x180

Next step for me is to go dig into that rate limiting library and make sure we're not misusing it in some way. I'll report back here later today.
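
For reference, driving that library in a polling loop normally looks something like the sketch below (a minimal, generic example, not Nomad's actual replicator code), just to show why a healthy Wait parks in a short-lived select:

// Illustrative sketch of the golang.org/x/time/rate pattern discussed above; not Nomad's code.
package main

import (
	"context"
	"fmt"
	"time"

	"golang.org/x/time/rate"
)

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()

	// 10 events/second with a burst of 1. Wait blocks in a select on the
	// limiter's timer and ctx.Done(), so a goroutine parked here shows up
	// as [select] without a long wait time attached.
	limiter := rate.NewLimiter(rate.Limit(10), 1)

	for {
		if err := limiter.Wait(ctx); err != nil {
			fmt.Println("loop exiting:", err) // context canceled or deadline exceeded
			return
		}
		// ... one state store query (or other unit of work) per iteration ...
	}
}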

tgross avatar Oct 20 '22 15:10 tgross

I've got a draft PR open with one fix I see we need, but I don't think that's at all the cause here: https://github.com/hashicorp/nomad/pull/14987. Investigation continues!

tgross avatar Oct 20 '22 16:10 tgross

Just wanted to give a summary of where this is at. https://github.com/hashicorp/nomad/pull/14987 fixes some real bugs I've found while investigating, but I'm not 100% sure that it's the fix we need until I can reproduce the issue, and so far I've been unable to. Not having that failed to fetch key from any peer log on servers until after that restart is weird and makes me hesitate to say we're done here.

I'm going to pick this up again tomorrow and consult with some of my colleagues again and see what we can come up with.

tgross avatar Oct 20 '22 21:10 tgross

Appreciate the update and the investigation! For the time being, as I have a non-working cluster, what's the best course of action? Since I thankfully don't use any of the new 1.4 features, I'm inclined to roll everything back to 1.3.6...

That should work, right?

bfqrst avatar Oct 20 '22 21:10 bfqrst

nomad operator root keyring rotate solves the issue, although it seems it may re-appear if a leader election occurs.

doppelc avatar Oct 21 '22 07:10 doppelc

Since I thankfully don't use any of the new 1.4 features, I'm inclined to roll everything back to 1.3.6... That should work, right?

Once any new raft log type has been written, you can't downgrade. That includes the keyring metadata in this case. 😦

nomad operator root keyring rotate solves the issue

Yes, that will force a rotation and re-replication of the key.

, although it seems it may re-appear if a leader election occurs.

Have you seen that happen without adding new servers @doppelc? Because if so that's a useful new data point for my investigation.

tgross avatar Oct 21 '22 13:10 tgross

Yeah, I had to redo the cluster in question; @doppelc's response came after I started cleaning up.

It's kinda funny from a reproducibility standpoint really. It literally hit me in each of my three stages when I went up to 1.4.x. This time around is the fourth occurrence. At first I thought that it was the bug that was in 1.4.0 and brushed it off...

For now, good to know that I'm able to help myself to some extent with the nomad operator root keyring rotate command.

bfqrst avatar Oct 21 '22 13:10 bfqrst

Have you seen that happen without adding new servers? Because if so that's a useful new data point for my investigation.

Yes, it has happened in both scenarios.

doppelc avatar Oct 21 '22 15:10 doppelc

Based on some feedback from the internal user, I've got a clue that the underlying issue here is actually related to a bug in garbage collection, which is why I wasn't able to reproduce it with fresh test clusters. I'm working on verifying that and I'm hoping to have a fix for that later this afternoon.

tgross avatar Oct 21 '22 16:10 tgross

GC behavior fixes are https://github.com/hashicorp/nomad/pull/15009. I still haven't been able to reproduce the reported bug exactly, but this at least explains where the extraneous keys were coming from: not from leader election but from misbehaving GC. That gives me a lot more confidence in the fix in #14987 even if I don't have an exact reproduction yet.

tgross avatar Oct 21 '22 18:10 tgross

While I think the GC is triggered by default every 48 hours or so, I have the habit of manually firing nomad system gc fairly regularly to clean up old stuff... While this doesn't mean anything in terms of repro, I think you might be on to something @tgross ...

bfqrst avatar Oct 21 '22 18:10 bfqrst

I have the same issue. It happened right after upgrading from 1.3.6 to 1.4.1, on two separate clusters. It's also happening at seemingly random times, like the middle of the night yesterday.

madsholden avatar Oct 25 '22 11:10 madsholden

We've landed our fixes for this and that'll go out in Nomad 1.4.2, which I'm in the process of prepping. Expect this shortly, and thanks for your patience all!

tgross avatar Oct 26 '22 20:10 tgross

Several keyring fixes have shipped in Nomad 1.4.2 which should put this issue to rest. I'm going to close this out but please feel free to let us know here if you're encountering the issue again! Thanks!

tgross avatar Oct 27 '22 17:10 tgross

Hi folks, I'm going to re-open this issue. We've had a report from one of our internal users that they hit this again when upgrading from 1.4.1 to 1.4.2. It's at least possible that the state from 1.4.1 is "poisoned" in such a way that they won't hit it in their next 1.4.2 to 1.4.2 server recycle, but we want to make sure folks are aware that we're aware of this. I'm working with the team here to debug and I'll have more to report back early next week.

tgross avatar Oct 28 '22 19:10 tgross

Had a bit of a mid-weekend a-ha on this. When the keyring replicator queries the state store at encrypter.go#L458, it always receives the keys sorted in-order by KeyID. But if it fails to get any key, it jumps to the ERR_WAIT block at encrypter.go#L504-L508. This means that a single missing key will prevent every key that sorts after it from being replicated. If you've got a cluster that was previously hitting this bug, you'll continue to hit the bug even after the 1.4.2 patch -- depending on the order of the KeyIDs (so if you have a missing key starting with 1, it'll prevent ~15 out of 16 keys from replicating, vs if your missing key starts with f it'll prevent ~1 in 16). This can also prevent a key rotation from showing up on the followers, which is what I suspect has happened with the internal user reporting this problem. I should be able to reproduce this hypothesis pretty easily when I pick this up on Monday.
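
To make the failure mode concrete, here's a simplified sketch (illustrative names only, not Nomad's actual replication loop) of the difference between bailing out on the first missing key and pressing on:

// Illustrative sketch of the replication behavior described above; the names
// are made up for the example. Keys arrive sorted by KeyID.

// Pre-fix shape: jumping to the error/retry path on the first failure means no
// key that sorts after the missing one is ever attempted in this pass.
func replicateKeys(keyIDs []string, fetch func(id string) error) error {
	for _, id := range keyIDs {
		if err := fetch(id); err != nil {
			return err // effectively the jump to the ERR_WAIT block
		}
	}
	return nil
}

// Fixed shape: record the error and keep going, so one missing key can't block
// replication of everything that sorts after it.
func replicateKeysFixed(keyIDs []string, fetch func(id string) error) []error {
	var errs []error
	for _, id := range keyIDs {
		if err := fetch(id); err != nil {
			errs = append(errs, err)
		}
	}
	return errs
}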

(Having a missing inactive key is a serious problem, because the workload identities signed with an inactive key still need to be verifiable. But only a missing active key is a show-stopper for users who aren't using the new Variables feature, because that means we can't sign new identities and the plan applier just throws errors as we're seeing in this issue.)

tgross avatar Oct 29 '22 20:10 tgross

I'm working up a patch that does the following (a rough sketch of these rules follows the list):

  • In replication, fix the early-exit-on-error problem described above.
  • Adds tracking for which keys have been used to sign live Workload Identities by adding an index to the state store.
  • In replication, don't bother replicating keys in the Deprecated state.
  • Modify the full re-key job so that keys used to sign live Workload Identities are marked as Inactive, not Deprecated.
  • Re-enable garbage collection to delete keys that are not in use or are marked as Deprecated.
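
Purely to illustrate the intent of those bullets, a rough sketch of the key-state rules (made-up names and types, not the actual implementation):

// Illustrative only: these types paraphrase the bullet list above and are not
// Nomad's actual key-state handling.
type KeyState string

const (
	KeyActive     KeyState = "active"     // signs new identities / encrypts new variables
	KeyInactive   KeyState = "inactive"   // still needed to verify existing workload identities
	KeyDeprecated KeyState = "deprecated" // nothing depends on it anymore
)

// Followers skip keys in the Deprecated state during replication.
func shouldReplicate(s KeyState) bool {
	return s != KeyDeprecated
}

// During a full re-key, keys that still back live workload identities become
// Inactive rather than Deprecated, so those identities stay verifiable.
func stateAfterFullRekey(signsLiveIdentity bool) KeyState {
	if signsLiveIdentity {
		return KeyInactive
	}
	return KeyDeprecated
}

// Garbage collection removes keys that are not in use or are already Deprecated.
func canGC(s KeyState, inUse bool) bool {
	return !inUse || s == KeyDeprecated
}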

Workaround in the meantime

If you've upgraded to 1.4.2 but are missing keys b/c of the previous version and aren't yet using Variables, you should be able to work around it (for now) with the following:

  • Rotate the keyring with nomad operator root keyring rotate
  • List the inactive keys with nomad operator root keyring list -verbose
  • Delete the inactive keys with nomad operator root keyring remove $keyid.

If you are using Variables, you'll need to do a full rekey, but then all the tasks that are reading your Variables will need a reschedule. So for those cases the workaround is much more painful:

  • Rotate the keyring with nomad operator root keyring rotate
  • Reschedule tasks using Variables
  • List the inactive keys with nomad operator root keyring list -verbose
  • Delete the inactive keys with nomad operator root keyring remove $keyid.

(Alternatively, for non-production workloads you could just rekey with nomad operator root keyring rotate -full, and then when you delete the inactive keys, the workloads will just start failing when they read Variables and get rescheduled anyway.)

tgross avatar Oct 31 '22 18:10 tgross

Draft PR is up https://github.com/hashicorp/nomad/pull/15092

tgross avatar Oct 31 '22 20:10 tgross

@tgross I assume

  • List the inactive keys with nomad operator list -verbose

should probably read nomad operator root keyring list -verbose

bfqrst avatar Nov 01 '22 08:11 bfqrst

Oops, typo there, yes! Fixed!

tgross avatar Nov 01 '22 12:11 tgross

Ok, so https://github.com/hashicorp/nomad/pull/15092 has been merged. I'm going to get this into the hands of our internal users for them to do some testing on, and if that works out it'll land in the next release (TBD).

tgross avatar Nov 01 '22 19:11 tgross

Hey @tgross, unfortunately we can still see "no such key ..." errors after upgrading to 1.4.2. We had to execute nomad operator root keyring rotate on the cluster for deployments to work again; variables are all inaccessible again.

HINT-SJ avatar Nov 02 '22 13:11 HINT-SJ

Right @HINT-SJ we're aware that the problem can recur in 1.4.2, which is why we've got a patch being tested (see my comment here: https://github.com/hashicorp/nomad/issues/14981#issuecomment-1298979092)

tgross avatar Nov 02 '22 13:11 tgross

What is the interim solution here? Downgrading to pre-1.4? Today this happened in our cluster, running 1.4.2, and jobs stopped being evaluated. I rotated the keys just as @HINT-SJ did, but it's obviously not an ideal production setup.

Can we turn this feature off somehow with settings?

Fuco1 avatar Nov 06 '22 10:11 Fuco1

Here's the current status on this issue:

  • We've landed a patch that will go out in Nomad 1.4.3, which should fix all the known problems.
  • However, our internal user is reporting that keys are never getting written to the on-disk keystore, even after a key rotation on a healthy and unchanging cluster (even on the leader!). The trouble is that we don't know why, b/c of an issue they're having with our log shipping in their staging environment. 😊
  • I'm hoping to get logs from them to assess what the "writing to disk" problem is (it may be specific to their machine image... I can't reproduce this anywhere else).

What is the interim solution here? Downgrading to pre-1.4? Today this happened in our cluster, running 1.4.2, and jobs stopped being evaluated. I rotated the keys just as @HINT-SJ but it's obviously not ideal production setup.

There are a number of interim solutions, in increasing order of complexity/annoyance:

  1. Rotate the keys after a leader election. Wait for Nomad 1.4.3 to come out.
  2. Downgrade to pre-1.4: note that a snapshot from 1.4.x can't be loaded into a 1.3.x cluster b/c of new raft log types, so you'll be rebuilding the cluster (which is probably ok if you've deployed to a staging environment first).
  3. Build yourself a Nomad off of main and run that: this would also help us validate the 1.4.3 patches.
  4. Build yourself a Nomad off of 1.4.2 with the alloc identity signing removed from the plan applier. Note: this will break template rendering of Nomad native service discovery and after 1.4.3 you'll need to reschedule all your workloads if you want them to have identities.

If you've got a staging/test environment where you can do (3) while you're doing (1), that'd be helpful too!

Can we turn this feature off somehow with settings?

The keyring is used to provide template blocks with workload identity signing, so it can't be turned off without breaking other features like template rendering for Nomad native service discovery, and we didn't want to expose that knob. (Generally speaking, WIs are going to get used more widely in Nomad for future features, so it's better to keep this available and fix the bug rather than have it be something folks can turn off and then be unhappy when they don't get the features they expect.)

tgross avatar Nov 07 '22 13:11 tgross

I did (2), killing the whole prod cluster :D :man_facepalming: It took me only about an hour to get it back up, so not a big deal. (1) unfortunately stopped working after about 4 or 5 rotations. I listed the keys and tried to remove the old ones, but I wasn't able to; the remove command did nothing. In the logs I saw a lot of messages about leaders trying to load one of the old (inactive) keys instead of the new active one.

We currently only have one Nomad deployment; I only occasionally set up a simple dev cluster when playing with things, but in general no workloads run there, so it wouldn't test much.

For now I downgraded to the latest 1.3.x release, but man, I'm missing the new 1.4.x UI improvements.

Fuco1 avatar Nov 09 '22 12:11 Fuco1