Use agent token for service/check deregistration during anti-entropy

Open pglass opened this issue 2 years ago • 0 comments

Description

This changes agent anti-entropy syncs to use only the agent token for deregistration of services and checks.

Previously, the agent attempted to use the "service" token (i.e., the token field in a service definition file) and fell back to the agent token only if the service token was not set.

The previous behavior was problematic because, if the service token had been deleted, the deregistration request would fail. The agent would retry the deregistration during each anti-entropy sync, and the situation would never resolve.

The new behavior is to always use the agent token for service/check deregistration during anti-entropy. This is:

  • Simpler: No fallback logic to try different tokens
  • Faster (slightly): No time spent attempting the service token
  • Correct: The agent token is able to deregister services on that agent's node, because:
    • node:write permissions allow deregistration of services/checks on that node.
    • The agent token must have node:write permission, or else the agent is not able to (de)register itself in the catalog.
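
To make the difference concrete, here is a minimal, self-contained Go sketch of the two token-selection strategies. It is not the agent's actual code: `catalog`, `oldToken`, `newToken`, `syncRounds`, and `errACLNotFound` are hypothetical names invented for illustration, and the toy catalog only checks whether a token still exists rather than evaluating real ACL rules.

```go
package main

import (
	"errors"
	"fmt"
)

// errACLNotFound is a hypothetical error standing in for the failure
// returned when a request carries a token that has been deleted.
var errACLNotFound = errors.New("ACL not found")

// catalog is a toy stand-in for the server side: it tracks which
// tokens still exist and which services are registered.
type catalog struct {
	validTokens map[string]bool
	services    map[string]bool
}

// deregister removes the service only if the token still exists.
func (c *catalog) deregister(serviceID, token string) error {
	if !c.validTokens[token] {
		return errACLNotFound
	}
	delete(c.services, serviceID)
	return nil
}

// oldToken models the previous behavior: prefer the service token and
// fall back to the agent token only when no service token is set.
func oldToken(serviceToken, agentToken string) string {
	if serviceToken != "" {
		return serviceToken
	}
	return agentToken
}

// newToken models the new behavior: always use the agent token.
func newToken(_, agentToken string) string {
	return agentToken
}

// syncRounds simulates up to `rounds` anti-entropy passes and reports
// how many attempts were made and whether deregistration succeeded.
func syncRounds(c *catalog, serviceID string, pick func(string, string) string,
	serviceToken, agentToken string, rounds int) (int, bool) {
	for attempt := 1; attempt <= rounds; attempt++ {
		if err := c.deregister(serviceID, pick(serviceToken, agentToken)); err == nil {
			return attempt, true
		}
	}
	return rounds, false
}

func main() {
	const serviceToken, agentToken = "service-secret", "agent-secret"

	// The service token has been deleted, so only the agent token is valid.
	newCatalog := func() *catalog {
		return &catalog{
			validTokens: map[string]bool{agentToken: true},
			services:    map[string]bool{"wumbo-id": true},
		}
	}

	_, okOld := syncRounds(newCatalog(), "wumbo-id", oldToken, serviceToken, agentToken, 5)
	attempts, okNew := syncRounds(newCatalog(), "wumbo-id", newToken, serviceToken, agentToken, 5)

	fmt.Println("old behavior deregistered:", okOld)
	fmt.Println("new behavior deregistered:", okNew, "after attempts:", attempts)
}
```

In this model the old strategy fails on every pass because it keeps presenting the deleted service token, while the new strategy succeeds on the first pass.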

Testing & Reproduction steps

  • Create a service definition like the following (named wumbo). It contains a service and a standalone check, each with its token field set.

    $ cat config/service.hcl
    service {
      name = "wumbo"
      id = "wumbo-id"
      token = "33333333-22b6-43f1-88f0-4c49d2a63554"
    
      check {
        name = "inline check"
        ttl = "9999h"
        status = "passing"
      }
    }
    
    check {
      name = "standalone check"
      ttl = "9999h"
      status = "passing"
      service_id = "wumbo-id"
      token = "33333333-22b6-43f1-88f0-4c49d2a63554"
    } 
    
  • Start a Consul agent

    $ cat config/agent.hcl
    log_level = "debug"
    node_name = "client1"
    leave_on_terminate = true
    
    acl = {
      default_policy = "deny"
      down_policy = "extend-cache"
      enable_token_persistence = true
      enabled = true
      tokens = {
        initial_management = "63fb8a77-22b6-43f1-88f0-4c49d2a63554"
        agent = "00000000-22b6-43f1-88f0-4c49d2a63554"
      }
    }
    
    $ consul agent -dev -config-dir ./config
    
  • Create the service and agent tokens (taking advantage of client-provided IDs):

    export CONSUL_HTTP_TOKEN=63fb8a77-22b6-43f1-88f0-4c49d2a63554
    consul acl token create -service-identity wumbo -secret 33333333-22b6-43f1-88f0-4c49d2a63554 -accessor 087a8e18-21a7-41e1-b952-878b606e750a
    consul acl token create -node-identity client1:dc1 -secret 00000000-22b6-43f1-88f0-4c49d2a63554 -accessor 8afcfcf1-c5f5-4c92-9470-f3f4763b3fb2
    
  • Verify that the service is registered shortly afterward

    $ consul catalog services
    consul
    wumbo
    

    In the Consul agent logs, you should see:

    2023-01-27T13:26:44.963-0600 [INFO]  agent: Synced node info
    2023-01-27T13:26:44.964-0600 [INFO]  agent: Synced service: service=wumbo-id
    2023-01-27T13:26:44.964-0600 [DEBUG] agent: Check in sync: check=service:wumbo-id
    2023-01-27T13:26:44.964-0600 [DEBUG] agent: Check in sync: check="standalone check"
    2023-01-27T13:26:44.964-0600 [DEBUG] agent: Node info in sync
    2023-01-27T13:26:44.964-0600 [DEBUG] agent: Service in sync: service=wumbo-id
    2023-01-27T13:26:44.964-0600 [DEBUG] agent: Check in sync: check="standalone check"
    2023-01-27T13:26:44.964-0600 [DEBUG] agent: Check in sync: check=service:wumbo-id
    
  • Delete the service token. The agent should not use this token for service/check deregistration, so deleting it ensures we see failures if the agent does use it.

    $ consul acl token delete -id 087a8e18-21a7-41e1-b952-878b606e750a
    Token "087a8e18-21a7-41e1-b952-878b606e750a" deleted successfully
    
  • Deregister the service

    $ consul services deregister -id wumbo-id
    Deregistered service: wumbo-id
    
  • Verify the service is deregistered

    $ consul catalog services
    consul
    

    In the agent logs, you should see:

    2023-01-27T13:35:23.064-0600 [DEBUG] agent: Node info in sync
    2023-01-27T13:35:23.065-0600 [INFO]  agent: Deregistered service: service=wumbo-id
    2023-01-27T13:35:23.065-0600 [DEBUG] agent: Node info in sync
    2023-01-27T13:35:23.065-0600 [DEBUG] agent: removed check: check="standalone check"
    2023-01-27T13:35:23.065-0600 [DEBUG] agent: removed check: check=service:wumbo-id
    2023-01-27T13:35:23.065-0600 [DEBUG] agent: removed service: service=wumbo-id
    2023-01-27T13:35:23.065-0600 [DEBUG] agent: Node info in sync
    

Links

This is a replacement/alternative to https://github.com/hashicorp/consul/pull/14436

PR Checklist

  • [ ] updated test coverage
  • [ ] external facing docs updated
  • [x] not a security concern

pglass, Jan 27 '23 19:01