agent: Service deregistration blocked by ACLs

Open Rabbit-st opened this issue 1 year ago • 4 comments

Overview of the Issue

After upgrading from version 1.14.4 to version 1.16.2, health checks for services registered on some nodes fail every once in a while. Restarting the Consul server restores them to normal. [screenshot]

Reproduction Steps

I don't know how to reproduce it, but it appears every once in a while.

Consul info for both Client and Server

Client info
agent:
        check_monitors = 0
        check_ttls = 0
        checks = 0
        services = 0
build:
        prerelease = 
        revision = 68f81912
        version = 1.16.2
        version_metadata = 
consul:
        acl = enabled
        bootstrap = false
        known_datacenters = 1
        leader = false
        leader_addr = 10.60.112.132:8300
        server = true
raft:
        applied_index = 314534
        commit_index = 314534
        fsm_pending = 0
        last_contact = 70.541794ms
        last_log_index = 314534
        last_log_term = 37
        last_snapshot_index = 311338
        last_snapshot_term = 37
        latest_configuration = [{Suffrage:Voter ID:cc0f834e-8c67-d394-344f-ee5331ea663a Address:10.60.238.199:8300} {Suffrage:Voter ID:f5ec631a-488d-37de-b2b2-7914cd030996 Address:10.60.112.132:8300} {Suffrage:Voter ID:a72236a8-a966-bc19-578b-65021b8f12ca Address:10.60.97.199:8300}]
        latest_configuration_index = 0
        num_peers = 2
        protocol_version = 3
        protocol_version_max = 3
        protocol_version_min = 0
        snapshot_version_max = 1
        snapshot_version_min = 0
        state = Follower
        term = 37
runtime:
        arch = amd64
        cpu_count = 4
        goroutines = 144
        max_procs = 4
        os = linux
        version = go1.20.8
serf_lan:
        coordinate_resets = 0
        encrypted = true
        event_queue = 0
        event_time = 9
        failed = 0
        health_score = 0
        intent_queue = 0
        left = 0
        member_time = 77
        members = 3
        query_queue = 0
        query_time = 1
serf_wan:
        coordinate_resets = 0
        encrypted = true
        event_queue = 0
        event_time = 1
        failed = 0
        health_score = 0
        intent_queue = 0
        left = 0
        member_time = 40
        members = 3
        query_queue = 0
        query_time = 1
Server info
agent:
        check_monitors = 0
        check_ttls = 0
        checks = 0
        services = 0
build:
        prerelease = 
        revision = 68f81912
        version = 1.16.2
        version_metadata = 
consul:
        acl = enabled
        bootstrap = false
        known_datacenters = 1
        leader = false
        leader_addr = 10.60.112.132:8300
        server = true
raft:
        applied_index = 314534
        commit_index = 314534
        fsm_pending = 0
        last_contact = 70.541794ms
        last_log_index = 314534
        last_log_term = 37
        last_snapshot_index = 311338
        last_snapshot_term = 37
        latest_configuration = [{Suffrage:Voter ID:cc0f834e-8c67-d394-344f-ee5331ea663a Address:10.60.238.199:8300} {Suffrage:Voter ID:f5ec631a-488d-37de-b2b2-7914cd030996 Address:10.60.112.132:8300} {Suffrage:Voter ID:a72236a8-a966-bc19-578b-65021b8f12ca Address:10.60.97.199:8300}]
        latest_configuration_index = 0
        num_peers = 2
        protocol_version = 3
        protocol_version_max = 3
        protocol_version_min = 0
        snapshot_version_max = 1
        snapshot_version_min = 0
        state = Follower
        term = 37
runtime:
        arch = amd64
        cpu_count = 4
        goroutines = 144
        max_procs = 4
        os = linux
        version = go1.20.8
serf_lan:
        coordinate_resets = 0
        encrypted = true
        event_queue = 0
        event_time = 9
        failed = 0
        health_score = 0
        intent_queue = 0
        left = 0
        member_time = 77
        members = 3
        query_queue = 0
        query_time = 1
serf_wan:
        coordinate_resets = 0
        encrypted = true
        event_queue = 0
        event_time = 1
        failed = 0
        health_score = 0
        intent_queue = 0
        left = 0
        member_time = 40
        members = 3
        query_queue = 0
        query_time = 1

Operating system and Environment details

Deployed using consul-k8s.

# consul-k8s status

==> Consul Status Summary
Name    Namespace       Status          Chart Version   AppVersion      Revision        Last Updated            
consul  consul          deployed        1.2.2           1.16.2          1               2023/10/12 10:32:19 CST

Log Fragments

2023-11-21T09:07:08.346Z [WARN]  agent: Coordinate update blocked by ACLs: accessorID="anonymous token"
2023-11-21T09:07:34.373Z [WARN]  agent: Coordinate update blocked by ACLs: accessorID="anonymous token"
2023-11-21T09:07:51.336Z [WARN]  agent: Coordinate update blocked by ACLs: accessorID="anonymous token"
2023-11-21T09:08:15.443Z [WARN]  agent: Coordinate update blocked by ACLs: accessorID="anonymous token"
2023-11-21T09:08:28.816Z [WARN]  agent: Node info update blocked by ACLs: node=cc0f834e-8c67-d394-344f-ee5331ea663a accessorID="anonymous token"
2023-11-21T09:08:28.817Z [WARN]  agent: Service deregistration blocked by ACLs: service=xxx_10.60.131.181_80 accessorID="anonymous token"
2023-11-21T09:08:28.817Z [WARN]  agent: Service deregistration blocked by ACLs: service=xxx_10.60.171.169_80 accessorID="anonymous token"
2023-11-21T09:08:28.817Z [WARN]  agent: Service deregistration blocked by ACLs: service=xxx_10.60.171.140_80 accessorID="anonymous token"
2023-11-21T09:08:28.818Z [WARN]  agent: Check deregistration blocked by ACLs: check=service:xxx_10.60.171.169_80 accessorID="anonymous token"
2023-11-21T09:08:28.818Z [WARN]  agent: Check deregistration blocked by ACLs: check=service:xxx_10.60.171.140_80 accessorID="anonymous token"
2023-11-21T09:08:28.818Z [WARN]  agent: Check deregistration blocked by ACLs: check=service:xxx_10.60.131.181_80 accessorID="anonymous token"
2023-11-21T09:08:33.416Z [WARN]  agent: Coordinate update blocked by ACLs: accessorID="anonymous token"
2023-11-21T09:08:54.231Z [WARN]  agent: Coordinate update blocked by ACLs: accessorID="anonymous token"
2023-11-21T09:09:23.554Z [WARN]  agent: Coordinate update blocked by ACLs: accessorID="anonymous token"
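
Every warning above carries accessorID="anonymous token", which is what Consul reports when a request arrives without an ACL token. The following is a minimal sketch, not part of the original report, for tallying which operations are being blocked and under which accessor. The regular expression and the function name tally_blocked are assumptions based only on the line format shown above.

```python
import re
from collections import Counter

# Assumed line format, taken from the log fragment above:
#   <timestamp> [WARN]  agent: <operation> blocked by ACLs: ... accessorID="<accessor>"
BLOCKED = re.compile(
    r'\[WARN\]\s+agent: (?P<op>.+?) blocked by ACLs:.*accessorID="(?P<accessor>[^"]*)"'
)


def tally_blocked(lines):
    """Count ACL-blocked operations, keyed by (operation, accessor)."""
    counts = Counter()
    for line in lines:
        match = BLOCKED.search(line)
        if match:
            counts[(match.group("op"), match.group("accessor"))] += 1
    return counts


# A few lines copied from the fragment above, one per operation type.
SAMPLE = [
    '2023-11-21T09:07:08.346Z [WARN]  agent: Coordinate update blocked by ACLs: accessorID="anonymous token"',
    '2023-11-21T09:08:28.816Z [WARN]  agent: Node info update blocked by ACLs: node=cc0f834e-8c67-d394-344f-ee5331ea663a accessorID="anonymous token"',
    '2023-11-21T09:08:28.817Z [WARN]  agent: Service deregistration blocked by ACLs: service=xxx_10.60.131.181_80 accessorID="anonymous token"',
    '2023-11-21T09:08:28.818Z [WARN]  agent: Check deregistration blocked by ACLs: check=service:xxx_10.60.171.169_80 accessorID="anonymous token"',
]

if __name__ == "__main__":
    for (op, accessor), n in tally_blocked(SAMPLE).most_common():
        print(f"{n:3d}  {op}  (accessorID={accessor})")
```

If every blocked operation maps to the anonymous token, the agent is most likely running without a usable agent token at that moment, rather than with a token whose policy is too narrow.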

Rabbit-st, Nov 22 '23 02:11

Just want to gather more info to help us reproduce the issue:

  • are consul agents running in VM or K8s?
  • are the failed service instances (shown in the screenshot) registered at the server nodes?

huikang, Nov 22 '23 02:11

> Just want to gather more info to help us reproduce the issue:
>
>   • are consul agents running in VM or K8s?
>   • are the failed service instances (shown in the screenshot) registered at the server nodes?

  1. The agents run on k8s.
  2. Sorry, I described it incorrectly before. The problem occurs when a service has stopped: the Consul server does not automatically deregister the stopped service.

Rabbit-st, Nov 22 '23 06:11

@Rabbit-st

Thanks for the updated info. Consul-k8s should handle deregistering the service if you remove it with kubectl delete.

However, Consul will not deregister a stopped service automatically, since the service instance is stored in Consul's catalog. Consul won't route traffic to the failed instance, so connections from downstream will be directed to healthy instances of the service.

Could you provide more details about how the services came to be stopped? (Were they stopped because of a genuine alarm or a k8s node failure?)

huikang, Nov 29 '23 22:11

I had the same problem and fixed it: the service needs an agent token to register and deregister. https://developer.hashicorp.com/consul/docs/security/acl/tokens/create/create-an-agent-token
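
For anyone wanting to try this against a running agent, here is a minimal sketch of setting the agent token through Consul's HTTP API (PUT /v1/agent/token/agent with a JSON body containing Token). The base URL, both token values, and the helper name build_set_agent_token_request are placeholders I made up for illustration. In a consul-k8s deployment the Helm chart may already manage these tokens, so treat this as a diagnostic sketch rather than the recommended setup.

```python
import json
import urllib.request


def build_set_agent_token_request(base_url, management_token, agent_token):
    """Build (but do not send) the request that sets an agent's 'agent' token.

    base_url, management_token and agent_token are caller-supplied
    placeholders; nothing here is specific to the cluster in this issue.
    """
    url = base_url.rstrip("/") + "/v1/agent/token/agent"
    body = json.dumps({"Token": agent_token}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={
            # Token authorised to update the agent (needs agent:write).
            "X-Consul-Token": management_token,
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    req = build_set_agent_token_request(
        "http://127.0.0.1:8500",   # assumed local agent address
        "MANAGEMENT_TOKEN_HERE",   # placeholder
        "AGENT_TOKEN_HERE",        # placeholder
    )
    print(req.get_method(), req.full_url)
    # To actually send it:
    # with urllib.request.urlopen(req) as resp:
    #     print(resp.status)
```

A token set this way is lost on agent restart unless token persistence (acl.enable_token_persistence) is enabled in the agent configuration, which might be worth checking given that the problem here only shows up every once in a while.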

AiJiangnan, Mar 29 '24 04:03

> I had the same problem and fixed it: the service needs an agent token to register and deregister. https://developer.hashicorp.com/consul/docs/security/acl/tokens/create/create-an-agent-token

The token was already configured. It looks like a client-side issue: the client does not support Consul 1.16.2. Consul recovered after we downgraded the version.

Rabbit-st, Jun 14 '24 07:06

@Rabbit-st Which version did you downgrade to? I'm facing similar issues. Also, please reopen this issue.

MageshSrinivasulu, Aug 05 '24 11:08