ACL system crashes on RHEL on VMware servers

Open notBscalE opened this issue 3 years ago • 0 comments

Overview of the Issue

We decided to upgrade our infrastructure gradually from Consul 1.9.1 to 1.11. Our servers are RHEL machines hosted on VMware infrastructure. Every time we attempted the migration, the ACL system crashed: the master token became invalid, and we could no longer view the status of our services and nodes. We also tried resetting the ACL system as described in the outage troubleshooting guide, without success.
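For reference, the ACL bootstrap reset from Consul's troubleshooting docs works roughly like this: a repeated `consul acl bootstrap` is refused with an error that includes a reset index, and writing that index to `acl-bootstrap-reset` in the leader's data directory allows one more bootstrap. The sketch below automates only the file-writing step. It assumes the refusal text contains `reset index: N` and that the data directory is `/var/consul` (the `data_dir` used in this report); `parse_reset_index` and `write_reset_file` are hypothetical helpers, not part of Consul.

```python
import re
from pathlib import Path

# Assumption: the refused bootstrap message contains "reset index: <N>".
# Adjust the pattern if your Consul version words the error differently.
RESET_INDEX_RE = re.compile(r"reset index:\s*(\d+)")


def parse_reset_index(error_text: str) -> int:
    """Extract the reset index from a refused `consul acl bootstrap` error."""
    match = RESET_INDEX_RE.search(error_text)
    if match is None:
        raise ValueError("no reset index found in bootstrap error")
    return int(match.group(1))


def write_reset_file(data_dir: str, reset_index: int) -> Path:
    """Write the reset index to <data_dir>/acl-bootstrap-reset (run on the leader)."""
    path = Path(data_dir) / "acl-bootstrap-reset"
    path.write_text(f"{reset_index}\n")
    return path
```

Usage: on the current leader, call `write_reset_file("/var/consul", parse_reset_index(error_text))` with the text of the refused bootstrap, then run `consul acl bootstrap` again.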

We reproduced the issue by booting a new, separate Consul cluster: it showed the same behavior as soon as we enabled the ACL system.

Reproduction Steps

Steps to reproduce this issue:

  1. Create a cluster using Consul 1.9.3 or later.
  2. Use the following configuration files:

config.acl:
{
  "bind_addr": ">>bind ip<<",
  "addresses": {
    "dns": "0.0.0.0",
    "http": "0.0.0.0",
    "https": "0.0.0.0"
  },
  "ports": {
    "dns": 53,
    "http": 8500,
    "https": 8501,
    "serf_lan": 8301,
    "serf_wan": 8302,
    "server": 8300
  },
  "bootstrap": true,
  "server": true,
  "node_name": "consul-node1",
  "datacenter": "prod",
  "data_dir": "/var/consul",
  "encrypt": ">>key<<",
  "log_level": "INFO",
  "enable_syslog": true,
  "domain": "consul",
  "dns_config": {
    "allow_stale": true,
    "max_stale": "30s",
    "service_ttl": {
      "*": "5s"
    },
    "node_ttl": "5s",
    "only_passing": true
  },
  "ui_config": {
    "enabled": true
  }
}

config_acl.json:

{
  "primary_datacenter": "prod",
  "acl": {
    "enabled": true,
    "down_policy": "deny",
    "default_policy": "deny",
    "tokens": {
      "master": ">>master token<<",
      "agent_master": ">>master token<<",
      "default": "anonymous"
    }
  }
}
  3. The log shows either the error "ACL not found" or "Permission denied" when we try to connect with the master token. Services and nodes registered with Consul can still be read if the anonymous token has read access to them, but the UI does not display them.
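To check outside the UI whether Consul accepts the master token, a token self-lookup against the HTTP API is a quick test. This is a minimal sketch using only the Python standard library; it assumes the agent is reachable on the HTTP port from the configuration in this report (8500) and uses the `GET /v1/acl/token/self` endpoint with the `X-Consul-Token` header. `build_token_self_request` and `check_token` are hypothetical helper names.

```python
from __future__ import annotations

import urllib.error
import urllib.request


def build_token_self_request(addr: str, token: str) -> urllib.request.Request:
    """Build GET /v1/acl/token/self, passing the token in X-Consul-Token."""
    return urllib.request.Request(
        f"{addr.rstrip('/')}/v1/acl/token/self",
        headers={"X-Consul-Token": token},
        method="GET",
    )


def check_token(addr: str, token: str, timeout: float = 5.0) -> tuple[int, str]:
    """Return (HTTP status, response body) for the token self-lookup."""
    req = build_token_self_request(addr, token)
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status, resp.read().decode()
    except urllib.error.HTTPError as err:
        # Non-2xx responses land here; the body carries Consul's error text.
        return err.code, err.read().decode()
```

Usage: `status, body = check_token("http://127.0.0.1:8500", ">>master token<<")`. A 200 response means the token resolved; an error status whose body contains "ACL not found" or "Permission denied" would likely match the log errors reported in this issue.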

Operating system and Environment details

OS: Red Hat Enterprise Linux 7.4 (the bug also reproduces on 8.x). The servers reside on on-premises VMware infrastructure.

notBscalE, Aug 10 '22 09:08