cert-manager invoked even if Connect isn't enabled

Open kemko opened this issue 2 years ago • 2 comments
Overview of the Issue

After updating our consul server cluster from 1.9.5 to 1.14.3, we found some confusing behavior in the Connect and cert-manager bundle.

The following messages began to appear in the logs:

Jan 12 13:08:02 consul-primary1 consul[1480197]: 2023-01-12T13:08:02.338+0300 [TRACE] agent.server: rpc_server_call: method=Status.RaftStats errored=false request_type=read r>
Jan 12 13:08:03 consul-primary1 consul[1480197]: 2023-01-12T13:08:03.071+0300 [DEBUG] agent.server.cert-manager: ACLs have not finished initializing
Jan 12 13:08:03 consul-primary1 consul[1480197]: 2023-01-12T13:08:03.071+0300 [DEBUG] agent.server.cert-manager: CA has not finished initializing

There are two places in the documentation (the first and the second) that describe the connect option. Taken together, as far as I understand, they mean that connect.enabled is disabled by default on servers and enabled by default on clients. To turn on Connect in a cluster, it has to be enabled in the server config with the following option:

  "connect": {
   "enabled": true
  }

However, the cert-manager messages above appear on an instance that has "server": true in its config and does not mention the connect option at all. This seems to mean that the condition a.config.PeeringEnabled && a.config.ConnectEnabled somehow becomes true on the server.

This behavior is also reproduced on a clean cluster installation on version 1.14.3.

The messages do not appear when Consul Connect is explicitly turned off:

  "connect": {
   "enabled": false
  }

Reproduction Steps

  1. Create a cluster with 3 server nodes with the configuration below
  2. Check Consul's logs for messages from cert-manager
Server config
{
 "acl": {
  "enabled": true,
  "default_policy": "deny"
 },
 "primary_datacenter": "primary",
 "bind_addr": "xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx",
 "bootstrap": false,
 "client_addr": "::",
 "data_dir": "/opt/consul",
 "datacenter": "primary",
 "disable_remote_exec": true,
 "encrypt": "[redacted]",
 "log_level": "TRACE",
 "enable_local_script_checks": true,
 "enable_debug": true,
 "limits": {
  "http_max_conns_per_client": 5000,
  "rpc_max_conns_per_client": 1500
 },
 "performance": {
  "raft_multiplier": 1
 },
 "raft_protocol": 3,
 "reconnect_timeout": "8h",
 "server": true,
 "start_join": [
  "consul-primary1",
  "consul-primary2",
  "consul-primary3"
 ],
 "start_join_wan": [
  "consul-primary1",
  "consul-primary2",
  "consul-primary3"
 ],
 "ui_config": {
  "enabled": true
 }
}

Consul info for Server

Server info
agent:
  check_monitors = 0
  check_ttls = 0
  checks = 2
  services = 2
build:
  prerelease =
  revision = bd257019
  version = 1.14.3
  version_metadata =
consul:
  acl = enabled
  bootstrap = false
  known_datacenters = 4
  leader = false
  leader_addr = [xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx]:8300
  server = true
raft:
  applied_index = 1418
  commit_index = 1418
  fsm_pending = 0
  last_contact = 60.601155ms
  last_log_index = 1418
  last_log_term = 15
  last_snapshot_index = 0
  last_snapshot_term = 0
  latest_configuration = [{Suffrage:Voter ID:a461f95d-4f77-5a10-a3e3-129722f22e36 Address:[xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx]:8300} {Suffrage:Voter ID:215dd10a-a8b5-5a7a-8bdf-5a516e95e54b Address:[xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx]:8300} {Suffrage:Voter ID:adadbc07-281d-5fe7-9f4f-d188ec001915 Address:[xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx]:8300}]
  latest_configuration_index = 0
  num_peers = 2
  protocol_version = 3
  protocol_version_max = 3
  protocol_version_min = 0
  snapshot_version_max = 1
  snapshot_version_min = 0
  state = Follower
  term = 15
runtime:
  arch = amd64
  cpu_count = 8
  goroutines = 180
  max_procs = 8
  os = linux
  version = go1.19.4
serf_lan:
  coordinate_resets = 0
  encrypted = true
  event_queue = 0
  event_time = 14
  failed = 0
  health_score = 0
  intent_queue = 0
  left = 0
  member_time = 28428
  members = 8
  query_queue = 0
  query_time = 1
serf_wan:
  coordinate_resets = 0
  encrypted = true
  event_queue = 0
  event_time = 1
  failed = 0
  health_score = 0
  intent_queue = 0
  left = 0
  member_time = 178
  members = 18
  query_queue = 0
  query_time = 1

Operating system and Environment details

Ubuntu 20.04.5 LTS, x86_64

kemko, Jan 12 '23 13:01

The version-specific upgrade details page states that connect is enabled by default as of Consul 1.14: https://developer.hashicorp.com/consul/docs/upgrading/upgrade-specific#consul-1-14-x

As you noted, you can explicitly disable connect by setting connect.enabled = false.
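
To illustrate why an option that is absent from the config behaves differently from an explicit false, here is a minimal Go sketch. It is not Consul's actual code: rawConfig, orDefault, and certManagerActive are hypothetical names, and the default values are passed in as parameters because the peering default in particular is an assumption here. Only the shape of the condition (PeeringEnabled && ConnectEnabled) comes from the issue text.

```go
package main

import "fmt"

// rawConfig models only the two options discussed in this thread.
// A nil pointer means the option is absent from the config file.
// (Hypothetical type, not Consul's own config struct.)
type rawConfig struct {
	ConnectEnabled *bool
	PeeringEnabled *bool
}

// orDefault returns the explicit value if one was set, otherwise the default.
func orDefault(v *bool, def bool) bool {
	if v != nil {
		return *v
	}
	return def
}

// certManagerActive mirrors the condition quoted in the issue,
// PeeringEnabled && ConnectEnabled, evaluated after defaults are applied.
func certManagerActive(raw rawConfig, connectDefault, peeringDefault bool) bool {
	return orDefault(raw.PeeringEnabled, peeringDefault) &&
		orDefault(raw.ConnectEnabled, connectDefault)
}

func main() {
	off := false

	// Assumption: both features default to enabled on 1.14.x.
	fmt.Println(certManagerActive(rawConfig{}, true, true))

	// An explicit "enabled": false for connect overrides the default.
	fmt.Println(certManagerActive(rawConfig{ConnectEnabled: &off}, true, true))
}
```

Under these assumptions, a server config that never mentions connect ends up with the condition true, while an explicit "enabled": false makes it false, which matches the behavior described in the issue.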

Are you seeing any negative effects from this? Or just seeing the DEBUG level log message about cert-manager that you posted?

jkirschner-hashicorp, Jan 12 '23 14:01

Oh, sorry. I probably missed these changes when reading the version-specific upgrade notes.

We do not see any negative effects, just slightly scary log messages from parts of Consul that we have not enabled. In any case, we can disable connect and peering in our config.

So, given your comment, this looks like a documentation-only issue.

At https://developer.hashicorp.com/consul/docs/connect/configuration#agent-configuration the Connect feature is still described as disabled by default. At https://developer.hashicorp.com/consul/docs/agent/config/config-files#connect-parameters there is no mention that Connect is now enabled by default on both client agents and servers.

kemko, Jan 12 '23 15:01