consul auto-encryption fails on IPv6-only cluster

Open sinisterstumble opened this issue 3 years ago • 10 comments

Overview of the Issue

Client certificate distribution fails on an IPv6-only cluster when using the auto-encryption method.

Consul info

Server info
agent:
        check_monitors = 0
        check_ttls = 0
        checks = 3
        services = 3
build:
        prerelease =
        revision = 7bbad6fe
        version = 1.10.4
consul:
        acl = enabled
        bootstrap = false
        known_datacenters = 1
        leader = false
        leader_addr = [2a05:d014:d9e:c305:b0bc:d053:f019:ed77]:8300
        server = true
raft:
        applied_index = 9789
        commit_index = 9789
        fsm_pending = 0
        last_contact = 83.945604ms
        last_log_index = 9789
        last_log_term = 4
        last_snapshot_index = 0
        last_snapshot_term = 0
        latest_configuration = [{Suffrage:Voter ID:22da461e-212d-76e2-95e5-0fb99ae5097e Address:[2a05:d014:d9e:c303:e4d3:d281:a61d:8ebd]:8300} {Suffrage:Voter ID:6d370695-dac6-5205-6070-325b3cff86a8 Address:[2a05:d014:d9e:c305:b0bc:d053:f019:ed77]:8300} {Suffrage:Voter ID:c4ed1b5a-859f-ca1a-f6dd-ab2555f8e7cf Address:[2a05:d014:d9e:c304:fc9f:38cd:a734:5022]:8300}]
        latest_configuration_index = 0
        num_peers = 2
        protocol_version = 3
        protocol_version_max = 3
        protocol_version_min = 0
        snapshot_version_max = 1
        snapshot_version_min = 0
        state = Follower
        term = 4
runtime:
        arch = arm64
        cpu_count = 2
        goroutines = 110
        max_procs = 2
        os = linux
        version = go1.16.10
serf_lan:
        coordinate_resets = 0
        encrypted = true
        event_queue = 0
        event_time = 4
        failed = 0
        health_score = 0
        intent_queue = 0
        left = 0
        member_time = 7
        members = 3
        query_queue = 0
        query_time = 1
serf_wan:
        coordinate_resets = 0
        encrypted = true
        event_queue = 0
        event_time = 1
        failed = 0
        health_score = 0
        intent_queue = 0
        left = 0
        member_time = 1
        members = 1
        query_queue = 0
        query_time = 1

Operating system and Environment details

ARM64 CentOS Stream 8 nodes on AWS EC2

Log Fragments

systemd[1]: Started "HashiCorp Consul - A service mesh solution".
consul[12945]: ==> Starting Consul agent...
consul[12945]:            Version: '1.10.4'
consul[12945]:            Node ID: '38f1b91f-65a2-58a3-2e61-8ba9e11e5fb4'
consul[12945]:          Node name: 'i-0e39a8a2b3460b871'
consul[12945]:         Datacenter: 'eu-central-1' (Segment: '')
consul[12945]:             Server: false (Bootstrap: false)
consul[12945]:        Client Addr: [127.0.0.1 ::1] (HTTP: 8500, HTTPS: -1, gRPC: -1, DNS: 8600)
consul[12945]:       Cluster Addr: 2a05:d014:d9e:c303:e4d0:6e78:b6d0:a903 (LAN: 8301, WAN: 8302)
consul[12945]:            Encrypt: Gossip: true, TLS-Outgoing: true, TLS-Incoming: true, Auto-Encrypt-TLS: true
consul[12945]: ==> Log data will now stream in as it occurs:
consul[12945]: 2021-12-15T03:41:57.224Z [WARN]  agent: "autopilot.redundancy_zone_tag" is a Consul Enterprise configuration and will have no effect
consul[12945]: 2021-12-15T03:41:57.225Z [WARN]  agent: "autopilot.disable_upgrade_migration" is a Consul Enterprise configuration and will have no effect
consul[12945]: 2021-12-15T03:41:57.249Z [WARN]  agent.auto_config: "autopilot.redundancy_zone_tag" is a Consul Enterprise configuration and will have no effect
consul[12945]: 2021-12-15T03:41:57.249Z [WARN]  agent.auto_config: "autopilot.disable_upgrade_migration" is a Consul Enterprise configuration and will have no effect
consul[12945]: 2021-12-15T03:41:57.270Z [INFO]  agent.auto_config: discover-aws: Region is eu-central-1
consul[12945]: 2021-12-15T03:41:57.270Z [INFO]  agent.auto_config: discover-aws: Filter instances with packer-aws-stack=packer-aws-stack-server
consul[12945]: 2021-12-15T03:41:57.463Z [INFO]  agent.auto_config: discover-aws: Instance i-09cc796081995cdc1 has IPv6 2a05:d014:d9e:c303:e4d3:d281:a61d:8ebd on NetworkInterfaceId eni-0eb12a1d1acec6e61
consul[12945]: 2021-12-15T03:41:57.463Z [INFO]  agent.auto_config: discover-aws: Instance i-0cc65f132aaa635ba has IPv6 2a05:d014:d9e:c304:fc9f:38cd:a734:5022 on NetworkInterfaceId eni-0f8a76deed0c941fa
consul[12945]: 2021-12-15T03:41:57.463Z [INFO]  agent.auto_config: discover-aws: Instance i-09f5a19c8ee3f55cb has IPv6 2a05:d014:d9e:c305:b0bc:d053:f019:ed77 on NetworkInterfaceId eni-06c9484734c3d2e4d
consul[12945]: 2021-12-15T03:41:57.463Z [WARN]  agent.auto_config: error splitting host address into IP and port: address=2a05:d014:d9e:c303:e4d3:d281:a61d:8ebd error="address 2a05:d014:d9e:c303:e4d3:d281:a61d:8ebd: too many colons in address"
consul[12945]: 2021-12-15T03:41:57.463Z [WARN]  agent.auto_config: error splitting host address into IP and port: address=2a05:d014:d9e:c304:fc9f:38cd:a734:5022 error="address 2a05:d014:d9e:c304:fc9f:38cd:a734:5022: too many colons in address"
consul[12945]: 2021-12-15T03:41:57.463Z [WARN]  agent.auto_config: error splitting host address into IP and port: address=2a05:d014:d9e:c305:b0bc:d053:f019:ed77 error="address 2a05:d014:d9e:c305:b0bc:d053:f019:ed77: too many colons in address"
consul[12945]: 2021-12-15T03:41:57.463Z [ERROR] agent.auto_config: no auto-encrypt server addresses available for use
consul[12945]: 2021-12-15T03:41:57.463Z [INFO]  agent.auto_config: discover-aws: Region is eu-central-1
consul[12945]: 2021-12-15T03:41:57.464Z [INFO]  agent.auto_config: discover-aws: Filter instances with packer-aws-stack=packer-aws-stack-server
consul[12945]: 2021-12-15T03:41:57.514Z [INFO]  agent.auto_config: discover-aws: Instance i-09cc796081995cdc1 has IPv6 2a05:d014:d9e:c303:e4d3:d281:a61d:8ebd on NetworkInterfaceId eni-0eb12a1d1acec6e61
consul[12945]: 2021-12-15T03:41:57.514Z [INFO]  agent.auto_config: discover-aws: Instance i-0cc65f132aaa635ba has IPv6 2a05:d014:d9e:c304:fc9f:38cd:a734:5022 on NetworkInterfaceId eni-0f8a76deed0c941fa
consul[12945]: 2021-12-15T03:41:57.514Z [INFO]  agent.auto_config: discover-aws: Instance i-09f5a19c8ee3f55cb has IPv6 2a05:d014:d9e:c305:b0bc:d053:f019:ed77 on NetworkInterfaceId eni-06c9484734c3d2e4d
consul[12945]: 2021-12-15T03:41:57.514Z [WARN]  agent.auto_config: error splitting host address into IP and port: address=2a05:d014:d9e:c303:e4d3:d281:a61d:8ebd error="address 2a05:d014:d9e:c303:e4d3:d281:a61d:8ebd: too many colons in address"
consul[12945]: 2021-12-15T03:41:57.514Z [WARN]  agent.auto_config: error splitting host address into IP and port: address=2a05:d014:d9e:c304:fc9f:38cd:a734:5022 error="address 2a05:d014:d9e:c304:fc9f:38cd:a734:5022: too many colons in address"
consul[12945]: 2021-12-15T03:41:57.514Z [WARN]  agent.auto_config: error splitting host address into IP and port: address=2a05:d014:d9e:c305:b0bc:d053:f019:ed77 error="address 2a05:d014:d9e:c305:b0bc:d053:f019:ed77: too many colons in address"
consul[12945]: 2021-12-15T03:41:57.514Z [ERROR] agent.auto_config: no auto-encrypt server addresses available for use

sinisterstumble avatar Dec 15 '21 05:12 sinisterstumble

Thank you for the bug report! It seems like this may be a problem with the IP addresses returned by https://github.com/hashicorp/go-discover/tree/master/provider/aws

Consul uses https://pkg.go.dev/net#SplitHostPort, which says IPv6 addresses must be enclosed in square brackets. It seems the go-discover AWS provider does not return addresses in that form.

dnephin avatar Dec 15 '21 18:12 dnephin
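To illustrate the point about net.SplitHostPort, here is a minimal sketch that reproduces the error from the client log using only the Go standard library. The helper name splitErr is hypothetical, and the port 8300 is taken from the server info above; this is not Consul's code.

```go
package main

import (
	"fmt"
	"net"
)

// splitErr is a hypothetical helper: it returns the error text from
// net.SplitHostPort, or an empty string if the address parses.
func splitErr(addr string) string {
	if _, _, err := net.SplitHostPort(addr); err != nil {
		return err.Error()
	}
	return ""
}

func main() {
	// Bare IPv6 address, as returned by the discovery step in the log.
	bare := "2a05:d014:d9e:c303:e4d3:d281:a61d:8ebd"
	fmt.Println(splitErr(bare))

	// The same address in bracketed host:port form parses cleanly.
	bracketed := "[" + bare + "]:8300"
	host, port, err := net.SplitHostPort(bracketed)
	fmt.Println(host, port, err)
}
```

The first call fails with the same "too many colons in address" message seen in the log fragments; the second returns the host and port separately.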

@dnephin @Amier3 I forgot to mention that manual (operator) certificate distribution works as expected.

sinisterstumble avatar Dec 16 '21 05:12 sinisterstumble

This may be unrelated, but I'm seeing these messages in the server logs. They don't seem to have any effect. #9241 is probably a related issue.

agent.server.memberlist.wan: memberlist: Failed to resolve i-09f5a19c8ee3f55cb.eu-central-1/2a05:d014:d9e:c305:b0bc:d053:f019:ed77:8302: lookup 2a05:d014:d9e:c305:b0bc:d053:f019:ed77:8302: no such host

sinisterstumble avatar Dec 16 '21 06:12 sinisterstumble
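The address in that memberlist message has the port appended to an unbracketed IPv6 address. One way such a string can arise is joining host and port with a plain colon; that is an assumption about the cause, not something confirmed in this thread. A minimal sketch of the difference, with hypothetical helper names naiveJoin and safeJoin:

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// naiveJoin concatenates host and port with a colon. For an IPv6 host the
// result is ambiguous and cannot be split back into host and port.
func naiveJoin(host string, port int) string {
	return fmt.Sprintf("%s:%d", host, port)
}

// safeJoin uses net.JoinHostPort, which adds brackets when the host
// contains a colon.
func safeJoin(host string, port int) string {
	return net.JoinHostPort(host, strconv.Itoa(port))
}

func main() {
	host := "2a05:d014:d9e:c305:b0bc:d053:f019:ed77"
	fmt.Println(naiveJoin(host, 8302))
	fmt.Println(safeJoin(host, 8302))
}
```

The first line printed has the same shape as the address in the log message; the second is the bracketed form that resolvers and net.SplitHostPort accept.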

Hey @markmartirosian

I wanted to loop back on this and give you an update. The engineering team discussed this right before the holidays, and we're working on a fix in go-discover and memberlist that will resolve this issue for the AWS provider (with fixes for other providers coming at a later date).

We'll keep this issue open in the meantime

Amier3 avatar Jan 03 '22 17:01 Amier3

@Amier3 thank you for the update!

sinisterstumble avatar Jan 03 '22 23:01 sinisterstumble

This isn't limited to AWS. The same issue occurs with a fixed retry_join list:

retry_join = ["[addr1]", "[addr2]", "[addr3]", "[addr4]", "[addr5]"]

Result:

2022-02-03T21:13:37.983Z [WARN]  agent.auto_config: IP resolution failed: host=[addr1] error="lookup [addr1]: no such host"
2022-02-03T21:13:37.983Z [WARN]  agent.auto_config: IP resolution failed: host=[addr2] error="lookup [addr2]: no such host"
2022-02-03T21:13:37.983Z [WARN]  agent.auto_config: IP resolution failed: host=[addr3] error="lookup [addr3]: no such host"
2022-02-03T21:13:37.983Z [WARN]  agent.auto_config: IP resolution failed: host=[addr4] error="lookup [addr4]: no such host"
2022-02-03T21:13:37.983Z [WARN]  agent.auto_config: IP resolution failed: host=[addr5] error="lookup [addr5]: no such host"
2022-02-03T21:13:37.983Z [ERROR] agent.auto_config: No servers successfully responded to the auto-encrypt request

When retry_join is set with a port:

retry_join = ["[addr1]:8301", "[addr2]:8301", "[addr3]:8301", "[addr4]:8301", "[addr5]:8301"]

Result:

2022-02-03T21:14:21.809Z [WARN]  agent.auto_config: error splitting host address into IP and port: address=addr1 error="address addr1: too many colons in address"
2022-02-03T21:14:21.809Z [WARN]  agent.auto_config: error splitting host address into IP and port: address=addr2 error="address addr2: too many colons in address"
2022-02-03T21:14:21.809Z [WARN]  agent.auto_config: error splitting host address into IP and port: address=addr3 error="address addr3: too many colons in address"
2022-02-03T21:14:21.809Z [WARN]  agent.auto_config: error splitting host address into IP and port: address=addr4 error="address addr4: too many colons in address"
2022-02-03T21:14:21.809Z [WARN]  agent.auto_config: error splitting host address into IP and port: address=addr5 error="address addr5: too many colons in address"
2022-02-03T21:14:21.809Z [ERROR] agent.auto_config: No servers successfully responded to the auto-encrypt request

valodzka avatar Feb 03 '22 21:02 valodzka
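The two results above show that neither "[addr]" nor "[addr]:port" survives the current parsing. As a rough sketch of how such entries could be normalized before use, the hypothetical helper normalizeAddr below accepts bare, bracketed, and host:port forms and always returns a bracket-safe host:port string. The default port 8300 is an assumption taken from the server info earlier in the thread, and this is not the fix applied in Consul.

```go
package main

import (
	"fmt"
	"net"
	"net/netip"
	"strings"
)

// normalizeAddr is a hypothetical helper. It returns addr as host:port,
// adding defaultPort when no port is present and bracketing IPv6 hosts.
func normalizeAddr(addr, defaultPort string) (string, error) {
	if addr == "" {
		return "", fmt.Errorf("empty address")
	}
	// Already a valid host:port, including "[v6]:port".
	if host, port, err := net.SplitHostPort(addr); err == nil {
		return net.JoinHostPort(host, port), nil
	}
	// Strip one pair of surrounding brackets, e.g. "[v6]" without a port.
	host := addr
	if strings.HasPrefix(addr, "[") && strings.HasSuffix(addr, "]") {
		host = addr[1 : len(addr)-1]
	}
	// Anything that still contains colons or brackets must be an IP literal.
	if strings.ContainsAny(host, ":[]") {
		if _, err := netip.ParseAddr(host); err != nil {
			return "", fmt.Errorf("invalid address %q: %w", addr, err)
		}
	}
	return net.JoinHostPort(host, defaultPort), nil
}

func main() {
	for _, a := range []string{
		"2a05:d014:d9e:c303:e4d3:d281:a61d:8ebd",
		"[2a05:d014:d9e:c303:e4d3:d281:a61d:8ebd]",
		"[2a05:d014:d9e:c303:e4d3:d281:a61d:8ebd]:8301",
		"consul.example.com",
	} {
		out, err := normalizeAddr(a, "8300")
		fmt.Println(out, err)
	}
}
```

Trying net.SplitHostPort first keeps explicit ports intact; the fallback only runs for entries without a usable port, so a bare IPv6 literal is never mistaken for host:port.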

The issue is still present in Consul version 1.15.1.

tesinormed avatar Mar 21 '23 04:03 tesinormed

I think the issue should be renamed to reflect the actual problem here: Consul can't parse IPv6 addresses in retry_join. It's 2023, and IPv4 addresses are running out. I believe this issue deserves some priority.

svenstaro avatar Apr 04 '23 14:04 svenstaro

This is happening to me only when new Consul clients try to join a cluster that uses IPv6 only.

I have a 3-node test cluster with IPv6 where retry_join works as expected (building the servers, bootstrapping ACLs, etc.), but now that I've started adding a client I'm stuck on this.

nbari avatar Apr 14 '23 12:04 nbari

I got involved in this discussion on the forum: https://discuss.hashicorp.com/t/ipv6-agent-auto-config-error-splitting-host-address-into-ip-and-port-address-x-error-address-x-too-many-colons-in-address/52754/6 and as a result have opened the PR above, to fix this bug.

maxb avatar Apr 15 '23 22:04 maxb