kubespray
IPv6 in etcd config incorrect
Environment:
- Cloud provider or hardware configuration: Bare metal server
- OS (target systems) (printf "$(uname -srm)\n$(cat /etc/os-release)\n"):
Linux 4.15.0-163-generic x86_64
NAME="Ubuntu"
VERSION="18.04.6 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.6 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic
- Version of Ansible (ansible --version): ansible [core 2.12.2]
- Version of Python (python --version): 3.8.10
- Kubespray version (commit) (git rev-parse --short HEAD): 1c60a499
- Network plugin used: weave
Full inventory with variables (ansible -i inventory/sample/inventory.ini all -m debug -a "var=hostvars[inventory_hostname]"):
http://sprunge.us/5wakwe
Command used to invoke ansible:
ANSIBLE_SSH_ARGS='-o ForwardAgent=yes -o ControlMaster=auto -o ControlPersist=240s -o ConnectTimeout=1200 -o ServerAliveInterval=30' ansible-playbook -i inventory/someproject/hosts.yml cluster.yml -b -v
Output of ansible run:
TASK [etcd : Configure | Ensure etcd is running] *****************************************************************************************************
fatal: [node1]: FAILED! => {"changed": false, "msg": "Unable to start service etcd: Job for etcd.service failed because the control process exited with error code.\nSee \"systemctl status etcd.service\" and \"journalctl -xe\" for details.\n"}
fatal: [node2]: FAILED! => {"changed": false, "msg": "Unable to start service etcd: Job for etcd.service failed because the control process exited with error code.\nSee \"systemctl status etcd.service\" and \"journalctl -xe\" for details.\n"}
fatal: [node3]: FAILED! => {"changed": false, "msg": "Unable to start service etcd: Job for etcd.service failed because the control process exited with error code.\nSee \"systemctl status etcd.service\" and \"journalctl -xe\" for details.\n"}
Anything else we need to know: systemctl output
Jun 13 13:03:51 node3 etcd[25432]: {"level":"info","ts":"2022-06-13T13:03:51.078+0200","caller":"flags/flag.go:113","msg":"recognized and used environment variable","variable-name":"ETCD_PEER_KEY_FILE","variable-value":"/etc/ssl/etcd/ssl/member-node3-key.pem"}
Jun 13 13:03:51 node3 etcd[25432]: {"level":"info","ts":"2022-06-13T13:03:51.078+0200","caller":"flags/flag.go:113","msg":"recognized and used environment variable","variable-name":"ETCD_PEER_TRUSTED_CA_FILE","variable-value":"/etc/ssl/etcd/ssl/ca.pem"}
Jun 13 13:03:51 node3 etcd[25432]: {"level":"info","ts":"2022-06-13T13:03:51.078+0200","caller":"flags/flag.go:113","msg":"recognized and used environment variable","variable-name":"ETCD_PROXY","variable-value":"off"}
Jun 13 13:03:51 node3 etcd[25432]: {"level":"info","ts":"2022-06-13T13:03:51.078+0200","caller":"flags/flag.go:113","msg":"recognized and used environment variable","variable-name":"ETCD_SNAPSHOT_COUNT","variable-value":"10000"}
Jun 13 13:03:51 node3 etcd[25432]: {"level":"info","ts":"2022-06-13T13:03:51.078+0200","caller":"flags/flag.go:113","msg":"recognized and used environment variable","variable-name":"ETCD_TRUSTED_CA_FILE","variable-value":"/etc/ssl/etcd/ssl/ca.pem"}
Jun 13 13:03:51 node3 etcd[25432]: {"level":"info","ts":"2022-06-13T13:03:51.078+0200","caller":"etcdmain/etcd.go:73","msg":"Running: ","args":["/usr/local/bin/etcd"]}
Jun 13 13:03:51 node3 etcd[25432]: {"level":"warn","ts":"2022-06-13T13:03:51.079+0200","caller":"etcdmain/etcd.go:75","msg":"failed to verify flags","error":"invalid value \"https://2a02:4a9:2a:276f::1:2380\" for ETCD_LISTEN_PEER_URLS: URL address does not have the form \"host:port\": https://2a02:4a9:2a:276f::1:2380"}
Jun 13 13:03:51 node3 systemd[1]: etcd.service: Main process exited, code=exited, status=1/FAILURE
Jun 13 13:03:51 node3 systemd[1]: etcd.service: Failed with result 'exit-code'.
Jun 13 13:03:51 node3 systemd[1]: Failed to start etcd.
The problem is quite simple: this was only tested against legacy IP (IPv4), and the IPv6 notation is wrong here. The peer URL has to be
https://[2a02:4a9:2a:276f::1]:2380, i.e. the IPv6 literal must be wrapped in square brackets. Not sure if this is a kubespray or an etcd issue.
I would be inclined to say it's a kubespray issue and not an etcd one.
Maybe something to add/fix in https://github.com/kubernetes-sigs/kubespray/blob/4726a110fc6f6428a1caf04168ee9ded198a833f/roles/kubespray-defaults/defaults/main.yaml#L580
etcd_address: "{{ ip | default(fallback_ips[inventory_hostname]) }}"
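One way to handle it at that level (a minimal sketch, not a tested patch) would be to pipe the address through the netaddr-backed ipwrap filter, which wraps IPv6 literals in brackets and leaves IPv4 addresses and hostnames untouched:
# roles/kubespray-defaults/defaults/main.yaml -- sketch only, assuming the ipwrap
# filter (backed by python netaddr) is available on the control host
etcd_address: "{{ ip | default(fallback_ips[inventory_hostname]) | ipwrap }}"
A caveat: anything else that consumes etcd_address as a bare address (for example certificate SANs) would then see the bracketed form, so wrapping only where the URLs are assembled may be safer.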
Wouldn't it be better to add it in the templates? https://github.com/kubernetes-sigs/kubespray/blob/4726a110fc6f6428a1caf04168ee9ded198a833f/roles/etcd/templates/etcd-events.env.j2#L7
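For the template route, a rough sketch of what the relevant line could look like (illustrative only; the exact lines in etcd.env.j2 and etcd-events.env.j2 differ, the port here follows the log above, and ipwrap is again assumed to be available):
# roles/etcd/templates/etcd.env.j2 -- illustrative, same idea applies to etcd-events.env.j2
ETCD_LISTEN_PEER_URLS=https://{{ etcd_address | ipwrap }}:2380
This keeps etcd_address itself unbracketed for other consumers and only applies the bracket notation where the URLs are rendered.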
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Reopen this issue with /reopen
- Mark this issue as fresh with /remove-lifecycle rotten
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
In response to this:
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Reopen this issue with /reopen
- Mark this issue as fresh with /remove-lifecycle rotten
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.