kubespray
IPv6 in etcd config incorrect
Environment:
- Cloud provider or hardware configuration: Bare metal server
- OS (target systems) (printf "$(uname -srm)\n$(cat /etc/os-release)\n"):
Linux 4.15.0-163-generic x86_64
NAME="Ubuntu"
VERSION="18.04.6 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.6 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic
- Version of Ansible (ansible --version): ansible [core 2.12.2]
- Version of Python (python --version): 3.8.10
- Kubespray version (commit) (git rev-parse --short HEAD): 1c60a499
- Network plugin used: weave
Full inventory with variables (ansible -i inventory/sample/inventory.ini all -m debug -a "var=hostvars[inventory_hostname]"):
http://sprunge.us/5wakwe
Command used to invoke ansible:
ANSIBLE_SSH_ARGS='-o ForwardAgent=yes -o ControlMaster=auto -o ControlPersist=240s -o ConnectTimeout=1200 -o ServerAliveInterval=30' ansible-playbook -i inventory/someproject/hosts.yml cluster.yml -b -v
Output of ansible run:
TASK [etcd : Configure | Ensure etcd is running] *****************************************************************************************************
fatal: [node1]: FAILED! => {"changed": false, "msg": "Unable to start service etcd: Job for etcd.service failed because the control process exited with error code.\nSee \"systemctl status etcd.service\" and \"journalctl -xe\" for details.\n"}
fatal: [node2]: FAILED! => {"changed": false, "msg": "Unable to start service etcd: Job for etcd.service failed because the control process exited with error code.\nSee \"systemctl status etcd.service\" and \"journalctl -xe\" for details.\n"}
fatal: [node3]: FAILED! => {"changed": false, "msg": "Unable to start service etcd: Job for etcd.service failed because the control process exited with error code.\nSee \"systemctl status etcd.service\" and \"journalctl -xe\" for details.\n"}
Anything else we need to know: systemctl output
Jun 13 13:03:51 node3 etcd[25432]: {"level":"info","ts":"2022-06-13T13:03:51.078+0200","caller":"flags/flag.go:113","msg":"recognized and used environment variable","variable-name":"ETCD_PEER_KEY_FILE","variable-value":"/etc/ssl/etcd/ssl/member-node3-key.pem"}
Jun 13 13:03:51 node3 etcd[25432]: {"level":"info","ts":"2022-06-13T13:03:51.078+0200","caller":"flags/flag.go:113","msg":"recognized and used environment variable","variable-name":"ETCD_PEER_TRUSTED_CA_FILE","variable-value":"/etc/ssl/etcd/ssl/ca.pem"}
Jun 13 13:03:51 node3 etcd[25432]: {"level":"info","ts":"2022-06-13T13:03:51.078+0200","caller":"flags/flag.go:113","msg":"recognized and used environment variable","variable-name":"ETCD_PROXY","variable-value":"off"}
Jun 13 13:03:51 node3 etcd[25432]: {"level":"info","ts":"2022-06-13T13:03:51.078+0200","caller":"flags/flag.go:113","msg":"recognized and used environment variable","variable-name":"ETCD_SNAPSHOT_COUNT","variable-value":"10000"}
Jun 13 13:03:51 node3 etcd[25432]: {"level":"info","ts":"2022-06-13T13:03:51.078+0200","caller":"flags/flag.go:113","msg":"recognized and used environment variable","variable-name":"ETCD_TRUSTED_CA_FILE","variable-value":"/etc/ssl/etcd/ssl/ca.pem"}
Jun 13 13:03:51 node3 etcd[25432]: {"level":"info","ts":"2022-06-13T13:03:51.078+0200","caller":"etcdmain/etcd.go:73","msg":"Running: ","args":["/usr/local/bin/etcd"]}
Jun 13 13:03:51 node3 etcd[25432]: {"level":"warn","ts":"2022-06-13T13:03:51.079+0200","caller":"etcdmain/etcd.go:75","msg":"failed to verify flags","error":"invalid value \"https://2a02:4a9:2a:276f::1:2380\" for ETCD_LISTEN_PEER_URLS: URL address does not have the form \"host:port\": https://2a02:4a9:2a:276f::1:2380"}
Jun 13 13:03:51 node3 systemd[1]: etcd.service: Main process exited, code=exited, status=1/FAILURE
Jun 13 13:03:51 node3 systemd[1]: etcd.service: Failed with result 'exit-code'.
Jun 13 13:03:51 node3 systemd[1]: Failed to start etcd.
The problem is quite simple: this was only tested against legacy IP (IPv4), and the IPv6 notation is wrong here. The peer URL has to be
https://[2a02:4a9:2a:276f::1]:2380, i.e. the IPv6 literal must be wrapped in square brackets. Not sure if this is a kubespray or an etcd issue.
I would be inclined to say it's a kubespray issue and not an etcd one.
Maybe something to add/fix in https://github.com/kubernetes-sigs/kubespray/blob/4726a110fc6f6428a1caf04168ee9ded198a833f/roles/kubespray-defaults/defaults/main.yaml#L580
etcd_address: "{{ ip | default(fallback_ips[inventory_hostname]) }}"
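One way to handle it at that level (a minimal sketch, not a tested patch) would be to pipe the address through the netaddr-backed ipwrap filter, which wraps IPv6 literals in brackets and leaves IPv4 addresses and hostnames untouched:
# roles/kubespray-defaults/defaults/main.yaml -- sketch only, assuming the ipwrap
# filter (backed by python netaddr) is available on the control host
etcd_address: "{{ ip | default(fallback_ips[inventory_hostname]) | ipwrap }}"
A caveat: anything else that consumes etcd_address as a bare address (for example certificate SANs) would then see the bracketed form, so wrapping only where the URLs are assembled may be safer.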
Wouldn't it be better to add it in the templates? https://github.com/kubernetes-sigs/kubespray/blob/4726a110fc6f6428a1caf04168ee9ded198a833f/roles/etcd/templates/etcd-events.env.j2#L7
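For the template route, a rough sketch of what the relevant line could look like (illustrative only; the exact lines in etcd.env.j2 and etcd-events.env.j2 differ, the port here follows the log above, and ipwrap is again assumed to be available):
# roles/etcd/templates/etcd.env.j2 -- illustrative, same idea applies to etcd-events.env.j2
ETCD_LISTEN_PEER_URLS=https://{{ etcd_address | ipwrap }}:2380
This keeps etcd_address itself unbracketed for other consumers and only applies the bracket notation where the URLs are rendered.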
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Reopen this issue with /reopen
- Mark this issue as fresh with /remove-lifecycle rotten
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
In response to this:
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Reopen this issue with /reopen
- Mark this issue as fresh with /remove-lifecycle rotten
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.