TLS error after system reboot
Before creating an issue, make sure you've checked the following:
- [x] You are running the latest released version of k0s
- [x] Make sure you've searched for existing issues, both open and closed
- [x] Make sure you've searched for PRs too, a fix might've been merged already
- [x] You're looking at docs for the released version, "main" branch docs are usually ahead of released versions.
Platform
Linux 6.1.0-26-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.112-1 (2024-09-30) x86_64 GNU/Linux
PRETTY_NAME="Debian GNU/Linux 12 (bookworm)"
NAME="Debian GNU/Linux"
VERSION_ID="12"
VERSION="12 (bookworm)"
VERSION_CODENAME=bookworm
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"
Version
v1.31.1+k0s.1
Sysinfo
`k0s sysinfo`
Total memory: 12.6 GiB (pass)
File system of /var/lib/k0s: ext4 (pass)
Disk space available for /var/lib/k0s: 390.8 GiB (pass)
Relative disk space available for /var/lib/k0s: 85% (pass)
Name resolution: localhost: [::1 127.0.0.1] (pass)
Operating system: Linux (pass)
Linux kernel release: 6.1.0-26-amd64 (pass)
Max. file descriptors per process: current: 1048576 / max: 1048576 (pass)
AppArmor: active (pass)
Executable in PATH: modprobe: /usr/sbin/modprobe (pass)
Executable in PATH: mount: /usr/bin/mount (pass)
Executable in PATH: umount: /usr/bin/umount (pass)
/proc file system: mounted (0x9fa0) (pass)
Control Groups: version 2 (pass)
cgroup controller "cpu": available (is a listed root controller) (pass)
cgroup controller "cpuacct": available (via cpu in version 2) (pass)
cgroup controller "cpuset": available (is a listed root controller) (pass)
cgroup controller "memory": available (is a listed root controller) (pass)
cgroup controller "devices": available (device filters attachable) (pass)
cgroup controller "freezer": available (cgroup.freeze exists) (pass)
cgroup controller "pids": available (is a listed root controller) (pass)
cgroup controller "hugetlb": available (is a listed root controller) (pass)
cgroup controller "blkio": available (via io in version 2) (pass)
CONFIG_CGROUPS: Control Group support: built-in (pass)
CONFIG_CGROUP_FREEZER: Freezer cgroup subsystem: built-in (pass)
CONFIG_CGROUP_PIDS: PIDs cgroup subsystem: built-in (pass)
CONFIG_CGROUP_DEVICE: Device controller for cgroups: built-in (pass)
CONFIG_CPUSETS: Cpuset support: built-in (pass)
CONFIG_CGROUP_CPUACCT: Simple CPU accounting cgroup subsystem: built-in (pass)
CONFIG_MEMCG: Memory Resource Controller for Control Groups: built-in (pass)
CONFIG_CGROUP_HUGETLB: HugeTLB Resource Controller for Control Groups: built-in (pass)
CONFIG_CGROUP_SCHED: Group CPU scheduler: built-in (pass)
CONFIG_FAIR_GROUP_SCHED: Group scheduling for SCHED_OTHER: built-in (pass)
CONFIG_CFS_BANDWIDTH: CPU bandwidth provisioning for FAIR_GROUP_SCHED: built-in (pass)
CONFIG_BLK_CGROUP: Block IO controller: built-in (pass)
CONFIG_NAMESPACES: Namespaces support: built-in (pass)
CONFIG_UTS_NS: UTS namespace: built-in (pass)
CONFIG_IPC_NS: IPC namespace: built-in (pass)
CONFIG_PID_NS: PID namespace: built-in (pass)
CONFIG_NET_NS: Network namespace: built-in (pass)
CONFIG_NET: Networking support: built-in (pass)
CONFIG_INET: TCP/IP networking: built-in (pass)
CONFIG_IPV6: The IPv6 protocol: built-in (pass)
CONFIG_NETFILTER: Network packet filtering framework (Netfilter): built-in (pass)
CONFIG_NETFILTER_ADVANCED: Advanced netfilter configuration: built-in (pass)
CONFIG_NF_CONNTRACK: Netfilter connection tracking support: module (pass)
CONFIG_NETFILTER_XTABLES: Netfilter Xtables support: module (pass)
CONFIG_NETFILTER_XT_TARGET_REDIRECT: REDIRECT target support: module (pass)
CONFIG_NETFILTER_XT_MATCH_COMMENT: "comment" match support: module (pass)
CONFIG_NETFILTER_XT_MARK: nfmark target and match support: module (pass)
CONFIG_NETFILTER_XT_SET: set target and match support: module (pass)
CONFIG_NETFILTER_XT_TARGET_MASQUERADE: MASQUERADE target support: module (pass)
CONFIG_NETFILTER_XT_NAT: "SNAT and DNAT" targets support: module (pass)
CONFIG_NETFILTER_XT_MATCH_ADDRTYPE: "addrtype" address type match support: module (pass)
CONFIG_NETFILTER_XT_MATCH_CONNTRACK: "conntrack" connection tracking match support: module (pass)
CONFIG_NETFILTER_XT_MATCH_MULTIPORT: "multiport" Multiple port match support: module (pass)
CONFIG_NETFILTER_XT_MATCH_RECENT: "recent" match support: module (pass)
CONFIG_NETFILTER_XT_MATCH_STATISTIC: "statistic" match support: module (pass)
CONFIG_NETFILTER_NETLINK: module (pass)
CONFIG_NF_NAT: module (pass)
CONFIG_IP_SET: IP set support: module (pass)
CONFIG_IP_SET_HASH_IP: hash:ip set support: module (pass)
CONFIG_IP_SET_HASH_NET: hash:net set support: module (pass)
CONFIG_IP_VS: IP virtual server support: module (pass)
CONFIG_IP_VS_NFCT: Netfilter connection tracking: built-in (pass)
CONFIG_IP_VS_SH: Source hashing scheduling: module (pass)
CONFIG_IP_VS_RR: Round-robin scheduling: module (pass)
CONFIG_IP_VS_WRR: Weighted round-robin scheduling: module (pass)
CONFIG_NF_CONNTRACK_IPV4: IPv4 connetion tracking support (required for NAT): unknown (warning)
CONFIG_NF_REJECT_IPV4: IPv4 packet rejection: module (pass)
CONFIG_NF_NAT_IPV4: IPv4 NAT: unknown (warning)
CONFIG_IP_NF_IPTABLES: IP tables support: module (pass)
CONFIG_IP_NF_FILTER: Packet filtering: module (pass)
CONFIG_IP_NF_TARGET_REJECT: REJECT target support: module (pass)
CONFIG_IP_NF_NAT: iptables NAT support: module (pass)
CONFIG_IP_NF_MANGLE: Packet mangling: module (pass)
CONFIG_NF_DEFRAG_IPV4: module (pass)
CONFIG_NF_CONNTRACK_IPV6: IPv6 connetion tracking support (required for NAT): unknown (warning)
CONFIG_NF_NAT_IPV6: IPv6 NAT: unknown (warning)
CONFIG_IP6_NF_IPTABLES: IP6 tables support: module (pass)
CONFIG_IP6_NF_FILTER: Packet filtering: module (pass)
CONFIG_IP6_NF_MANGLE: Packet mangling: module (pass)
CONFIG_IP6_NF_NAT: ip6tables NAT support: module (pass)
CONFIG_NF_DEFRAG_IPV6: module (pass)
CONFIG_BRIDGE: 802.1d Ethernet Bridging: module (pass)
CONFIG_LLC: module (pass)
CONFIG_STP: module (pass)
CONFIG_EXT4_FS: The Extended 4 (ext4) filesystem: module (pass)
CONFIG_PROC_FS: /proc file system support: built-in (pass)
What happened?
After I restart the OS, any kubectl command fails with the error below:
Unable to connect to the server: tls: failed to verify certificate: x509: certificate is valid for 127.0.0.1, ::1, 127.0.1.1, 10.96.0.1, not 192.168.2.10
To fix it I had to run:
sudo k0s stop
sudo k0s install controller --single --force
sudo k0s start
After that, kubectl works again.
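The error means the serving certificate no longer carries the custom address as a SAN. One way to see exactly which addresses a certificate covers is to print its subjectAltName extension with openssl. This is a minimal sketch that generates a throwaway certificate with two SANs purely for illustration; on a real node you would point `openssl x509` at the controller's serving certificate instead:

```shell
# Create a throwaway cert carrying two SANs, standing in for the apiserver cert.
# On a real node, inspect the controller's actual serving certificate instead.
openssl req -x509 -newkey rsa:2048 -nodes \
  -keyout /tmp/demo.key -out /tmp/demo.crt \
  -subj "/CN=kube-apiserver" \
  -addext "subjectAltName=IP:127.0.0.1,IP:10.96.0.1" -days 1

# Print only the SAN extension: these are the addresses kubectl will accept.
openssl x509 -in /tmp/demo.crt -noout -ext subjectAltName
```

If the custom address is missing from that list, kubectl will refuse the connection exactly as shown above.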
This is my /etc/k0s/k0s.config file:
apiVersion: k0s.k0sproject.io/v1beta1
kind: ClusterConfig
metadata:
  name: k0s
spec:
  api:
    address: 192.168.2.10
    k0sApiPort: 9443
    port: 6443
    sans:
      - 192.168.2.10
  telemetry:
    enabled: false
After the system rebooted and kubectl started throwing the TLS error, I ran sudo k0s kubeconfig admin and saw that the server address in the cluster config is 127.0.0.1 instead of 192.168.2.10. Also, I don't know where the 10.96.0.1 IP in the error message comes from.
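The endpoint kubectl dials (and verifies the certificate against) is the server: field of the kubeconfig, so that field is the quickest thing to check after a reboot. A minimal sketch using a throwaway kubeconfig; on the node itself you would look at the same field in the output of sudo k0s kubeconfig admin:

```shell
# Throwaway kubeconfig illustrating the field to check.
cat > /tmp/demo-kubeconfig <<'EOF'
apiVersion: v1
kind: Config
clusters:
- cluster:
    server: https://127.0.0.1:6443
  name: local
EOF

# The server: line is the endpoint kubectl connects to and
# verifies the apiserver certificate against.
grep 'server:' /tmp/demo-kubeconfig
```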
Steps to reproduce
- Install k0s using the config above
- Generate kubeconfig file
- Reboot system
Expected behavior
K0s should retain the api.address from the installation config.
Actual behavior
K0s reverted the api address to 127.0.0.1 after system reboot instead of retaining the custom 192.168.2.10 api address.
Screenshots and logs
No response
Additional context
No response
Hi, we haven't seen this before, so we believe it has to be triggered by something in your environment.
What happens if you reboot and instead of:
sudo k0s stop
sudo k0s install controller --single --force
sudo k0s start
You just do:
sudo k0s stop
sudo k0s start
IMPORTANT: don't do this immediately after the reboot. Give it some time, maybe 5 minutes after the reboot, because we suspect it may be a timing issue with network interfaces not being ready just yet.
Finally, would it be possible to provide k0s logs after the reboot?
FWIW, I installed the latest k0s on Sunday, rebooted my master node today, and ran into this issue.
k0sctl version
version: v0.19.2
commit: 081dfeb
Cluster definition
kind: Cluster
metadata:
  name: kruzter
spec:
  hosts:
  - ssh:
      address: 192.168.55.248
      user: root
      port: 22
    role: controller
  - ssh:
      address: 192.168.55.251
      user: root
      port: 22
    role: worker
  - ssh:
      address: 192.168.55.252
      user: root
      port: 22
    role: worker
  - ssh:
      address: 192.168.55.253
      user: root
      port: 22
    role: worker
  - ssh:
      address: 192.168.55.254
      user: root
      port: 22
    role: worker
  k0s:
    config:
      apiVersion: k0s.k0sproject.io/v1beta1
      kind: Cluster
      metadata:
        name: k0s
      spec:
        api:
          k0sApiPort: 9443
          port: 6443
        installConfig:
          users:
            etcdUser: etcd
            kineUser: kube-apiserver
            konnectivityUser: konnectivity-server
            kubeAPIserverUser: kube-apiserver
            kubeSchedulerUser: kube-scheduler
        konnectivity:
          adminPort: 8133
          agentPort: 8132
        network:
          kubeProxy:
            disabled: false
            mode: iptables
          kuberouter:
            autoMTU: true
            mtu: 0
            peerRouterASNs: ""
            peerRouterIPs: ""
          podCIDR: 10.244.0.0/16
          provider: custom
          serviceCIDR: 10.96.0.0/12
        podSecurityPolicy:
          defaultPolicy: 00-k0s-privileged
        storage:
          type: etcd
        telemetry:
          enabled: true
FWIW, it did get fixed after I did:
k0s stop
k0s start
After a few mins (all the time it took me to edit this post :) ) all the nodes became available on their own:
> k get nodes
NAME         STATUS   ROLES    AGE   VERSION
k8sworker1   Ready    <none>   25h   v1.31.1+k0s
k8sworker2   Ready    <none>   25h   v1.31.1+k0s
k8sworker3   Ready    <none>   25h   v1.31.1+k0s
k8sworker4   Ready    <none>   25h   v1.31.1+k0s
This really sounds like a timing issue: maybe k0s starts before the network has assigned the address to the interface(s).
In which infra are you guys seeing this?
The k0s-generated systemd unit does have a dependency on the network-online target:
After=network-online.target
Wants=network-online.target
Maybe in your case that is not enough for some reason. 🤔
To test if that is the case, you could try adding an ExecStartPre to dump the interface info before k0s actually starts. Something like:
ExecStartPre=-/usr/sbin/ip a s > /root/ip.info
That could give us some hints if this is actually the case.
You can also look for the critical chain of services with something like:
systemd-analyze critical-chain k0scontroller.service
That shows a tree of the order in which things were started during boot. Note: you need to analyze this after the reboot, NOT after the manual restart of k0s.
This is for example what I see on Ubuntu:
root@mothership:/# systemd-analyze critical-chain k0scontroller.service
The time when unit became active or started is printed after the "@" character.
The time the unit took to start is printed after the "+" character.
k0scontroller.service @10min 24.386s
└─basic.target @8.069s
└─sockets.target @8.068s
└─uuidd.socket @8.062s
└─sysinit.target @8.036s
└─cloud-init.service @5.640s +2.390s
└─systemd-networkd-wait-online.service @3.806s +1.830s
└─systemd-networkd.service @3.731s +72ms
└─network-pre.target @3.727s
└─cloud-init-local.service @2.251s +1.474s
└─systemd-remount-fs.service @755ms +31ms
└─systemd-fsck-root.service @696ms +54ms
└─systemd-journald.socket @574ms
└─-.mount @471ms
└─-.slice @471ms
> Also, I don't know where the 10.96.0.1 IP in the error message comes from
@ckt114 That is the default cluster-internal service address for the API server.
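That default falls out of the serviceCIDR: Kubernetes reserves the first usable address of the service network for the built-in `kubernetes` Service that fronts the API server, and 10.96.0.0/12 is the default service CIDR (also visible in the k0sctl config above). A quick sketch with Python's stdlib:

```python
import ipaddress

# Default service CIDR, as seen under network.serviceCIDR above
svc_cidr = ipaddress.ip_network("10.96.0.0/12")

# Kubernetes assigns the first usable address of the service range
# to the built-in `kubernetes` Service, which fronts the API server.
api_svc_ip = svc_cidr.network_address + 1
print(api_svc_ip)  # -> 10.96.0.1
```

That is why 10.96.0.1 appears among the SANs of the apiserver certificate: in-cluster clients reach the API through that service IP.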
@jnummelin I added the ExecStartPre to /etc/systemd/system/k0scontroller.service, ran daemon-reload, and rebooted the system (not just k0s), but nothing was written to /root/ip.info.
This is the hierarchy of my k0scontroller service.
k0scontroller.service +7ms
└─network-online.target @1.707s
└─network.target @1.707s
└─networking.service @1.573s +133ms
└─apparmor.service @1.550s +20ms
└─local-fs.target @1.550s
└─run-credentials-systemd\x2dtmpfiles\x2dsetup.service.mount @1.554s
└─local-fs-pre.target @207ms
└─keyboard-setup.service @163ms +43ms
└─systemd-journald.socket @159ms
└─-.mount @123ms
└─-.slice @123ms
This is my k0scontroller.service
[Unit]
Description=k0s - Zero Friction Kubernetes
Documentation=https://docs.k0sproject.io
ConditionFileIsExecutable=/usr/local/bin/k0s
After=network-online.target
Wants=network-online.target
[Service]
StartLimitInterval=5
StartLimitBurst=10
ExecStartPre=-/usr/sbin/ip a s > /root/ip.info 2>&1
ExecStart=/usr/local/bin/k0s controller --single=true
Environment=""
RestartSec=10
Delegate=yes
KillMode=process
LimitCORE=infinity
TasksMax=infinity
TimeoutStartSec=0
LimitNOFILE=999999
Restart=always
[Install]
WantedBy=multi-user.target
> but nothing output to /root/ip.info.
Seems that systemd does not like to write files like this. I should've actually tried to run this myself, as it fails:
Error: either "dev" is duplicate, or "/root/ip.info" is a garbage.
🤦
So just remove the file redirection; what you'll get is the output of that command in the journal logs.
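For the record, if writing the file is still wanted: systemd's Exec lines are not run through a shell, so `>` is passed to `ip` as an argument instead of performing a redirection. Wrapping the command in an explicit shell works around that. A hypothetical drop-in (the filename is illustrative):

```ini
# /etc/systemd/system/k0scontroller.service.d/dump-ip.conf (hypothetical)
[Service]
# Wrap the command in a shell so the redirection is actually performed;
# the leading "-" keeps a failure here from blocking k0s startup.
ExecStartPre=-/bin/sh -c '/usr/sbin/ip a s > /root/ip.info 2>&1'
```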
Your service hierarchy looks correct to me.
This issue has been marked as stale since no activity has been recorded in 30 days.