
k0s-pushgateway missing metrics from Control Planes after a while

Open Skaronator opened this issue 1 year ago • 7 comments

Before creating an issue, make sure you've checked the following:

  • [X] You are running the latest released version of k0s
  • [X] Make sure you've searched for existing issues, both open and closed
  • [X] Make sure you've searched for PRs too, a fix might've been merged already
  • [X] You're looking at docs for the released version, "main" branch docs are usually ahead of released versions.

Platform

fin-kubm-vm-01:~$ uname -srvmo; cat /etc/os-release || lsb_release -a

Linux 5.15.0-102-generic #112-Ubuntu SMP Tue Mar 5 16:50:32 UTC 2024 x86_64 GNU/Linux
PRETTY_NAME="Ubuntu 22.04.4 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.4 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy

Version

v1.29.2+k0s.0

Sysinfo

$ sudo k0s sysinfo
Machine ID: "7f250da9878c8d1542136402b43ec42dd8d5b0a83de8889fac4f4cabb545b7cc" (from machine) (pass)
Total memory: 3.8 GiB (pass)
Disk space available for /var/lib/k0s: 33.5 GiB (pass)
Name resolution: localhost: [127.0.0.1] (pass)
Operating system: Linux (pass)
  Linux kernel release: 5.15.0-102-generic (pass)
  Max. file descriptors per process: current: 1048576 / max: 1048576 (pass)
  AppArmor: active (pass)
  Executable in PATH: modprobe: /usr/sbin/modprobe (pass)
  Executable in PATH: mount: /usr/bin/mount (pass)
  Executable in PATH: umount: /usr/bin/umount (pass)
  /proc file system: mounted (0x9fa0) (pass)
  Control Groups: version 2 (pass)
    cgroup controller "cpu": available (is a listed root controller) (pass)
    cgroup controller "cpuacct": available (via cpu in version 2) (pass)
    cgroup controller "cpuset": available (is a listed root controller) (pass)
    cgroup controller "memory": available (is a listed root controller) (pass)
    cgroup controller "devices": available (device filters attachable) (pass)
    cgroup controller "freezer": available (cgroup.freeze exists) (pass)
    cgroup controller "pids": available (is a listed root controller) (pass)
    cgroup controller "hugetlb": available (is a listed root controller) (pass)
    cgroup controller "blkio": available (via io in version 2) (pass)
  CONFIG_CGROUPS: Control Group support: built-in (pass)
    CONFIG_CGROUP_FREEZER: Freezer cgroup subsystem: built-in (pass)
    CONFIG_CGROUP_PIDS: PIDs cgroup subsystem: built-in (pass)
    CONFIG_CGROUP_DEVICE: Device controller for cgroups: built-in (pass)
    CONFIG_CPUSETS: Cpuset support: built-in (pass)
    CONFIG_CGROUP_CPUACCT: Simple CPU accounting cgroup subsystem: built-in (pass)
    CONFIG_MEMCG: Memory Resource Controller for Control Groups: built-in (pass)
    CONFIG_CGROUP_HUGETLB: HugeTLB Resource Controller for Control Groups: built-in (pass)
    CONFIG_CGROUP_SCHED: Group CPU scheduler: built-in (pass)
      CONFIG_FAIR_GROUP_SCHED: Group scheduling for SCHED_OTHER: built-in (pass)
        CONFIG_CFS_BANDWIDTH: CPU bandwidth provisioning for FAIR_GROUP_SCHED: built-in (pass)
    CONFIG_BLK_CGROUP: Block IO controller: built-in (pass)
  CONFIG_NAMESPACES: Namespaces support: built-in (pass)
    CONFIG_UTS_NS: UTS namespace: built-in (pass)
    CONFIG_IPC_NS: IPC namespace: built-in (pass)
    CONFIG_PID_NS: PID namespace: built-in (pass)
    CONFIG_NET_NS: Network namespace: built-in (pass)
  CONFIG_NET: Networking support: built-in (pass)
    CONFIG_INET: TCP/IP networking: built-in (pass)
      CONFIG_IPV6: The IPv6 protocol: built-in (pass)
    CONFIG_NETFILTER: Network packet filtering framework (Netfilter): built-in (pass)
      CONFIG_NETFILTER_ADVANCED: Advanced netfilter configuration: built-in (pass)
      CONFIG_NF_CONNTRACK: Netfilter connection tracking support: module (pass)
      CONFIG_NETFILTER_XTABLES: Netfilter Xtables support: module (pass)
        CONFIG_NETFILTER_XT_TARGET_REDIRECT: REDIRECT target support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_COMMENT: "comment" match support: module (pass)
        CONFIG_NETFILTER_XT_MARK: nfmark target and match support: module (pass)
        CONFIG_NETFILTER_XT_SET: set target and match support: module (pass)
        CONFIG_NETFILTER_XT_TARGET_MASQUERADE: MASQUERADE target support: module (pass)
        CONFIG_NETFILTER_XT_NAT: "SNAT and DNAT" targets support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_ADDRTYPE: "addrtype" address type match support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_CONNTRACK: "conntrack" connection tracking match support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_MULTIPORT: "multiport" Multiple port match support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_RECENT: "recent" match support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_STATISTIC: "statistic" match support: module (pass)
      CONFIG_NETFILTER_NETLINK: module (pass)
      CONFIG_NF_NAT: module (pass)
      CONFIG_IP_SET: IP set support: module (pass)
        CONFIG_IP_SET_HASH_IP: hash:ip set support: module (pass)
        CONFIG_IP_SET_HASH_NET: hash:net set support: module (pass)
      CONFIG_IP_VS: IP virtual server support: module (pass)
        CONFIG_IP_VS_NFCT: Netfilter connection tracking: built-in (pass)
        CONFIG_IP_VS_SH: Source hashing scheduling: module (pass)
        CONFIG_IP_VS_RR: Round-robin scheduling: module (pass)
        CONFIG_IP_VS_WRR: Weighted round-robin scheduling: module (pass)
      CONFIG_NF_CONNTRACK_IPV4: IPv4 connetion tracking support (required for NAT): unknown (warning)
      CONFIG_NF_REJECT_IPV4: IPv4 packet rejection: module (pass)
      CONFIG_NF_NAT_IPV4: IPv4 NAT: unknown (warning)
      CONFIG_IP_NF_IPTABLES: IP tables support: module (pass)
        CONFIG_IP_NF_FILTER: Packet filtering: module (pass)
          CONFIG_IP_NF_TARGET_REJECT: REJECT target support: module (pass)
        CONFIG_IP_NF_NAT: iptables NAT support: module (pass)
        CONFIG_IP_NF_MANGLE: Packet mangling: module (pass)
      CONFIG_NF_DEFRAG_IPV4: module (pass)
      CONFIG_NF_CONNTRACK_IPV6: IPv6 connetion tracking support (required for NAT): unknown (warning)
      CONFIG_NF_NAT_IPV6: IPv6 NAT: unknown (warning)
      CONFIG_IP6_NF_IPTABLES: IP6 tables support: module (pass)
        CONFIG_IP6_NF_FILTER: Packet filtering: module (pass)
        CONFIG_IP6_NF_MANGLE: Packet mangling: module (pass)
        CONFIG_IP6_NF_NAT: ip6tables NAT support: module (pass)
      CONFIG_NF_DEFRAG_IPV6: module (pass)
    CONFIG_BRIDGE: 802.1d Ethernet Bridging: module (pass)
      CONFIG_LLC: module (pass)
      CONFIG_STP: module (pass)
  CONFIG_EXT4_FS: The Extended 4 (ext4) filesystem: built-in (pass)
  CONFIG_PROC_FS: /proc file system support: built-in (pass)

What happened?

I'm running a cluster of 3 control planes and 3 worker nodes. The cluster is already ancient at 2 years and 82 days old, but it's working flawlessly.

Recently, we enabled System components monitoring by adding the --enable-metrics-scraper argument to our three control-plane k0s controllers. We got the k0s-system namespace with the push gateway, and metrics seem to reach our Prometheus.
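
For reference, on a systemd-managed controller this boils down to appending the flag to the k0s controller invocation and restarting the service. A minimal sketch, assuming the unit file path and binary location that k0s install controller creates by default (they may differ on other installs):

# /etc/systemd/system/k0scontroller.service  (path assumed)
# ExecStart=/usr/local/bin/k0s controller --enable-metrics-scraper ...
$ sudo systemctl daemon-reload
$ sudo systemctl restart k0scontroller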

After taking a closer look, we realized that not all metrics are being received.

image

Taking a closer look at what happened yesterday: image

You can see that initially all etcd metrics were being received. Interestingly, node 3 was not restarted but has now started to send metrics again. I have no idea why; the worker nodes weren't restarted, so the pushgateway pod wasn't rescheduled.

When curling the pushgateway endpoint, I can see a different number of metrics for each machine. It seems like 03 returns all metrics, while 01 is missing around 90% of them (e.g. etcd is completely missing). 02 pushed zero metrics.

$ curl -s http://localhost:9091/metrics | grep fin-kubm-vm-01 | wc -l
638
$ curl -s http://localhost:9091/metrics | grep fin-kubm-vm-02 | wc -l
0
$ curl -s http://localhost:9091/metrics | grep fin-kubm-vm-03 | wc -l
7050
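
A per-job breakdown makes the gap more obvious. This is a rough sketch, assuming each component is pushed under its own job label (etcd, kube-scheduler, kube-controller-manager) and that the pushing node's hostname shows up in the series labels, as in the counts above:

$ curl -s http://localhost:9091/metrics -o /tmp/pgw.txt
$ for job in etcd kube-scheduler kube-controller-manager; do
>   for node in fin-kubm-vm-01 fin-kubm-vm-02 fin-kubm-vm-03; do
>     # count series that mention both the job label and the node name
>     echo "$job / $node: $(grep "job=\"$job\"" /tmp/pgw.txt | grep -c "$node")"
>   done
> done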

Steps to reproduce

  1. Deploy 3 control planes & 3 worker nodes
  2. (wait 2 years?)
  3. Enable the metrics scraper
  4. Verify that all metrics are being received.

Expected behavior

I expect that all metrics reach the push gateway

Actual behavior

Metrics are only partially available

Screenshots and logs

I searched the logs but didn't find anything useful or related to this.

For example, node 01 has 4 log entries for today that contain metrics:

fin-kubm-vm-01 $ cat /var/log/syslog | grep metrics | grep -v Grafana
Apr 10 10:55:24 fin-kubm-vm-01 k0s[696]: time="2024-04-10 10:55:24" level=error msg="error sending POST request for job kube-scheduler: no endpoints available for service \"http:k0s-pushgateway:http\"" component=metrics metrics_job=kube-scheduler
Apr 10 10:55:24 fin-kubm-vm-01 k0s[696]: time="2024-04-10 10:55:24" level=error msg="error sending POST request for job kube-controller-manager: no endpoints available for service \"http:k0s-pushgateway:http\"" component=metrics metrics_job=kube-controller-manager
Apr 10 10:55:24 fin-kubm-vm-01 k0s[696]: time="2024-04-10 10:55:24" level=error msg="error sending POST request for job etcd: no endpoints available for service \"http:k0s-pushgateway:http\"" component=metrics metrics_job=etcd
Apr 10 10:55:25 fin-kubm-vm-01 k0s[696]: time="2024-04-10 10:55:25" level=info msg="W0410 10:55:25.148724     972 aggregator.go:166] failed to download v1beta1.metrics.k8s.io: failed to retrieve openAPI spec, http error: ResponseCode: 503, Body: service unavailable" component=kube-apiserver stream=stderr
Apr 10 10:55:25 fin-kubm-vm-01 k0s[696]: time="2024-04-10 10:55:25" level=info msg="W0410 10:55:25.602773     972 aggregator.go:166] failed to download v1beta1.metrics.k8s.io: failed to retrieve openAPI spec, http error: ResponseCode: 503, Body: service unavailable" component=kube-apiserver stream=stderr
Apr 10 10:55:26 fin-kubm-vm-01 k0s[696]: time="2024-04-10 10:55:26" level=info msg="W0410 10:55:26.006288     972 aggregator.go:166] failed to download v1beta1.metrics.k8s.io: failed to retrieve openAPI spec, http error: ResponseCode: 503, Body: service unavailable" component=kube-apiserver stream=stderr
Apr 10 10:55:26 fin-kubm-vm-01 k0s[696]: time="2024-04-10 10:55:26" level=info msg="E0410 10:55:26.446143     972 available_controller.go:460] v1beta1.metrics.k8s.io failed with: Operation cannot be fulfilled on apiservices.apiregistration.k8s.io \"v1beta1.metrics.k8s.io\": the object has been modified; please apply your changes to the latest version and try again" component=kube-apiserver stream=stderr
Apr 10 12:10:27 fin-kubm-vm-01 k0s[696]: time="2024-04-10 12:10:27" level=info msg="E0410 12:10:27.702374     972 available_controller.go:460] v1beta1.metrics.k8s.io failed with: failing or missing response from https://10.101.57.31:443/apis/metrics.k8s.io/v1beta1: Get \"https://10.101.57.31:443/apis/metrics.k8s.io/v1beta1\": net/http: request canceled (Client.Timeout exceeded while awaiting headers)" component=kube-apiserver stream=stderr
Apr 10 12:10:27 fin-kubm-vm-01 k0s[696]: time="2024-04-10 12:10:27" level=info msg="E0410 12:10:27.706869     972 controller.go:146] Error updating APIService \"v1beta1.metrics.k8s.io\" with err: failed to download v1beta1.metrics.k8s.io: failed to retrieve openAPI spec, http error: ResponseCode: 503, Body: service unavailable" component=kube-apiserver stream=stderr
Apr 10 12:10:32 fin-kubm-vm-01 k0s[696]: time="2024-04-10 12:10:32" level=info msg="E0410 12:10:32.705754     972 available_controller.go:460] v1beta1.metrics.k8s.io failed with: failing or missing response from https://10.101.57.31:443/apis/metrics.k8s.io/v1beta1: Get \"https://10.101.57.31:443/apis/metrics.k8s.io/v1beta1\": net/http: request canceled (Client.Timeout exceeded while awaiting headers)" component=kube-apiserver stream=stderr
Apr 10 12:10:37 fin-kubm-vm-01 k0s[696]: time="2024-04-10 12:10:37" level=info msg="E0410 12:10:37.729808     972 available_controller.go:460] v1beta1.metrics.k8s.io failed with: failing or missing response from https://10.101.57.31:443/apis/metrics.k8s.io/v1beta1: Get \"https://10.101.57.31:443/apis/metrics.k8s.io/v1beta1\": http2: client connection lost" component=kube-apiserver stream=stderr
Apr 10 15:22:27 fin-kubm-vm-01 k0s[696]: time="2024-04-10 15:22:27" level=info msg="E0410 15:22:27.928776     972 controller.go:146] Error updating APIService \"v1beta1.metrics.k8s.io\" with err: failed to download v1beta1.metrics.k8s.io: failed to retrieve openAPI spec, http error: ResponseCode: 503, Body: service unavailable" component=kube-apiserver stream=stderr
Apr 10 15:22:27 fin-kubm-vm-01 k0s[696]: time="2024-04-10 15:22:27" level=info msg="E0410 15:22:27.941563     972 available_controller.go:460] v1beta1.metrics.k8s.io failed with: Operation cannot be fulfilled on apiservices.apiregistration.k8s.io \"v1beta1.metrics.k8s.io\": the object has been modified; please apply your changes to the latest version and try again" component=kube-apiserver stream=stderr
Apr 10 15:22:37 fin-kubm-vm-01 k0s[696]: time="2024-04-10 15:22:37" level=info msg="E0410 15:22:37.941711     972 controller.go:146] Error updating APIService \"v1beta1.metrics.k8s.io\" with err: failed to download v1beta1.metrics.k8s.io: failed to retrieve openAPI spec, http error: ResponseCode: 503, Body: service unavailable" component=kube-apiserver stream=stderr
Apr 10 15:22:37 fin-kubm-vm-01 k0s[696]: time="2024-04-10 15:22:37" level=info msg="E0410 15:22:37.957649     972 available_controller.go:460] v1beta1.metrics.k8s.io failed with: Operation cannot be fulfilled on apiservices.apiregistration.k8s.io \"v1beta1.metrics.k8s.io\": the object has been modified; please apply your changes to the latest version and try again" component=kube-apiserver stream=stderr
Apr 10 17:48:58 fin-kubm-vm-01 k0s[696]: time="2024-04-10 17:48:58" level=info msg="E0410 17:48:58.526585     972 controller.go:146] Error updating APIService \"v1beta1.metrics.k8s.io\" with err: failed to download v1beta1.metrics.k8s.io: failed to retrieve openAPI spec, http error: ResponseCode: 503, Body: service unavailable" component=kube-apiserver stream=stderr
Apr 10 17:49:08 fin-kubm-vm-01 k0s[696]: time="2024-04-10 17:49:08" level=info msg="E0410 17:49:08.537255     972 controller.go:146] Error updating APIService \"v1beta1.metrics.k8s.io\" with err: failed to download v1beta1.metrics.k8s.io: failed to retrieve openAPI spec, http error: ResponseCode: 503, Body: service unavailable" component=kube-apiserver stream=stderr
Apr 11 08:05:11 fin-kubm-vm-01 k0s[696]: time="2024-04-11 08:05:11" level=error msg="error sending POST request for job etcd: error trying to reach service: EOF" component=metrics metrics_job=etcd
Apr 11 08:05:11 fin-kubm-vm-01 k0s[696]: time="2024-04-11 08:05:11" level=error msg="error sending POST request for job kube-scheduler: error trying to reach service: EOF" component=metrics metrics_job=kube-scheduler
Apr 11 08:05:11 fin-kubm-vm-01 k0s[696]: time="2024-04-11 08:05:11" level=error msg="error sending POST request for job etcd: no endpoints available for service \"k0s-pushgateway\"" component=metrics metrics_job=etcd
Apr 11 08:05:11 fin-kubm-vm-01 k0s[696]: time="2024-04-11 08:05:11" level=error msg="error sending POST request for job kube-scheduler: no endpoints available for service \"k0s-pushgateway\"" component=metrics metrics_job=kube-scheduler
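
The "no endpoints available for service" errors suggest the k0s-pushgateway Service had no ready pod backing it at those moments. A quick way to check (a sketch, assuming the k0s-system namespace mentioned above) is to compare the Service's endpoints with the pod state and restart count:

$ kubectl -n k0s-system get endpoints k0s-pushgateway
$ kubectl -n k0s-system get pods -o wide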

Additional context

No response

Skaronator avatar Apr 11 '24 08:04 Skaronator

After restarting the pushgateway pod, I got more metrics again:

$ curl -s http://localhost:9091/metrics | grep fin-kubm-vm-03 | wc -l
7050
$ curl -s http://localhost:9091/metrics | grep fin-kubm-vm-02 | wc -l
2811
$ curl -s http://localhost:9091/metrics | grep fin-kubm-vm-01 | wc -l
2808

image
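
For anyone wanting to repeat this, the restart boils down to bouncing the pushgateway Deployment and waiting for the new pod to become ready. A sketch, assuming the Deployment is named after the k0s-pushgateway Service:

$ kubectl -n k0s-system rollout restart deployment/k0s-pushgateway
$ kubectl -n k0s-system rollout status deployment/k0s-pushgateway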

Skaronator avatar Apr 11 '24 08:04 Skaronator

Are you still able to grab the logs of the previous push-gateway pod? As restarting it seemed to help, we'd be interested to see if there's anything in its logs to hint at what could've gone sideways.

jnummelin avatar Apr 11 '24 08:04 jnummelin

Sorry, I forgot to mention the pushgateway logs because there are basically none. Here are the logs from the last 7 days:

ts=2024-04-11T08:05:10.820Z caller=tls_config.go:195 level=info msg="TLS is disabled." http2=false
ts=2024-04-11T08:05:10.820Z caller=main.go:140 level=info listen_address=:9091
ts=2024-04-11T08:05:10.818Z caller=main.go:90 level=debug msg="path prefix for internal routing" path=
ts=2024-04-11T08:05:10.818Z caller=main.go:89 level=debug msg="path prefix used externally" path=
ts=2024-04-11T08:05:10.818Z caller=main.go:88 level=debug msg="external URL" url=
ts=2024-04-11T08:05:10.818Z caller=main.go:87 level=info build_context="(go=go1.19.6, user=root@buildkitsandbox, date=20230217-09:16:39)"
ts=2024-04-11T08:05:10.818Z caller=main.go:86 level=info msg="starting pushgateway" version="(version=1.4.0, branch=HEAD, revision=b28bd0363ed3112fc0c1d39813cdc1c1d335bdf1)"
ts=2024-04-11T08:05:10.258Z caller=main.go:200 level=error msg="HTTP server stopped" err="accept tcp [::]:9091: use of closed network connection"
ts=2024-04-11T08:05:10.257Z caller=main.go:252 level=info msg="received SIGINT/SIGTERM; exiting gracefully..."
ts=2024-04-10T10:44:40.402Z caller=main.go:200 level=error msg="HTTP server stopped" err="accept tcp [::]:9091: use of closed network connection"
ts=2024-04-10T10:44:40.402Z caller=main.go:252 level=info msg="received SIGINT/SIGTERM; exiting gracefully..."
ts=2024-04-08T15:44:20.344Z caller=push.go:111 level=debug msg="failed to parse text" source=10.244.2.155:52800 err="unexpected EOF"
ts=2024-04-05T09:02:50.466Z caller=tls_config.go:195 level=info msg="TLS is disabled." http2=false
ts=2024-04-05T09:02:50.464Z caller=main.go:140 level=info listen_address=:9091
ts=2024-04-05T09:02:50.458Z caller=main.go:90 level=debug msg="path prefix for internal routing" path=
ts=2024-04-05T09:02:50.458Z caller=main.go:89 level=debug msg="path prefix used externally" path=
ts=2024-04-05T09:02:50.458Z caller=main.go:88 level=debug msg="external URL" url=
ts=2024-04-05T09:02:50.458Z caller=main.go:87 level=info build_context="(go=go1.19.6, user=root@buildkitsandbox, date=20230217-09:16:39)"
ts=2024-04-05T09:02:50.457Z caller=main.go:86 level=info msg="starting pushgateway" version="(version=1.4.0, branch=HEAD, revision=b28bd0363ed3112fc0c1d39813cdc1c1d335bdf1)"
ts=2024-04-05T09:02:49.747Z caller=main.go:200 level=error msg="HTTP server stopped" err="accept tcp [::]:9091: use of closed network connection"
ts=2024-04-05T09:02:49.747Z caller=main.go:252 level=info msg="received SIGINT/SIGTERM; exiting gracefully..."
ts=2024-04-05T08:57:00.094Z caller=tls_config.go:195 level=info msg="TLS is disabled." http2=false
ts=2024-04-05T08:57:00.094Z caller=main.go:140 level=info listen_address=:9091
ts=2024-04-05T08:57:00.092Z caller=main.go:90 level=debug msg="path prefix for internal routing" path=
ts=2024-04-05T08:57:00.092Z caller=main.go:89 level=debug msg="path prefix used externally" path=
ts=2024-04-05T08:57:00.092Z caller=main.go:88 level=debug msg="external URL" url=
ts=2024-04-05T08:57:00.092Z caller=main.go:87 level=info build_context="(go=go1.19.6, user=root@buildkitsandbox, date=20230217-09:16:39)"
ts=2024-04-05T08:57:00.092Z caller=main.go:86 level=info msg="starting pushgateway" version="(version=1.4.0, branch=HEAD, revision=b28bd0363ed3112fc0c1d39813cdc1c1d335bdf1)"
ts=2024-04-05T08:02:29.514Z caller=push.go:111 level=debug msg="failed to parse text" source=10.244.1.189:44278 err="unexpected EOF"
ts=2024-04-05T08:02:29.511Z caller=push.go:111 level=debug msg="failed to parse text" source=10.244.2.173:58952 err="unexpected EOF"
ts=2024-04-05T08:02:29.509Z caller=push.go:111 level=debug msg="failed to parse text" source=10.244.0.34:57258 err="unexpected EOF"
ts=2024-04-04T12:19:01.405Z caller=push.go:111 level=debug msg="failed to parse text" source=10.244.2.173:42550 err="unexpected EOF"

There are just 4 "failed to parse" errors, so I'd ignore them; everything else is just the normal startup log.

Skaronator avatar Apr 11 '24 08:04 Skaronator

The issue is marked as stale since no activity has been recorded in 30 days

github-actions[bot] avatar May 11 '24 23:05 github-actions[bot]

This is not stale. I can give this another try with 1.30 in 2-3 weeks.

Skaronator avatar May 12 '24 05:05 Skaronator

The issue is marked as stale since no activity has been recorded in 30 days

github-actions[bot] avatar Jun 11 '24 23:06 github-actions[bot]

We did a disaster recovery of our 2.5-year-old cluster. Well, actually, we used fresh Ubuntu 22.04 VMs and deployed everything anew, so no backup was involved. Since then, this issue has stabilized a bit, but metrics from the 3rd node are still missing.

image

The 3rd node was probably there at the beginning, but then dropped out again after a few hours. (We deployed the monitoring stack last.)

We are now using 1.30.0. The pushgateway logs show nothing wrong.

Skaronator avatar Jun 13 '24 15:06 Skaronator

It looks like this issue has resolved itself. It has now been running fine for almost 4 weeks:

image

The only thing we might have changed (I don't have exact dates, but it was around that time, ± 2 days) is the haproxy setup we use in front of k0s. We switched from a single VM to two VMs with keepalived for a virtual IP. I don't think it should have had an impact on this, but who knows.

Skaronator avatar Sep 23 '24 09:09 Skaronator