neonKUBE issues

Large cluster deployment with 8 GiB RAM fails

2

This appears to be a cluster advice issue. In this case, the **tempo-ingester** pods are not able to be scheduled: ``` Name: tempo-ingester-0 Namespace: neon-monitor Priority: 900000000 Priority Class Name:...

jefflill

bug

neon-kube

cluster-setup

kubelet complaining: Nameserver limits exceeded

2

We're seeing this log line several times from the **kubelet** service: ``` Nov 16 21:02:17 control-0 kubelet[7914]: {"ts":1700168537196.4094,"caller":"dns/dns.go:153","msg":"Nameserver limits exceeded","err":"Nameserver limits were exceeded, some nameservers have been omitted, the applied...

jefflill

bug

neon-kube

cluster-setup
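
The kubelet logs this warning when the resolver configuration it reads lists more nameservers than it will apply; the limit is three, matching the glibc resolver. A minimal sketch for checking a node's resolver file, assuming standard `resolv.conf` syntax. `split_nameservers` and the sample addresses are hypothetical and are not part of neonKUBE or the kubelet:

```python
MAX_NAMESERVERS = 3  # the kubelet applies at most three nameservers


def split_nameservers(resolv_conf_text, limit=MAX_NAMESERVERS):
    """Return (applied, omitted) nameserver lists for resolv.conf-style text."""
    servers = []
    for line in resolv_conf_text.splitlines():
        stripped = line.strip()
        # Skip blank lines and comment lines ('#' or ';').
        if not stripped or stripped[0] in "#;":
            continue
        fields = stripped.split()
        if fields[0] == "nameserver" and len(fields) >= 2:
            servers.append(fields[1])
    return servers[:limit], servers[limit:]


# Hypothetical sample content, not taken from the affected node.
sample = """\
nameserver 10.0.0.1
nameserver 10.0.0.2
nameserver 10.0.0.3
nameserver 10.0.0.4
search example.internal
"""

applied, omitted = split_nameservers(sample)
print("applied:", applied)
print("omitted:", omitted)
```

Running this against the contents of the node's resolver file would show which entries fall past the limit and are candidates for removal.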

cilium/istio followup

Here are some things to follow up on for the cilium/istio changes:

- [ ] evaluate using the Istio [Telemetry](https://istio.io/latest/docs/tasks/observability/telemetry/) resource
- [ ] configure resource limits from cluster advice...

jefflill

kubernetes-csi validating webhook

1

It looks like we need to install a snapshot validating webhook for the latest CSI release. This helps with migration from beta Kubernetes APIs. https://github.com/kubernetes-csi/external-snapshotter?tab=readme-ov-file#validating-webhook https://github.com/kubernetes-csi/external-snapshotter/blob/master/deploy/kubernetes/webhook-example/README.md I'm going to set...

jefflill

zalan-postgres-health-check container image: do we still need this?

3

During the Kubernetes v1.29 upgrade, I noticed that we build the **zalan-postgres-health-check** container image. It doesn't look like we reference this anywhere. I'm going to delete this now. We should...

jefflill

Air-gapped clusters and Kubernetes container images

I just noticed that we host all of the cluster container images **except for Kubernetes images** in Harbor. We use **kubeadm init** to install the Kubernetes images into podman/CRI-O while...

jefflill

bug

neon-kube

cluster-setup

Evaluate: Pod Security Admission

**PodSecurityPolicy** was removed in Kubernetes v1.25. We should look into configuring Pod Security Admission for cluster namespaces. https://kubernetes.io/docs/concepts/security/pod-security-admission/

jefflill

investigate

security
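
Pod Security Admission is configured per namespace through `pod-security.kubernetes.io/<mode>` labels, where the mode is `enforce`, `audit`, or `warn` and the level is `privileged`, `baseline`, or `restricted`. A minimal sketch of generating a labeled namespace manifest follows. The helper name `namespace_with_psa` and the default levels are assumptions for illustration; the right level for each neonKUBE namespace is exactly what this issue proposes to evaluate:

```python
import json

PSA_LEVELS = ("privileged", "baseline", "restricted")


def namespace_with_psa(name, enforce="baseline", audit="restricted", warn="restricted"):
    """Build a Namespace manifest carrying Pod Security Admission labels."""
    for level in (enforce, audit, warn):
        if level not in PSA_LEVELS:
            raise ValueError(f"unknown Pod Security level: {level}")
    return {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {
            "name": name,
            "labels": {
                "pod-security.kubernetes.io/enforce": enforce,
                "pod-security.kubernetes.io/audit": audit,
                "pod-security.kubernetes.io/warn": warn,
            },
        },
    }


# The default levels here are hypothetical, not a recommendation.
print(json.dumps(namespace_with_psa("neon-monitor"), indent=2))
```

Starting with `audit` and `warn` stricter than `enforce` is one way to surface violations before they block pods.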

hyper-v RAM resource check not working

I tried deploying my home **hyper-v large cluster** that includes 6 nodes with 8 GiB each (48 GiB total). This fails when starting the 5th VM due to insufficient RAM....

jefflill

bug

neon-kube

cluster-setup
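
A pre-flight check for this case amounts to comparing the sum of the node RAM requests against what the host can provide before any VM is started. A minimal sketch of that arithmetic follows; the function names and the 36 GiB host figure are hypothetical, and this is not the actual neonKUBE check:

```python
def vms_that_fit(node_ram_gib, available_gib):
    """Count how many VMs, started in order, fit within available_gib."""
    used = 0
    count = 0
    for ram in node_ram_gib:
        if used + ram > available_gib:
            break
        used += ram
        count += 1
    return count


def check_ram(node_ram_gib, available_gib):
    """Raise before deployment if the cluster needs more RAM than is available."""
    required = sum(node_ram_gib)
    if required > available_gib:
        raise RuntimeError(
            f"cluster needs {required} GiB but only {available_gib} GiB is available"
        )


nodes = [8] * 6          # 6 nodes with 8 GiB each (48 GiB total)
host_available = 36      # hypothetical host capacity in GiB

try:
    check_ram(nodes, host_available)
except RuntimeError as err:
    print("pre-flight check failed:", err)
    print("VMs that would start:", vms_that_fit(nodes, host_available))
```

Failing up front like this avoids leaving a partially created cluster behind when a later VM cannot start.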

grafana-deployment pod logging alert configuration missing file error

I noticed this while trying to debug excessive CPU utilization by the **grafana-deployment** pod after a cluster restart. ``` logger=provisioning.alerting t=2023-08-20T15:44:20.607673937Z level=error msg="can't read alerting provisioning files from directory" path=/etc/grafana/provisioning/alerting...

jefflill

investigate

ETCD backup/restore & cluster upgrade

Some thoughts and links for these topics. --- I was out on a drive yesterday and pulled over to do some research on my phone, looking into ETCD backup/restore solutions...

jefflill

neon-kube
