
Wife approved HomeOps driven by Kubernetes and GitOps using Flux

My home operations repository :octocat:

... managed with Flux, Renovate and GitHub Actions 🤖



📖 Overview

This is a mono repository for my home infrastructure and Kubernetes cluster. I try to adhere to Infrastructure as Code (IaC) and GitOps practices using tools such as Ansible, Terraform, Kubernetes, Flux, Renovate, and GitHub Actions.


⛵ Kubernetes

There's an excellent template over at onedr0p/flux-cluster-template if you want to try to follow along with some of the practices I use here.

Installation

My cluster is k3s, provisioned on top of bare-metal Fedora Server using the Ansible Galaxy role ansible-role-k3s. This is a semi-hyper-converged cluster: workloads and block storage share the same available resources on my nodes, while a separate server handles (NFS) file storage.

🔸 Click here to see my Ansible playbooks and roles.
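
For illustration only, applying that role boils down to a small playbook along these lines; the inventory group, release version, and variable names here are my assumptions based on the role's documented defaults, not copied from my repo:

```yaml
# Hypothetical playbook for ansible-role-k3s (published on Galaxy as xanmanning.k3s).
# The group name and release version are placeholders.
- hosts: k3s_cluster
  become: true
  vars:
    k3s_release_version: v1.25.4+k3s1   # example only
  roles:
    - role: xanmanning.k3s
```

If I recall the role's variables correctly, control-plane versus worker membership is decided per host (e.g. a k3s_control_node host variable) in the inventory.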

Core Components

GitOps

Flux watches my cluster folder (see Directories below) and applies changes to my cluster based on the YAML manifests it finds there.

Renovate watches my entire repository for dependency updates; when one is found, a PR is automatically created. When PRs are merged, Flux applies the changes to my cluster.
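
In practice that means something like the following pair of Flux resources; this is a sketch rather than my literal manifests, and the repo URL, names, and API versions (which vary by Flux release) are placeholders:

```yaml
# Sketch of the Flux entrypoint: a GitRepository for this repo and a Kustomization
# that reconciles everything under ./cluster. Names and intervals are illustrative.
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: GitRepository
metadata:
  name: home-ops
  namespace: flux-system
spec:
  interval: 10m
  url: https://github.com/example/home-ops   # placeholder URL
  ref:
    branch: main
---
apiVersion: kustomize.toolkit.fluxcd.io/v1beta2
kind: Kustomization
metadata:
  name: cluster
  namespace: flux-system
spec:
  interval: 10m
  path: ./cluster
  prune: true
  sourceRef:
    kind: GitRepository
    name: home-ops
```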

Directories

This Git repository contains the following directories (Kustomizations) under cluster.

📁 cluster      # k8s cluster defined as code
├─📁 flux       # flux, gitops operator, loaded before everything
├─📁 crds       # custom resources, loaded before 📁 core and 📁 apps
├─📁 charts     # helm repos, loaded before 📁 core and 📁 apps
├─📁 config     # cluster config, loaded before 📁 core and 📁 apps
├─📁 core       # crucial apps, namespaced dir tree, loaded before 📁 apps
└─📁 apps       # regular apps, categorized dir tree, loaded last
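
The load order above is expressed with Flux dependsOn relationships between Kustomizations. Roughly (again, a sketch rather than the literal manifests), the apps Kustomization waits on core, which in turn waits on crds, charts, and config:

```yaml
# Illustrative only: apps reconciles after core has become ready.
apiVersion: kustomize.toolkit.fluxcd.io/v1beta2
kind: Kustomization
metadata:
  name: apps
  namespace: flux-system
spec:
  interval: 10m
  path: ./cluster/apps
  prune: true
  sourceRef:
    kind: GitRepository
    name: home-ops      # name from the sketch above
  dependsOn:
    - name: core
```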

Networking

| Name | CIDR |
|------|------|
| Kubernetes Nodes | 192.168.42.0/24 |
| Kubernetes external services (Calico w/ BGP) | 192.168.69.0/24 |
| Kubernetes pods | 10.42.0.0/16 |
| Kubernetes services | 10.43.0.0/16 |
  • HAProxy configured on Opnsense for the Kubernetes Control Plane Load Balancer.
  • Calico configured with externalIPs to expose Kubernetes services on their own IPs over BGP, which is configured on my router (sketched below).
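
As an example of that second point, a Service picks up an address from the 192.168.69.0/24 range via spec.externalIPs, and Calico advertises it over BGP (assuming its BGPConfiguration lists that CIDR under serviceExternalIPs). The app, namespace, and IP below are placeholders:

```yaml
# Placeholder Service exposed on its own routable IP via Calico BGP
apiVersion: v1
kind: Service
metadata:
  name: plex
  namespace: media
spec:
  type: ClusterIP
  externalIPs:
    - 192.168.69.10      # advertised to my router by Calico
  selector:
    app.kubernetes.io/name: plex
  ports:
    - name: http
      port: 32400
      targetPort: 32400
```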

Data Backup and Recovery

Due to issues, restrictions, or nuances with Velero, Benji, Gemini, Kasten K10 by Veeam, Stash by AppsCode, and others, I am currently using a DIY (or, more specifically, a "Poor Man's Backup") solution that leverages Kyverno, Kopia, and native Kubernetes CronJob and Job resources.

At a high level the way this operates is that:

  • Kyverno creates a CronJob for each PersistentVolumeClaim resource that contains the label snapshot.home.arpa/enabled: "true" (sketched after this list)
  • Every day the CronJob creates a Job that uses Kopia to connect to a Kopia repository on my NAS over NFS and then snapshots the contents of the app's data mount into the Kopia repository
  • The snapshots made by Kopia are incremental, which makes the Job run very quickly
  • The app's data mount is frozen during backup to prevent writes and unfrozen when the snapshot is complete
  • The PersistentVolumeClaim resources must contain the labels app.kubernetes.io/name, app.kubernetes.io/instance, and snapshot.home.arpa/enabled
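
A condensed sketch of the Kyverno side is below. It is not my exact policy: the real one also templates in the NFS mounts, the freeze/unfreeze hooks, and the Kopia flags, all of which are omitted here, and the schedule, image tag, and names are placeholders.

```yaml
# Rough sketch of a Kyverno generate rule that stamps out a backup CronJob
# for every labeled PersistentVolumeClaim.
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: snapshot-cronjob
spec:
  rules:
    - name: create-snapshot-cronjob
      match:
        any:
          - resources:
              kinds:
                - PersistentVolumeClaim
              selector:
                matchLabels:
                  snapshot.home.arpa/enabled: "true"
      generate:
        apiVersion: batch/v1
        kind: CronJob
        name: "{{request.object.metadata.name}}-snapshot"
        namespace: "{{request.object.metadata.namespace}}"
        synchronize: true
        data:
          spec:
            schedule: "0 3 * * *"            # illustrative daily schedule
            jobTemplate:
              spec:
                template:
                  spec:
                    restartPolicy: OnFailure
                    containers:
                      - name: kopia
                        image: docker.io/kopia/kopia:latest   # placeholder tag
                        command:
                          - /bin/sh
                          - -c
                          - >-
                            kopia repository connect filesystem --path=/repo &&
                            kopia snapshot create /data
                        # volumeMounts for the app data PVC and the NFS-backed
                        # Kopia repository are omitted for brevity
```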

Some important notes on the implementation of this method:

  • Kopia has a Web UI which you can deploy into your cluster to access the repository via the UI, or by exec'ing into the Pod and using the Kopia CLI. This deployment is required if using the Taskfile snapshot:create and snapshot:restore tasks I created.
  • Recovery is done manually with a different Job. I wrote a Taskfile task that creates a restore Job which shuts down the app, restores a snapshot from the Kopia repository into the app's data PersistentVolumeClaim, and then puts the app back into a running state.
  • There is another CronJob that syncs the Kopia repository to Backblaze B2 every day (see the sketch below).
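
That offsite sync is roughly the following CronJob; the namespace, schedule, Secret, and image tag are placeholders, and the repository password is assumed to come from a KOPIA_PASSWORD variable in the referenced Secret:

```yaml
# Illustrative daily sync of the local Kopia repository to Backblaze B2
apiVersion: batch/v1
kind: CronJob
metadata:
  name: kopia-sync-b2
  namespace: storage                 # hypothetical namespace
spec:
  schedule: "0 5 * * *"              # illustrative daily schedule
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: kopia
              image: docker.io/kopia/kopia:latest   # placeholder tag
              envFrom:
                - secretRef:
                    name: kopia-b2-credentials      # hypothetical Secret (B2 keys, KOPIA_PASSWORD)
              command:
                - /bin/sh
                - -c
                - >-
                  kopia repository connect filesystem --path=/repo &&
                  kopia repository sync-to b2 --bucket="${B2_BUCKET}"
                  --key-id="${B2_KEY_ID}" --key="${B2_KEY}"
              # the NFS-backed repository volume mounted at /repo is omitted for brevity
```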

🌐 DNS

Ingress Controller

Over WAN, I have port forwarded ports 80 and 443 to the load balancer IP of my ingress controller that's running in my Kubernetes cluster.

Cloudflare works as a proxy to hide my home's WAN IP and also as a firewall. When not on my home network, all the traffic coming into my ingress controller on ports 80 and 443 comes from Cloudflare. In Opnsense I block all IPs not originating from Cloudflare's list of IP ranges.

🔸 Cloudflare is also configured to GeoIP-block all countries except a few I have whitelisted.

Internal DNS

k8s_gateway is deployed on Opnsense. With this setup, k8s_gateway has direct access to my cluster's ingress records and serves DNS for them on my internal network. k8s_gateway only listens on 127.0.0.1 on port 53.

For ad-blocking, I also have AdGuard Home deployed on Opnsense, with an upstream server pointing to the k8s_gateway mentioned above. AdGuard Home listens on my MANAGEMENT, SERVER, IOT, and GUEST networks on port 53. In my firewall rules I have NAT port redirection forcing all of these networks to use the AdGuard Home DNS server.
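
For reference, the relevant fragment of the AdGuard Home config (AdGuardHome.yaml) might look something like this; the bind address and the domain-specific upstream syntax are my assumptions about a typical setup, not a copy of my config:

```yaml
# Hypothetical AdGuardHome.yaml fragment on Opnsense; only the DNS keys shown
dns:
  bind_hosts:
    - 192.168.1.1                  # placeholder address on the LAN-facing interfaces
  port: 53
  upstream_dns:
    - "[/domain.tld/]127.0.0.1"    # send my domain's queries to k8s_gateway on loopback
    - https://1.1.1.1/dns-query    # everything else goes to a public resolver (illustrative)
```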

Without much engineering of DNS @home, these choices have made my Opnsense router a single point of failure for DNS. I believe this is OK, though, because my router should have the most uptime of all my systems.

External DNS

external-dns is deployed in my cluster and configured to sync DNS records to Cloudflare. The only Ingresses external-dns looks at to gather DNS records for Cloudflare are the ones where I explicitly set the annotation external-dns.home.arpa/enabled: "true".
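
For example, an Ingress that external-dns will pick up (presumably via its annotation filter) looks something like this; the hostname, namespace, service, and ingress class are placeholders:

```yaml
# Illustrative opted-in Ingress; external-dns creates the matching record in Cloudflare
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: app
  namespace: default
  annotations:
    external-dns.home.arpa/enabled: "true"   # the opt-in annotation external-dns filters on
spec:
  ingressClassName: nginx                    # placeholder ingress class
  rules:
    - host: app.domain.tld
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: app
                port:
                  number: 80
```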

🔸 Click here to see how else I manage Cloudflare with Terraform.

Dynamic DNS

My home IP can change at any given time, so to keep my WAN IP address up to date on Cloudflare, I have deployed a CronJob in my cluster that periodically checks and updates the A record ipv4.domain.tld.
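
A sketch of what such a CronJob can look like is below; the schedule, image, Secret, and the ZONE_ID/RECORD_ID/CF_API_TOKEN variables are placeholders, not the actual implementation:

```yaml
# Illustrative DDNS CronJob: look up the current WAN IP and PUT it to the Cloudflare API
apiVersion: batch/v1
kind: CronJob
metadata:
  name: cloudflare-ddns
  namespace: networking              # hypothetical namespace
spec:
  schedule: "*/30 * * * *"           # illustrative interval
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: ddns
              image: docker.io/curlimages/curl:latest   # placeholder tag
              envFrom:
                - secretRef:
                    name: cloudflare-ddns-secret        # hypothetical Secret (token, zone/record IDs)
              command:
                - /bin/sh
                - -c
                - >-
                  ip="$(curl -s https://ipv4.icanhazip.com)" &&
                  curl -s -X PUT
                  "https://api.cloudflare.com/client/v4/zones/${ZONE_ID}/dns_records/${RECORD_ID}"
                  -H "Authorization: Bearer ${CF_API_TOKEN}"
                  -H "Content-Type: application/json"
                  --data "{\"type\":\"A\",\"name\":\"ipv4.domain.tld\",\"content\":\"${ip}\",\"ttl\":1}"
```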


🔧 Hardware

Click to see da rack!
| Device | Count | OS Disk Size | Data Disk Size | RAM | Operating System | Purpose |
|--------|-------|--------------|----------------|-----|------------------|---------|
| Protectli FW6D | 1 | 500GB mSATA | - | 16GB | Opnsense 22 | Router |
| Intel NUC8i3BEK | 3 | 256GB NVMe | - | 32GB | Fedora 36 | Kubernetes Masters |
| Intel NUC8i5BEH | 3 | 240GB SSD | 1TB NVMe (rook-ceph) | 64GB | Fedora 36 | Kubernetes Workers |
| PowerEdge T340 | 1 | 2TB SSD | 8x12TB ZFS (mirrored vdevs) | 64GB | Ubuntu 22.04 | NFS + Backup Server |
| Lenovo SA120 | 1 | - | 6x12TB (+2 hot spares) | - | - | DAS |
| Raspberry Pi | 1 | 32GB (SD) | - | 4GB | PiKVM | Network KVM |
| TESmart 8 Port KVM Switch | 1 | - | - | - | - | Network KVM (PiKVM) |
| APC SMT1500RM2U w/ NIC | 1 | - | - | - | - | UPS |
| CyberPower PDU41001 | 2 | - | - | - | - | PDU |

🤝 Gratitude and Thanks

Thanks to all the people who donate their time to the Kubernetes @Home community. A lot of inspiration for my cluster comes from the people who have shared their clusters under the k8s-at-home GitHub topic.


📜 Changelog

See commit history


🔏 License

See LICENSE