SystemDisk and DATA partition
Feature Request
This case may only be useful on bare-metal setups.
I have only a 1 TB disk in the server, and I want to store some data (cache, database replica, ZFS cache, and other cases) on this disk. Sometimes bad things happen with containerd/kubelet, and a very easy way to fix them is to just format the EPHEMERAL store. But then we can lose the data (a case from the Slack).
So, proposal:
Add a feature to create a special (DATA) partition on the system disk.
install:
  dataMountPoint: /var/data
  ephemeralDiskSize: 64GB
  diskSelector:
    size: ">128GB"
  bootloader: true
systemDiskEncryption:
  state:
    provider: luks2
    keys:
      - nodeID: {}
        slot: 0
  ephemeral:
    provider: luks2
    keys:
      - nodeID: {}
        slot: 0
  data:
    provider: luks2
    keys:
      - nodeID: {}
        slot: 0
I've added 2 new keys:
- ephemeralDiskSize - if present, the installer resizes the EPHEMERAL partition to this size and allocates all the remaining free space to the DATA partition
- dataMountPoint - if present, format the DATA partition and mount it
It should also remain possible to encrypt the DATA store.
Thanks.
UPD.
install:
  dataMountPoint: /var/data
  osSize: 64GB # RenameMe
  diskSelector:
    size: ">128GB"
  bootloader: true
systemDiskEncryption:
  state:
    provider: luks2
    keys:
      - nodeID: {}
        slot: 0
  ephemeral:
    provider: luks2
    keys:
      - nodeID: {}
        slot: 0
  data:
    provider: luks2
    keys:
      - nodeID: {}
        slot: 0
osSize (or another name) - the full size of all system partitions (BOOT+META+STATE+EPHEMERAL). Defining the full size of the system helps with upgrades if the number of partitions changes. For example, on a 1 TB disk with osSize: 64GB, the system partitions together occupy 64 GB and the DATA partition gets the remaining ~936 GB.
Is there any update on this feature request? In https://github.com/siderolabs/go-blockdevice/pull/50#issuecomment-929360451, the data partition is deferred to v0.14, so is this feature still in development, or is it usable?
There's still no clear decision on whether we want to have a DATA partition on the system disk or not.
I was surprised to discover that I couldn't just configure this as part of the machine.disks config:
- device: /dev/sda
  partitions:
    - mountpoint: /
      size: 64 GB
    - mountpoint: /var/mnt/data
      # size: rest of disk
(This technically validates as-is today, but doesn't seem to get anywhere and if anything seems to trash the system)
I noticed a guide for local storage in the Talos docs. How does that guide relate to this feature request? As far as I understand, the guide just mounts a directory on an existing partition into the kubelet container, while this feature request is about creating a new partition for local storage only. Am I right with this assumption?
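For reference, my understanding is that the guide boils down to a kubelet bind mount of a path on a partition Talos already manages, something like this (a sketch based on the docs; the exact path is illustrative):

machine:
  kubelet:
    extraMounts:
      # bind-mount a directory on the existing EPHEMERAL partition
      # into the kubelet container so hostPath volumes can use it
      - destination: /var/mnt/local
        type: bind
        source: /var/mnt/local
        options:
          - bind
          - rshared
          - rw

So the data still lives on EPHEMERAL and is lost when EPHEMERAL is wiped, which is exactly what this feature request wants to avoid.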
I believe this request is about having part of the Talos system disk not owned by Talos but given to the workloads.
machine:
  install:
    ephemeral:
      size: 64GiB
    data:
      size: <use-remaining-space>
      mountPoint: /var/data # optional; if not specified, don't format it
  systemDiskEncryption:
    data:
      provider: luks2
      keys:
        - nodeID: {}
          slot: 0
@smira does this mean that this feature has been implemented? And would that be an empty partition or a pre-formatted drive? Curious if I could point rook-ceph at /dev/sda4 or something like that...
No, the feature hasn't been implemented; I'm just adding some planning notes to understand what we're looking for. The design is not final yet, and there's no commitment on dates yet.
#2213 is about formatting partitions, which could perhaps be a prerequisite for the part of this FR about giving parts of the system disk to workloads.
Is this still in design? It'd be pretty useful for edge devices like NUCs.
Yes, still in design. The best thing is to drop your detailed use cases into this ticket.
The use case is edge devices like Intel NUCs, which may have just one rather big NVMe device. Talos should only create a smaller system partition and leave the remaining space for things like Longhorn or OpenEBS.
My current use case isn't commercial, but the edge systems I've worked with usually have single-drive or mirrored-drive configurations. A few quick thoughts:
- additional drives add procurement/sustainment costs and require specific device SKUs
- prevents migration to Talos on existing single-drive hardware
- edge workloads don't often need much storage, so sharing the drive would be fine
- Talos is otherwise well-suited to the edge environments I've worked with: lightweight (vs RKE2 or Tanzu), supported and simple to sustain (vs k3s), security as a first-class citizen (especially support for airgapped networks)
Chick-Fil-A also has some decent writeups on their edge k8s NUCs.
I think the primary use cases are homelabs and edge sites.
- Homelabs with small NUCs (1x M.2 drive, 1x SATA drive)
- Edge devices with only one deployed node but several disks
I am in the first category, but at work we would seriously consider the second if it were possible. That would mean that the data partition should, at least as an option, be left unformatted, so that it could be presented to rook-ceph (sketched below).
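To illustrate, assuming Talos left an unformatted partition at the end of the system disk, it could then be handed to Rook through the standard CephCluster device list (fragment only; the node name and device path below are hypothetical):

# fragment of a CephCluster spec
storage:
  useAllDevices: false
  nodes:
    - name: worker-1        # hypothetical node name
      devices:
        - name: /dev/sda5   # hypothetical unformatted DATA partition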
> edge workloads don't often need much storage, so sharing the drive would be fine
Also, the added redundancy at the drive level is very nice (if your PVC supports it).
A very special case for me: I have one node running on the free Oracle Cloud instance, which is one huge instance, and where I can't use external storage without leaving the free tier.
Chiming in here with a use case. I've been using Talos in a homelab in a 1 x control plane, 3 x worker setup, and I'm migrating that to a single-node NUC/mini PC configuration. Like @davralin, the mini PC (Beelink GTR6) has:
- 1x M.2 NVMe drive (512 GB in my case)
- 1x M.2 SATA drive (2 TB in my case)
Since the SATA drive is limited to a few hundred MB/s, I'd like to use 200-300 GB of the NVMe for things that benefit from the faster drive, like databases and frequently accessed files, and leave the SATA drive for storage/backup/etc.
I have been using 3 NUC-like devices with MicroOS and k3s, with Longhorn.io for storage, and plan to use a similar setup for more upcoming small site installs... It's a similar level of immutability, with only the Longhorn storage needing to be backed up.
Each NUC-like device already has 1 TB of NVMe storage, and some devices simply don't have room for more storage, so it's hard to justify requiring additional SSDs just to use Talos in this setup.
Thought I'd add my thoughts here. I currently run Talos on Raspberry Pis (so not commercial); I install Talos on the SD card (64 GB) and have a 1 TB NVMe drive attached via USB which I'd like to use for Ephemeral & Data, as SD cards are notoriously slow and crumble under high-I/O workloads.
I've tried mounting the NVMe at /var/, but the kubelet fails to start. Has anyone had a similar issue? I'd like any data which needs to persist to be stored on the faster NVMe drive.
From what I can tell this issue captures part of this desire.
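(For what it's worth, I suspect mounting over /var itself conflicts with Talos's own EPHEMERAL management, which would explain the kubelet failure. The supported pattern for an extra, non-system disk seems to be machine.disks with a mountpoint under /var/mnt, roughly like this, with the device name assumed:

machine:
  disks:
    - device: /dev/sda # assumed name of the USB-attached NVMe
      partitions:
        - mountpoint: /var/mnt/data # size omitted: use the remaining disk

This only works for a whole extra disk, though, not for a slice of the system disk, which is what this issue is about.)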
Trying to set up backups for piraeus-operator / LINSTOR, I discovered that they only support uploading snapshots to S3 (or elsewhere) from LVM/ZFS-backed volumes, so I guess the FILE_THIN-backed ones we can create on Talos can't be backed up properly. It would be awesome if Talos could leave a partition to serve for LVM.
Hello, you can dd the system image to the disk and manually add a partition at the end (for LVM). In this case you can lose the upgrade function: Talos can wipe the whole partition table during an upgrade.
Chiming in--also using Intel NUCs and similar commercially available devices at the edge. It would be great if Talos would reserve some configurable partition of the system drive for itself and then leave the rest for applications. Most of our devices have a single large(ish) drive in them.
I'll add a non-storage-provider use case.
I've written the tailscale extension for Talos, and it actually works great, but you do hit interesting edge cases in philosophies: Talos assumes it can wipe its disks at any time, while Tailscale maintains a bit of state (private keys) to identify devices. Tailscale can be configured to be more ephemeral, but then it ends up with dynamic IPs and names.
Ideally, reserving a small (say, 100 MB) partition for system extensions to store things would be great. That way tailscale could activate itself if needed, but also have its data persist across upgrades.
This has not been implemented for 2 years, and it would really help when deploying Talos on small VPS hosts where storage space is fixed and very likely scarce. A real-life example would be a 4C8G VPS with only a 200 GB SSD and no option to add an extra data disk.
If we can attach an extra data disk to the VPS, of course we can run Talos well. But if we can't, we need some special tweaks. Since installing Talos directly onto a pre-carved partition likely wouldn't work, here is what I theorize would probably work: create an LVM PV on the physical drive, carve out a system logical volume and a data logical volume, and install Talos only on the system logical volume, leaving the data logical volume available for other applications to mount.
This, however, implies a recent version of GRUB and a kernel that can boot off LVM, and hopefully any Talos upgrade won't fuck up the partition scheme. It also has implications for Ceph usage: Ceph currently won't accept OSD creation on an existing LVM logical volume, to prevent LVM deadlock (alas, that would be stacking LVM-on-LVM as far as I can tell, and it doesn't smell good already).
So I guess you probably can't run Rook on it, unless you cheat a little with the following workaround: create two raw GPT partitions (example: on a 200 GB disk), a system partition (example: 75 GB for LVM) and a data partition (example: 125 GB raw), create an LVM PV/VG/LV on the system partition only, and leave the data partition to be detected by Ceph, which manages its own LVM over there.
This scheme, however, is obviously inflexible: it means a fixed system partition and careful planning beforehand, and scaling it requires a full duplicate-and-migrate of the PV. But it should theoretically work out nicely with Ceph.
-- Oh, I just realized Talos also manages the bootloader and EFI partition. This is getting a little complicated, I think, as you would also have to preserve a small GRUB section to chainload the Talos bootloader inside LVM.
There are some changes in planning for Talos 1.7, hold on :)
Are those changes already visible in the alpha release?
Not yet
Maybe give a brief description of what would be done?
@smira (& @utkuozdemir & @frezbo -- we briefly chatted during the community meeting) first of all, thank you for all your work.
@stevefan1999-personal, from the community meeting, I got the impression that #4041 (improved user control over the EPHEMERAL partition), which is related to, but different from, #8016 (structured /var), is still in the very early stages. (I'm not affiliated with Sidero Labs/Talos in any capacity.)
I also wanted to summarize the mega-threads regarding these interrelated issues and say that 80% of what people seem to need is for Talos not to take over the entirety of a disk. (I'm biased, in that I would also like that feature.)
I'd like to suggest a maxEphemeralPartitionSize parameter that restricts the size of that partition and leaves the rest of the main disk unmanaged; see the sketch after the next paragraph.
(I operate a non-profit/community cluster with mostly donated/reused/recycled hardware, which is extremely heterogeneous. I use Rook-Ceph, and considering we're all about using computational resources thriftily, it hurts that there's a lot of storage locked up in EPHEMERAL partitions that I can't take advantage of.)
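Purely as a sketch (the parameter name and its placement are hypothetical, not an existing Talos option):

machine:
  install:
    # hypothetical option: cap the EPHEMERAL partition at this size
    # and leave the remaining space on the disk unpartitioned/unmanaged
    maxEphemeralPartitionSize: 256GiB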
We will publish the design document to an issue and link it to the #8010 issue once it's ready.