SystemDisk and DATA partition

Open sergelogvinov opened this issue 4 years ago • 33 comments
Feature Request

Maybe this case is useful only on bare-metal setups.

I have only one 1 TB disk on the server, and I want to store some data (cache, database replica, ZFS cache, and other cases) on this disk. Sometimes bad things happen with containerd/kubelet, and a very easy way to solve them is to just format the EPHEMERAL store. But then we can lose the data (a case from Slack).

So, proposal:

Add a feature to create a special (DATA) partition on the system disk.

  install:
    dataMountPoint: /var/data
    ephemeralDiskSize: 64GB

    diskSelector:
      size: ">128GB"
    bootloader: true
  systemDiskEncryption:
    state:
      provider: luks2
      keys:
        - nodeID: {}
          slot: 0
    ephemeral:
      provider: luks2
      keys:
        - nodeID: {}
          slot: 0
    data:
      provider: luks2
      keys:
        - nodeID: {}
          slot: 0

I've added 2 new keys:

  • ephemeralDiskSize - if it exists, the installer resizes the EPHEMERAL partition to this size and allocates all remaining free space to the DATA partition
  • dataMountPoint - if it exists, the installer formats the DATA partition and mounts it.
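
To make the intended arithmetic concrete, here is a minimal Python sketch of how ephemeralDiskSize could split the disk. It is not Talos code: the helper names `parse_size` and `plan_layout`, the decimal interpretation of "GB", and the space taken by the other system partitions are all assumptions made for illustration.

```python
import re

# Assumption: "GB" in the proposal means decimal gigabytes.
UNITS = {"MB": 10**6, "GB": 10**9, "TB": 10**12}


def parse_size(text):
    """Convert a size string such as '64GB' into bytes."""
    match = re.fullmatch(r"(\d+)\s*(MB|GB|TB)", text.strip())
    if match is None:
        raise ValueError(f"unsupported size: {text!r}")
    return int(match.group(1)) * UNITS[match.group(2)]


def plan_layout(disk_bytes, ephemeral_size, other_system_bytes):
    """Return (ephemeral_bytes, data_bytes) for the system disk.

    ephemeral_size=None models the case where the key is absent:
    EPHEMERAL takes all remaining space and no DATA partition exists.
    """
    free = disk_bytes - other_system_bytes
    if ephemeral_size is None:
        return free, 0
    ephemeral = parse_size(ephemeral_size)
    if ephemeral > free:
        raise ValueError("ephemeralDiskSize does not fit on the disk")
    # Everything not claimed by EPHEMERAL goes to DATA.
    return ephemeral, free - ephemeral
```

For example, `plan_layout(parse_size("1TB"), "64GB", 2 * 10**9)` would leave 934 GB for DATA if the remaining system partitions occupied 2 GB (a made-up figure).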

And keep it possible to encrypt the DATA store too.

Thanks.

sergelogvinov avatar Aug 09 '21 07:08 sergelogvinov

UPD.


  install:
    dataMountPoint: /var/data
    osSize: 64GB # RenameMe

    diskSelector:
      size: ">128GB"
    bootloader: true
  systemDiskEncryption:
    state:
      provider: luks2
      keys:
        - nodeID: {}
          slot: 0
    ephemeral:
      provider: luks2
      keys:
        - nodeID: {}
          slot: 0
    data:
      provider: luks2
      keys:
        - nodeID: {}
          slot: 0

osSize (or another name): the full size of all system partitions (BOOT+META+STATE+EPHEMERAL). Defining the full size of the system helps with upgrades if the number of partitions changes.
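
A rough Python sketch of why a single osSize value stays stable across upgrades: the DATA region depends only on osSize, not on how the system partitions inside it are arranged. The function name `data_region` and every partition size below are hypothetical placeholders, not real Talos values, and the sketch ignores GPT overhead and alignment.

```python
GB = 10**9


def data_region(disk_bytes, os_size_bytes, system_partitions):
    """Return (start_offset, length) of the DATA region in bytes.

    system_partitions maps a partition label to its size; the sizes
    only need to fit inside os_size_bytes.
    """
    used = sum(system_partitions.values())
    if used > os_size_bytes:
        raise ValueError("system partitions exceed osSize")
    if os_size_bytes > disk_bytes:
        raise ValueError("osSize is larger than the disk")
    # DATA starts right after the reserved system area.
    return os_size_bytes, disk_bytes - os_size_bytes


# Two made-up layouts with a different number of system partitions.
layout_old = {"BOOT": 1 * GB, "META": 1 * GB, "STATE": 1 * GB,
              "EPHEMERAL": 61 * GB}
layout_new = {"EFI": 1 * GB, "BOOT": 1 * GB, "META": 1 * GB,
              "STATE": 1 * GB, "EPHEMERAL": 60 * GB}
```

With `osSize` at 64 GB on a 1000 GB disk, both layouts give the same DATA region, so an upgrade that adds a partition would not need to move DATA.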

sergelogvinov avatar Aug 10 '21 15:08 sergelogvinov

Is there any update on this feature request? In https://github.com/siderolabs/go-blockdevice/pull/50#issuecomment-929360451, the data partition is deferred to v0.14, so is this feature still in development, or is it usable?

hobyte avatar May 08 '22 21:05 hobyte

there's still no clear decision on whether we want to have a DATA partition on the system disk or not

smira avatar May 10 '22 17:05 smira

I was surprised to discover that I couldn't just configure this as part of the machine.disks config:

    - device: /dev/sda
      partitions:
      - mountpoint: /
        size: 64 GB
      - mountpoint: /var/mnt/data
        # size: rest of disk

(This technically validates as-is today, but doesn't seem to get anywhere and if anything seems to trash the system)

nickbp avatar Aug 30 '22 07:08 nickbp

I noticed a guide for local storage in the Talos docs. How does this guide relate to this feature request? As far as I understand, the guide just mounts a directory on an existing partition into the kubelet container, while this feature request is about creating a new partition for local storage only. Am I right with this assumption?

hobyte avatar Aug 30 '22 20:08 hobyte

This request I believe is to have part of Talos system disk not owned by Talos but given to the workloads.

smira avatar Sep 02 '22 19:09 smira

machine:
  install:
    ephemeral:
      size: 64GiB
    data:
      size: <use-remaining-space>
      mountPoint: /var/data # optional, if not specified, don't format it
  systemDiskEncryption:
    data:
      provider: luks2
      keys:
        - nodeID: {}
          slot: 0

smira avatar Sep 19 '22 17:09 smira

@smira does this mean that this feature has been implemented? And that would be an empty partition, or a pre-formatted drive? Curious if I could point rook-ceph at /dev/sda4 or something like that...

davralin avatar Sep 20 '22 16:09 davralin

no, feature hasn't been implemented, just adding some planning notes to understand what we're looking for. the design is not final yet, and no commitment on dates yet.

smira avatar Sep 21 '22 10:09 smira

#2213 is about formatting partitions, which perhaps could be a pre-requisite to having part of this FR about the parts of system disk given to workloads.

vorburger avatar Nov 19 '22 17:11 vorburger

is this still in design? it'd be pretty useful for edge devices like NUCs

pl4nty avatar Sep 04 '23 23:09 pl4nty

yes, still in design, the best thing is to drop your detailed use-cases to this ticket

smira avatar Sep 05 '23 11:09 smira

The use case is edge devices like Intel NUCs, which may have just one rather large NVMe device. Talos should only create a smaller system partition and leave the remaining space for things like Longhorn or OpenEBS.

runningman84 avatar Sep 09 '23 19:09 runningman84

my current usecase isn't commercial, but the edge systems I've worked with usually have single-drive or mirrored-drive configurations. a few quick thoughts:

  • additional drives add procurement/sustainment costs and require specific device SKUs
  • prevents migration to Talos on existing single-drive hardware
  • edge workloads don't often need much storage, so sharing the drive would be fine
  • Talos is otherwise well-suited to the edge environments I've worked with - lightweight (vs RKE2 or Tanzu), supported and simple to sustain (vs k3s), security as first-class citizen (especially support for airgapped networks)

Chick-Fil-A also have some decent writeups on their edge k8s NUCs

pl4nty avatar Sep 10 '23 03:09 pl4nty

I think the primary use-case is homelab and edge-sites.

  • Homelabs with small NUCs (1x M.2 drive, 1x SATA drive)
  • Edge devices with only one deployed node with several disks

I am in the first category, but at work we would seriously consider the second if it was possible. That would mean that the data-partition should, at least with an option, be unformatted - so that it could be presented to rook-ceph.

> edge workloads don't often need much storage, so sharing the drive would be fine

And also, the added redundancy at the drive-level is very nice (if your PVC supports it).

A very special case for me is that I have one node running on the free oracle-cloud-instance, which is one huge instance, and where I can't use external storage without leaving the free tier.

davralin avatar Sep 20 '23 13:09 davralin

Chiming in here with a use case. I've been using Talos in a homelab in a 1 x control plane, 3 x worker setup, and I'm migrating that to a single-node NUC/mini PC configuration. Like @davralin, the mini PC (Beelink GTR6) has:

  • 1x M.2 NVMe drive (512 GB in my case)
  • 1x M.2 SATA drive (2 TB in my case)

Since the SATA drive is limited to a few hundred MBps, I'd like to use 200-300GB on the NVMe for things that benefit from the faster drive, like databases and frequently accessed files, and leave the SATA drive for storage/backup/etc.

bcspragu avatar Oct 02 '23 21:10 bcspragu

I have been using 3 nuc-like devices with MicroOS and k3s with Longhorn.io for storage and plan to use a similar setup for more upcoming small site installs... It's a similar level of immutable with only the longhorn storage needing to be backed up.

Each nuc-like device already has 1TB nvme storage, and some devices simply don't have room for more storage, so it's hard to justify adding more SSDs as a requirement just to use Talos on this setup.

jamcole avatar Oct 11 '23 17:10 jamcole

Thought I'd add my thoughts here. I currently run Talos on Raspberry Pis (so not commercial). I install Talos on the SD card (64GB) and have a 1TB NVMe drive attached via USB, which I'd like to use for Ephemeral & Data, as SD cards are notoriously slow and crumble under high I/O workloads.

I've tried mounting the NVMe at /var/ but the kubelet fails to start. Has anyone had a similar issue? I'd like any data which needs to persist to be stored on the faster NVMe drive.

From what I can tell this issue captures part of this desire.

jamesagarside avatar Oct 13 '23 15:10 jamesagarside

Trying to set up backups for piraeus-operator / linstor and discovered that they only support uploading snapshots to S3 or other on LVM / ZFS backed volumes, so I guess the FILE_THIN backed ones we can create on Talos can't be backed up properly. Would be awesome if Talos could leave a partition to serve for LVM

Ulrar avatar Oct 29 '23 18:10 Ulrar

> Trying to set up backups for piraeus-operator / linstor and discovered that they only support uploading snapshots to S3 or other on LVM / ZFS backed volumes, so I guess the FILE_THIN backed ones we can create on Talos can't be backed up properly. Would be awesome if Talos could leave a partition to serve for LVM

Hello, you can dd the system image to the disk and manually add a partition at the end (for LVM). In this case you may lose the upgrade function, since Talos can wipe the whole partition table during an upgrade.

sergelogvinov avatar Oct 30 '23 07:10 sergelogvinov

Chiming in: we're also using Intel NUCs and similar commercially available devices at the edge. It would be great if Talos would reserve some configurable partition of the system drive for itself, and then leave the rest for applications. Most of our devices have a single large(ish) drive in them.

Kuresov avatar Jan 01 '24 20:01 Kuresov

I'll add a non-storage-provider use case.

I've written the tailscale extension for Talos, and this actually works great, but you do hit interesting edge cases in philosophies. Talos assumes it can wipe its disks at any time, while tailscale maintains a bit of state (private keys) to identify devices. While tailscale can be configured to be more ephemeral, it then ends up being dynamic in IPs and names.

Ideally, reserving a small (say 100 MB) partition for system extensions to store things would be great. That way tailscale can activate itself if needed, but also have its data persist over upgrades.

btrepp avatar Jan 05 '24 01:01 btrepp

This was not implemented for 2 years, and this would really help deploying Talos on small VPS hosts where storage space is fixed and very likely scarce. A real-life example would be having a 4C8G VPS and only 200GB SSD with no option to add an extra data disk.

If we can attach an extra data disk to the VPS, of course Talos works well. But if we can't, we need some special tweaks. If installing Talos directly on a pre-carved partition is unlikely to work, then the approach I theorized would probably work: create an LVM PV on the physical drive, carve out a system logical volume and a data logical volume, and install Talos only on the system logical volume, leaving the data logical volume available for other applications to mount.

This, however, implies a recent version of GRUB and a kernel that can boot off LVM, and hopefully a Talos upgrade won't fuck up the partition scheme. It also has implications for Ceph usage: Ceph currently won't accept OSD creation on an existing LVM logical volume, to prevent LVM deadlock (alas, it would be stacking LVM-on-LVM as far as I can tell, and that already doesn't smell good).

So, I guess you probably can't run Rook on it unless you cheat a little with the following workaround: create two raw GPT partitions (example: on a 200GB disk), a system partition (example: 75G LVM) and a data partition (example: 125G RAW), and create an LVM PV/VG/LV on the system partition only, leaving the data partition to be detected by Ceph, which manages its own LVM there.

This scheme however is obviously inflexible, as this means you would have a fixed system partition and needs careful planning beforehand, and in order to scale it, a full dupli-migration of the PV is needed, but it should theoretically work out nicely with Ceph.

-- Oh I just realized Talos also manages the bootloader and EFI partition. This is getting a little complicated I think, as you also have to preserve a small grub section to chainload onto the Talos bootloader inside LVM.

stevefan1999-personal avatar Jan 21 '24 16:01 stevefan1999-personal

There're some changes in planning for Talos 1.7, hold on :)

smira avatar Jan 22 '24 10:01 smira

Are those changes already visible in the alpha release?

alexvanderberkel avatar Feb 14 '24 07:02 alexvanderberkel

Not yet

smira avatar Feb 14 '24 10:02 smira

> Not yet

Maybe give a brief description of what would be done?

stevefan1999-personal avatar Feb 14 '24 20:02 stevefan1999-personal

@smira (& @utkuozdemir & @frezbo -- we briefly chatted during the community meeting) first of all, thank you for all your work.

@stevefan1999-personal, from the community meeting, I got the impression that #4041 (improved user control over EPHEMERAL partition), which is related to, but different from, #8016 (structured /var), is still in the very early stages. (I'm not affiliated with Sidero Labs/Talos in any capacity.)

ianatha avatar Feb 14 '24 20:02 ianatha

Also wanted to summarize the mega-threads regarding these interrelated issues and say that 80% of what people seem to need is that Talos doesn't take over the entirety of a disk. (I'm biased in that I also would like that feature.)

I'd like to suggest a maxEphemeralPartitionSize parameter that restricts the size of that partition and leaves the rest of main disk unmanaged.
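
As a rough illustration of the semantics I have in mind (a sketch only; `split_system_disk` is a hypothetical helper, not anything in Talos), the parameter would act as a cap rather than a fixed size:

```python
def split_system_disk(free_bytes, max_ephemeral_bytes=None):
    """Return (ephemeral_bytes, unmanaged_bytes).

    free_bytes is the space left after the other system partitions.
    With no cap, EPHEMERAL takes everything; with a cap, EPHEMERAL is
    limited to it and the remainder is left untouched.
    """
    if max_ephemeral_bytes is None:
        return free_bytes, 0
    ephemeral = min(free_bytes, max_ephemeral_bytes)
    return ephemeral, free_bytes - ephemeral
```

On a disk smaller than the cap, nothing changes; on a larger disk, the surplus stays unmanaged and could be handed to something like Rook-Ceph.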

(I operate a non-profit/community cluster with mostly donated/reused/recycled hardware, which is extremely heterogeneous. I use Rook-Ceph, and considering we're all about using computational resources thriftily, it hurts that there's a lot of storage in EPHEMERALs that I can't take advantage of.)

ianatha avatar Feb 14 '24 20:02 ianatha

We will publish the design document as an issue and link it to #8010 once it's ready

smira avatar Feb 15 '24 10:02 smira