
🌱 Be able to specify multiple installation target disks

Open • mudler opened this issue 3 years ago • 10 comments

Is your feature request related to a problem? Please describe.
A system might have different device-name mappings depending on the hardware.

Describe the solution you'd like
A way to give the installer a list of devices, have it try each one, and install to the first one that is available. For instance:

install:
  device_list:
  - /dev/sda
  - /dev/vda

First match wins: the first device found becomes the install target.
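The proposed first-match semantics fit in a few lines. This is a minimal sketch in Python, assuming "available" simply means the path exists; `pick_install_device` is a hypothetical helper, not a Kairos function, and a real installer would likely also check that the path is a block device and not the installation medium.

```python
import os
from typing import Iterable, Optional


def pick_install_device(device_list: Iterable[str]) -> Optional[str]:
    """Return the first entry of device_list that exists, or None.

    Hypothetical helper mirroring the proposed `install.device_list`:
    entries are tried in order and the first match wins.
    """
    for device in device_list:
        # os.path.exists follows symlinks, so /dev/disk/by-id paths also work
        if os.path.exists(device):
            return device
    return None
```

With the config above, `pick_install_device(["/dev/sda", "/dev/vda"])` returns `/dev/sda` on a machine that has it and `/dev/vda` on a VM that only exposes virtio disks.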

Describe alternatives you've considered

Additional context

mudler, Nov 07 '22 11:11

What is the status on this being picked up?

3pings, Jan 10 '23 19:01

I find this feature a bit strange. Apparently, the idea is to use the same config on different hardware; otherwise, we could simply use the correct device name directly.

Given that there is an "auto" option that selects the largest disk, I assume this feature is needed when "auto" won't do the right thing, for example if a smaller disk should be used.
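For contrast, a largest-disk policy like the "auto" option described here might look like the sketch below. It is not Kairos' actual implementation: the helper names are hypothetical, and it assumes sizes are read from `/sys/block/<name>/size`, which the Linux kernel reports in 512-byte sectors.

```python
import os
from typing import Dict, Mapping, Optional

SECTOR_BYTES = 512  # sysfs reports block device sizes in 512-byte sectors


def read_sysfs_sizes(sys_block: str = "/sys/block") -> Dict[str, int]:
    """Map /dev/<name> -> size in bytes for every entry under sys_block."""
    sizes: Dict[str, int] = {}
    for name in os.listdir(sys_block):
        try:
            with open(os.path.join(sys_block, name, "size")) as f:
                sizes["/dev/" + name] = int(f.read().strip()) * SECTOR_BYTES
        except (OSError, ValueError):
            continue  # skip entries without a readable size
    return sizes


def pick_largest(sizes: Mapping[str, int]) -> Optional[str]:
    """Return the device with the most bytes; ties go to the name that sorts first."""
    best: Optional[str] = None
    for device in sorted(sizes):
        if best is None or sizes[device] > sizes[best]:
            best = device
    return best
```

Called as `pick_largest(read_sysfs_sizes())`, this would also consider loop, ram and optical devices, so a real policy needs extra filtering. It also makes the trade-off visible: on a machine with a small OS disk and a large data disk, the largest device is not the one intended for the OS.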

And this is what I find strange: we will be using the config on more than one machine with no predictable device names (first point above), yet we will know a specific list of device names that guarantees the largest disk won't be selected (second point above).

Maybe this is true in some rare cases but I don't see this being a generally useful feature. Am I missing something?

jimmykarily, Jan 11 '23 07:01

It is common for fleet devices to be identical and ordered in bulk from manufacturers. It is stated that the largest disk is used, which I find interesting, as the OS disk is typically smaller and the larger disk is used for storage. Additionally, users may deploy two different types of nodes, one for the control plane (CP) and one for workers, with different disks (CP nodes may have an OS disk at /dev/nvme4n1, whereas workers might use /dev/nvme1n1). These would all be identical across thousands of nodes. I want to create one image for my devices, not manage a bunch of images. The use case for this type of list is pretty straightforward: I want to use a common image across different node types and specify the device I want to install to. I do not want to leave it up to "auto", which is unpredictable.

3pings, Feb 01 '23 14:02

We discussed this in the sprint planning and we think the use case is perfectly valid, but the solution should be more generic. E.g. what happens if the user always wants to select the smallest disk, and some machines have a smaller vda while others have a smaller vdb? Or any other logic? One idea was to allow the user to implement a "hook" which returns the device on which kairos should install, pretty much like how kcrypt calls out to kcrypt-challenger (or kairos calls kcrypt).
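The "hook" idea could be sketched as follows: the installer runs a user-provided command and treats its trimmed standard output as the target device. The function name, the timeout and the `/dev/` prefix check are assumptions for illustration only, not an existing Kairos or kcrypt interface.

```python
import subprocess
from typing import Sequence


def device_from_hook(cmd: Sequence[str], timeout: float = 30.0) -> str:
    """Run a hypothetical user hook and return the device path it prints.

    Raises subprocess.CalledProcessError if the hook exits non-zero,
    subprocess.TimeoutExpired if it hangs, and ValueError if the output
    does not look like a device path.
    """
    result = subprocess.run(
        cmd, capture_output=True, text=True, timeout=timeout, check=True
    )
    device = result.stdout.strip()
    if not device.startswith("/dev/"):
        raise ValueError(f"hook returned an unexpected value: {device!r}")
    return device
```

Any selection logic (always the smallest disk, vda on some machines and vdb on others) then lives entirely in the user's script, and the installer only validates what comes back.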

What about other cases with more complex disk schemas? E.g. would we ever want to allow the user to create some partitions on one disk and some on another? We had better think about it now, before we implement a solution.

@mudler thoughts?

jimmykarily, Feb 06 '23 17:02

> We discussed this in the sprint planning and we think the use case is perfectly valid, but the solution should be more generic. E.g. what happens if the user always wants to select the smallest disk, and some machines have a smaller vda while others have a smaller vdb? Or any other logic? One idea was to allow the user to implement a "hook" which returns the device on which kairos should install, pretty much like how kcrypt calls out to kcrypt-challenger (or kairos calls kcrypt).

I think in that case it is perfectly valid to specify the expected device for installation, especially if we are talking about bulk hosts: the hardware layout should be the same across machines.

> What about other cases with more complex disk schemas? E.g. would we ever want to allow the user to create some partitions on one disk and some on another? We had better think about it now, before we implement a solution.
>
> @mudler thoughts?

Maybe we can just support a regex matching the device name. For instance, if you expect NVMe drives, it's safe to assume the device is /dev/nvm*, and so on. Shelling out to user-provided options might be tricky, especially to validate, and it wouldn't be clear what can or cannot be called in order to identify partitions (being too generic can be overkill here).
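Regex-based selection could look like this minimal sketch. The helper name is hypothetical, and since `/dev/nvm*` above is written as a shell glob, the equivalent regex is spelled out explicitly. Candidates are sorted lexically for a deterministic result, which orders `nvme10n1` before `nvme2n1`.

```python
import re
from typing import Iterable, Optional


def pick_by_pattern(pattern: str, devices: Iterable[str]) -> Optional[str]:
    """Return the first device (in sorted order) whose full path matches pattern."""
    regex = re.compile(pattern)
    for device in sorted(devices):
        # fullmatch, so r"/dev/nvme\d+n\d+" does not also match partitions like nvme0n1p1
        if regex.fullmatch(device):
            return device
    return None
```

Candidates could come from, e.g., `["/dev/" + n for n in os.listdir("/sys/block")]`, which lists whole disks but not partitions.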

mudler, Feb 13 '23 08:02

The installer can already skip partitioning altogether, and it's possible to run arbitrary commands using cloud-init. This allows someone to do custom partitioning (and labeling). If that works, we simply have to document it. Let's do a spike on this and write down docs. Then we decide if we need something more.

jimmykarily, Feb 13 '23 08:02

Here is my test config.yaml for reference:

#cloud-config

users:
- name: "kairos"
  passwd: "kairos"

# Tell elemental to not create partitions
options:
  no-format: "true"

# User has to copy this file inside /oem
# because elemental looks for stage steps there; it won't respect them here.
# Then the user has to install with
# `kairos-agent manual-install` with this file.
# Netbooting doesn't work for the same reason (elemental ignores the stages).
stages:
  before-install:
  - name: "Create a file"
    commands:
      - |
        touch /tmp/me-now
  boot:
  # TODO: check which device exists
  - if:  '[ -e /dev/vda ] && (kairos-agent state get boot | grep -q unknown)'
    name: "Create partitions"
    commands:
      - |
        touch /tmp/me-now

    layout:
      device:
        path: /dev/vda
      add_partitions:
        - fsLabel: COS_STATE
          size: 16240 # At least 16gb
          pLabel: state

As described in the comments, the above plan requires jumping through some hoops to make it work, because elemental doesn't respect the stages passed to the kairos-agent. We are thinking about solutions to make this simpler.

jimmykarily, Mar 03 '23 16:03

Use cases:

  • Bigger COS_STATE (because of bigger images). e.g. https://github.com/kairos-io/kairos/issues/1025
  • Custom partitions and custom sizes. e.g. https://github.com/kairos-io/kairos/issues/558, https://github.com/kairos-io/kairos/issues/209
  • Complex device name logic. e.g. https://github.com/kairos-io/kairos/issues/391
  • LVM, e.g. what we tried to do in the CI cluster (we would also need to add LVM support, packages, etc.)
  • Pick device by id: https://github.com/kairos-io/kairos/issues/1879
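Picking a device by ID mostly comes down to resolving the symlinks udev maintains under /dev/disk/by-id, which stay stable across boots while kernel names such as /dev/sda may not. A minimal sketch, with a hypothetical helper name and the by-id directory as a parameter so it can be pointed elsewhere:

```python
import os


def resolve_by_id(disk_id: str, by_id_dir: str = "/dev/disk/by-id") -> str:
    """Resolve a /dev/disk/by-id name (or absolute path) to the real device node."""
    path = disk_id if os.path.isabs(disk_id) else os.path.join(by_id_dir, disk_id)
    # exists() follows symlinks, so a dangling by-id link is rejected too
    if not os.path.exists(path):
        raise FileNotFoundError(path)
    return os.path.realpath(path)
```

The resolved path can then be handed to the installer like any other device name.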

Possible solutions:

  • YAML-based DSL to describe all possible configurations
  • Support an external script and skip partitioning altogether (maybe as a cloud-init script)
  • A different solution per use case above, e.g. implement an easy way to define device names with priorities to solve 391 (which doesn't solve any of the other issues) and find separate solutions for the others.

Although we can implement simpler solutions for some of the issues, the one feature that solves them all is the fully custom partitioning. Let's do that first and decide if we need simpler solutions for some of the rest.

We'll implement custom partitioning on https://github.com/kairos-io/kairos/issues/209 and keep this open (and blocked) to decide if we prefer a better solution for this use case only.

jimmykarily, Mar 06 '23 13:03

After a sync call we agreed:

We were blocked because we were trying to netboot, and the config_url field had no effect on the stages defined in that file. Elemental *-install stages also need to be executed from the file provided by the user during installation (currently only files in /oem and /usr/local/cloud-config are respected).

Solution (#209): in order to read stages that kick in before the installer, like boot, the config_url file needs to be downloaded and saved into /oem (or another path scanned by elemental); see https://github.com/kairos-io/kairos/blob/f8999c198dde9d9b31d794397aa4723a7632448c/overlay/files/system/oem/00_datasource.yaml#L3.

  • [LiveCD and netboot] download it and save it, e.g. to /oem/
  • chainload with a for loop: https://github.com/kairos-io/kairos/blob/f8999c198dde9d9b31d794397aa4723a7632448c/pkg/config/config.go#L193 (?)
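The "download it, save it to /oem" step might be sketched like this. Assumptions: the helper and the destination file name `99_config_url.yaml` are hypothetical, and /oem is used only because, per the discussion here, it is one of the paths elemental scans for stages. Writing to a temporary file and renaming it means a scanner never sees a half-written config.

```python
import os
import urllib.request


def save_config_url(config_url: str, oem_dir: str = "/oem",
                    name: str = "99_config_url.yaml") -> str:
    """Download config_url and store it under oem_dir; return the written path."""
    with urllib.request.urlopen(config_url, timeout=30) as resp:
        data = resp.read()
    os.makedirs(oem_dir, exist_ok=True)
    dest = os.path.join(oem_dir, name)
    tmp = dest + ".tmp"
    with open(tmp, "wb") as f:
        f.write(data)
    os.replace(tmp, dest)  # atomic rename when tmp and dest share a filesystem
    return dest
```

Because `urllib.request` also handles `file://` URLs, the same helper can be exercised locally before trying it with a netboot setup.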

Trying locally: first, try the config we had with a datasource ISO. If that works, we have to make it available in /oem, as the datasource does.

Let's keep this issue as a tracker, until we fix all the pieces to get there.

mudler, Mar 06 '23 15:03

For reference, skipping formatting of the disk was broken for a while: https://github.com/kairos-io/kairos/issues/2281

Added a test to ensure we don't break it again: https://github.com/kairos-io/kairos/pull/2291

jimmykarily, Apr 11 '24 16:04