🐛 Take Over installation fails due to "undefined system source to install" on bare metal

Open GrabbenD opened this issue 2 years ago • 9 comments

Issue

On bare metal, the steps from https://kairos.io/docs/installation/takeover/ fail after pulling the $IMAGE. This was tested with openSUSE MicroOS as the host:

undefined system source to install

Steps

$ whoami
root

$ lsblk /dev/sdb
NAME MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
sdb    8:16   0 111.8G  0 disk

export DEVICE=/dev/sdb
export IMAGE=quay.io/kairos/core-opensuse-leap:v2.1.0
cat <<'EOF' > config.yaml
#cloud-config
users:
- name: "kairos"
  passwd: "kairos"
  ssh_authorized_keys:
  - github:mudler
EOF
export CONFIG_FILE=config.yaml

$ docker run --privileged -v $PWD:/data -v /dev:/dev -ti $IMAGE kairos-agent manual-install --device $DEVICE /data/$CONFIG_FILE
INFO[2023-06-19T09:42:17Z] Starting elemental version v2.0.1
undefined system source to install

Notes

  • /dev/sdb is a clean GPT disklabel with no partitions

GrabbenD avatar Jun 19 '23 09:06 GrabbenD

Hi @GrabbenD,

Can you try with an older version of Kairos? Is this a regression? The last time I tried it was on the 1.x branch. Can you try with 1.x and let us know?

mudler avatar Jul 10 '23 08:07 mudler

@mudler

Yup, I'm seeing the same issue with v1.6.0, and I just tried the latest version, v2.3.0. For reference, this issue can also be reproduced with kairos-agent interactive-install instead of kairos-agent manual-install:

$ podman run --privileged -v $PWD:/data -v /dev:/dev -ti quay.io/kairos/core-opensuse-leap:v1.6.0 kairos-agent interactive-install
<...>
┌───────────────────────────────────────────────────────────────────────────┐
| Interactive installation. Documentation is available at https://kairos.io.|
└──────────────────────────────────────────────────────────── Installation ─┘
 INFO  Available Disks:
 INFO   /dev/sda: unknown (300.00 GiB)
 INFO   /dev/sdb: unknown (1863.02 GiB)
What's the target install device? /dev/sdb
User to setup kairos
Password ●●●●●●
SSH access (rsakey, github/gitlab supported, comma-separated)
Are settings ok? Y
 INFO  Starting installation
 INFO  #cloud-config
       install:
           device: /dev/sdb
       name: Config generated by the installer
       stages:
           network:
               - users:
                   kairos:
                       groups:
                           - admin
                       name: kairos
                       passwd: kairos
                       ssh_authorized_keys:
                           - ""
INFO[2023-07-19T15:06:06Z] Starting elemental version 0.20230222.1+kairos
ERRO[2023-07-19T15:06:06Z] invalid install command setup undefined system source to install
Error: undefined system source to install
exit status 1
$ podman run --privileged -v $PWD:/data -v /dev:/dev -ti quay.io/kairos/core-opensuse-leap:v2.3.0 kairos-agent interactive-install
<...>
┌───────────────────────────────────────────────────────────────────────────┐
| Interactive installation. Documentation is available at https://kairos.io.|
└──────────────────────────────────────────────────────────── Installation ─┘
 INFO  Available Disks:
 INFO   /dev/sda: unknown (300.00 GiB)
 INFO   /dev/sdb: unknown (1863.02 GiB)
What's the target install device? /dev/sdb
User to setup kairos
Password ●●●●●
SSH access (rsakey, github/gitlab supported, comma-separated)
Are settings ok? Y
 INFO  Starting installation
 INFO  #cloud-config
       install:
           device: /dev/sdb
       name: Config generated by the installer
       stages:
           initramfs:
               - users:
                   kairos:
                       groups:
                           - admin
                       name: kairos
                       passwd: kairo
INFO[2023-07-19T15:22:44Z] kairos-agent version v2.1.10
  ERROR   undefined system source to install
undefined system source to install
$ podman run --privileged -v $PWD:/data -v /dev:/dev -ti quay.io/kairos/core-opensuse-leap:v1.6.0 kairos-agent manual-install --device $DEVICE /data/$CONFIG_FILE
INFO[2023-07-19T15:19:42Z] Starting elemental version 0.20230222.1+kairos
ERRO[2023-07-19T15:19:42Z] invalid install command setup undefined system source to install
Error: undefined system source to install
exit status 1
$ podman run --privileged -v $PWD:/data -v /dev:/dev -ti quay.io/kairos/core-opensuse-leap:v2.3.0 kairos-agent manual-install --device $DEVICE /data/$CONFIG_FILE
INFO[2023-07-19T15:19:15Z] kairos-agent version v2.1.10
undefined system source to install

This issue can be observed on bare metal as well as in a Hyper-V VM. I've tested with both Podman and Docker.

For clarification, the host system is Arch Linux (while testing v2.3.0 and v1.6.0 with Podman and Docker today). In the past I've also tried with openSUSE MicroOS ^

GrabbenD avatar Jul 19 '23 15:07 GrabbenD

I have this issue as well, trying with a few current (2.4.3) images. I also tried specifying an oci source but was unsure if I was doing so correctly - that got through partitioning, but failed with a zero length image.

dwaite avatar Jan 08 '24 00:01 dwaite

OK, yeah, other media mount the source of the install into /run/rootfsbase.

In Docker this does not work, because there is no "boot" process per se, so nothing mounts the media (ISO, USB, PXE) into that directory, and the agent thinks it is empty.
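
A minimal sketch for checking this by hand: the helper below succeeds only if a directory exists and is non-empty, which is roughly what the comment above describes the agent expecting of /run/rootfsbase. The function name `has_install_source` is made up for illustration; it is not part of kairos-agent and does not reproduce the agent's actual logic.

```shell
#!/bin/sh
# Hypothetical helper (not part of kairos-agent): succeeds only if the given
# directory exists and contains at least one entry.
# Defaults to /run/rootfsbase, the path named in this thread.
has_install_source() {
  dir="${1:-/run/rootfsbase}"
  [ -d "$dir" ] && [ -n "$(ls -A "$dir" 2>/dev/null)" ]
}

# Example, to be run in a shell inside the container:
#   has_install_source /run/rootfsbase && echo "source present" || echo "source missing"
```

Because `[ -d ]` follows symlinks, the check would also pass when /run/rootfsbase is a symlink to a populated directory.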

There are two ways of fixing this:

  • Link /run/rootfsbase to / in the images. It should be fine once you boot, as it gets remounted via dracut.
  • Make the install source configurable via the config file or a flag, try that first, then fall back to the default ISO tree, and only then fail.

As a workaround, the first option is the easiest and can be applied directly by building your own Dockerfile (example with quay.io/kairos/ubuntu:23.10-core-amd64-generic-v2.5.0):

FROM quay.io/kairos/ubuntu:23.10-core-amd64-generic-v2.5.0
RUN ln -s / /run/rootfsbase

Build that locally with docker build -t quay.io/kairos/ubuntu:23.10-core-amd64-generic-v2.5.0-custom . and use it as your $IMAGE in the takeover docs:

export DEVICE=/dev/sda
export IMAGE=quay.io/kairos/ubuntu:23.10-core-amd64-generic-v2.5.0-custom
cat <<'EOF' > config.yaml
#cloud-config
users:
- name: "kairos"
  passwd: "kairos"
  ssh_authorized_keys:
  - github:mudler
EOF
export CONFIG_FILE=config.yaml
docker run --privileged -v $PWD:/data -v /dev:/dev -ti $IMAGE kairos-agent manual-install --device $DEVICE /data/$CONFIG_FILE

I tested this myself and could do a takeover install.

The other option needs fixes directly in the agent, so it will take a bit more time to trickle down.
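
For illustration only, the lookup order described in the second option could be sketched like this. Everything here is an assumption: the function name `resolve_install_source`, its two arguments, and the decision to print the chosen source are hypothetical and do not reflect how kairos-agent is or will be implemented.

```shell
#!/bin/sh
# Hypothetical sketch of the lookup order from the second option; none of
# these names come from kairos-agent.
#   $1: source set via config file or flag (may be empty)
#   $2: default live-media directory (defaults to /run/rootfsbase)
resolve_install_source() {
  configured="$1"
  default_dir="${2:-/run/rootfsbase}"

  # 1. Prefer an explicitly configured source.
  if [ -n "$configured" ]; then
    printf '%s\n' "$configured"
    return 0
  fi

  # 2. Fall back to the default tree if it exists and is not empty.
  if [ -d "$default_dir" ] && [ -n "$(ls -A "$default_dir" 2>/dev/null)" ]; then
    printf '%s\n' "$default_dir"
    return 0
  fi

  # 3. Nothing usable: fail with the error message seen in this issue.
  echo "undefined system source to install" >&2
  return 1
}
```

With this ordering, a container run with no configured source and an empty /run/rootfsbase would still end in the same error, but a configured source would take precedence before the default tree is consulted.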

Itxaka avatar Jan 19 '24 13:01 Itxaka

This can also be fixed by setting the --source flag, so that may be easier xDDD

Itxaka avatar Jan 19 '24 14:01 Itxaka

With the --source flag it fails to calculate the size...

Itxaka avatar Jan 19 '24 14:01 Itxaka

We need to implement a test to make sure we don't break it again in the future.

jimmykarily avatar Jan 29 '24 09:01 jimmykarily

This seems to be a mess to implement. Maybe we should do it in the agent by installing from a Docker image into a loop device and then checking that loop device instead?

Doing it here would mean running a VM from an ISO that has Docker, running the agent from the container inside that live CD, and then rebooting to check the state.

Where would we get a live CD that already has Docker running and accepts SSH with the same credentials as Kairos?

IMHO, this should move to the agent directly.

Itxaka avatar Jan 29 '24 15:01 Itxaka

I think peg allows us to use a different user and password to SSH into the VM. The question is where we find the non-Kairos ISO. I had a similar problem while trying to implement this: https://github.com/kairos-io/kairos/issues/2182

I tried to find a very small livecd that I could even check in git but they were either too big or not working for other reasons.

If you think installing to a loop device is going to work, let's do that.

jimmykarily avatar Jan 31 '24 07:01 jimmykarily