
We need a stage that runs after the post-install reboot

Open • sarg3nt opened this issue 1 year ago • 3 comments

I'm not sure whether this is a bug or a feature request, so I'm filing it as a bug. Currently there is no stage that guarantees cloud-init code runs after the install and the subsequent reboot.

This is required whenever a service must not be started during the install phase. For example, my rke2-agent nodes were not working because the service started during install. The nodes partially registered with the servers, and after the reboot, when the service started again, they failed to register because their key had changed while the server still held the old one. The services started during install also caused shutdown errors: they held files open and hung while trying to exit, which broke the install.

I tried several of the stages listed on the Kairos stages documentation page, and all of them started the service during install.

We are network-booting these nodes, with AuroraBoot serving the ISO.

I don't know whether the boot or after-install stages are supposed to run after the install and reboot, because the description on that page is not clear.

Possible solutions:

  1. If boot or after-install are supposed to run after the install and reboot, then they are broken and need to be fixed. Their descriptions also need to be updated.
  2. If those two stages are not supposed to run after the install and reboot, then we need a new stage explicitly for this purpose, for example after-install-reboot. The descriptions should still be updated to state that those stages run during install.

Note: I have worked around this by adding echo "ExecStartPre=/bin/sleep 80" >> /lib/systemd/system/rke2-server.service for each service. It's very hacky and I don't like it, but it works for now.
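A possibly less fragile stopgap than a fixed sleep is a systemd drop-in that skips the unit while the node is still running from the live/installer environment. The sketch below is a hypothetical illustration, not a Kairos or rke2 mechanism: the function name write_skip_during_install_dropin is made up, and the marker path /run/cos/live_mode is an assumption that should be verified against the Kairos release in use. It relies on standard systemd behavior: when a ConditionPathExists= check fails the unit is skipped rather than marked failed, and a leading "!" inverts the check.

```shell
#!/bin/sh
# Hypothetical helper (not part of Kairos or rke2): write a systemd drop-in
# that skips UNIT whenever MARKER exists.
# ASSUMPTION: MARKER is a file present only in the live/installer
# environment, e.g. /run/cos/live_mode; verify this for your Kairos version.
#
# Usage: write_skip_during_install_dropin ROOT UNIT MARKER
#   ROOT    filesystem root to write under ("" for the running system)
#   UNIT    unit name, e.g. rke2-agent.service
#   MARKER  absolute path whose presence should block the unit
write_skip_during_install_dropin() {
  root="$1"
  unit="$2"
  marker="$3"
  dir="$root/etc/systemd/system/$unit.d"

  mkdir -p "$dir" || return 1
  # "!" inverts the check: the unit starts only if MARKER does NOT exist.
  # A failed Condition*= check skips the unit instead of failing it.
  printf '[Unit]\nConditionPathExists=!%s\n' "$marker" \
    > "$dir/10-skip-during-install.conf"
}
```

On a node, this could be called from the same stage that enables the service, e.g. write_skip_during_install_dropin "" rke2-agent.service /run/cos/live_mode && systemctl daemon-reload. Unlike appending to /lib/systemd/system/rke2-server.service, a drop-in under /etc leaves the packaged unit file untouched.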

This is using a master build from 11/27/2023, built with:

./earthly.sh +iso \
  --FAMILY=rhel \
  --FLAVOR=rockylinux \
  --FLAVOR_RELEASE=9 \
  --BASE_IMAGE=rockylinux:9 \
  --MODEL=generic \
  --VARIANT=core;

Note: This probably also fixes #1864.

sarg3nt, Nov 28 '23 01:11

This has been discussed on Slack: https://spectrocloudcommunity.slack.com/archives/C04B1833NEA/p1700518838192069 @jimmykarily

sarg3nt, Nov 28 '23 16:11

There is actually a stage emitted by our providers during bootstrap (https://github.com/kairos-io/provider-kairos/blob/05d3833dd993604a414d5681ffe94ac9b0abe285/internal/provider/bootstrap.go#L57): if you are using the standard (non-core) images, you can use the kairos-agent.bootstrap stage.

Does that help with your issue @sarg3nt ?

mudler, Jan 24 '24 08:01

@mudler We do not use the standard images. Everything we run is core, for both our k8s and Vault projects.

sarg3nt, Jan 24 '24 21:01