`install.auto: false` is not respected when booting with Auroraboot
Moved to a new ticket from this one: https://github.com/kairos-io/kairos/issues/2281#issuecomment-2078014965
We need to find out whether `auto: false` is respected in general and when booting with AuroraBoot. If it is, let's find out what's wrong in the original report.
Hello,
I am a bot, an experiment of @mudler and @jimmykarily. Your issue titled "`install.auto: false` is not respected when booting with Auroraboot" (#2516) has been noted. Please provide more information and follow the project's guidelines by adding the missing details.
To better assist you, please ensure that the issue includes:
- A clear description of the issue.
- Steps to reproduce (if it's a bug).
- The versions of the relevant artifacts being used.
Once these requirements are met, the issue can be properly triaged and addressed. Thank you for your understanding and cooperation.
This may also solve: https://github.com/kairos-io/kairos/issues/2030
An update from my side of things.
While troubleshooting another issue, I realized that maybe it had to do with the file supplied to the node directly from vSphere customization not having `auto: false` set, so I added it, but it didn't change anything.
As a reminder, we have a `cloud_init.yaml` file being served from AuroraBoot, and I can't find any evidence that any of the config in that file, regardless of stage, is being run. However, we also have a "per node" config added through the vSphere `guestinfo.userdata` in Terraform, and it looks like everything in it, regardless of stage, is running. Specifically, adding `auto: false` to that file as well as to the one from AuroraBoot did not keep that from happening.
Terraform for adding the custom config to a vSphere VM:
extra_config = {
  "guestinfo.userdata"          = data.template_cloudinit_config.agent[count.index].rendered
  "guestinfo.userdata.encoding" = "gzip+base64"
}
initramfs_stage.log
[root@lpul-vault-k8s-server-0 immucore]# cat initramfs_stage.log
2024-05-02T19:48:59Z INF Running stage: initramfs.before
2024-05-02T19:48:59Z WRN (conditional) Skip 'Skipping stage (if statement error: failed to run [ ! -f /oem/userdata ]: exit status 1)' stage name: Pull data from provider
2024-05-02T19:48:59Z WRN (conditional) Skip 'Skipping stage (if statement error: failed to run [ -e /sbin/openrc ]: exit status 1)' stage name: Blacklist bpfilter on Alpine ( bug: https://github.com/kairos-io/kairos/issues/277 )
2024-05-02T19:48:59Z WRN (conditional) Skip 'Skipping stage (if statement error: failed to run ! [[ -f /etc/hosts ]] || ! [[ $(grep '127.0.0.1' /etc/hosts) ]]
: exit status 1)' stage name: Make sure hosts file is present and includes a record for 127.0.0.1
2024-05-02T19:48:59Z WRN (conditional) Skip 'Skipping stage (if statement error: failed to run [ ! -f /oem/userdata ]: exit status 1)' stage name:
2024-05-02T19:48:59Z INF Done executing stage 'initramfs.before'
2024-05-02T19:48:59Z INF Running stage: initramfs
2024-05-02T19:48:59Z INF Processing stage step 'Enable systemd-network config files for DHCP'. ( commands: 1, files: 2, ... )
2024-05-02T19:48:59Z INF Processing stage step ''. ( commands: 1, files: 0, ... )
2024-05-02T19:48:59Z WRN (conditional) Skip 'Skipping stage (if statement error: failed to run [ -f "/sbin/openrc" ]
: exit status 1)' stage name: Create OpenRC services
2024-05-02T19:48:59Z WRN (conditional) Skip 'Skipping stage (if statement error: failed to run grep -q "kairos.remote_recovery_mode" /proc/cmdline && \
( [ -e "/sbin/systemctl" ] || [ -e "/usr/bin/systemctl" ] || [ -e "/usr/sbin/systemctl" ] || [ -e "/usr/bin/systemctl" ] )
: exit status 1)' stage name: Starts kairos-recovery and generate a temporary pass
2024-05-02T19:48:59Z INF Processing stage step 'systemd-sysext initramfs settings'. ( commands: 0, files: 0, ... )
2024-05-02T19:48:59Z INF Processing stage step 'Create journalctl /var/log/journal dir'. ( commands: 0, files: 0, ... )
2024-05-02T19:48:59Z ERR Failed to connect system bus: No such file or directory
: failed to run networkctl reload: exit status 1
2024-05-02T19:48:59Z ERR 1 error occurred:
* failed to run networkctl reload: exit status 1
2024-05-02T19:48:59Z INF Command output: Created symlink /etc/systemd/system/multi-user.target.wants/kairos-agent.service → /etc/systemd/system/kairos-agent.service.
2024-05-02T19:48:59Z WRN (conditional) Skip 'Skipping stage (if statement error: failed to run [ -f "/sbin/openrc" ]
: exit status 1)' stage name: Enable OpenRC services
2024-05-02T19:48:59Z WRN (conditional) Skip 'Skipping stage (if statement error: failed to run [ ! -f "/run/cos/recovery_mode" ] && [ ! -f "/run/cos/live_mode" ]: exit status 1)' stage name:
2024-05-02T19:48:59Z WRN (conditional) Skip 'Skipping stage (if statement error: failed to run [ ! -f "/run/cos/recovery_mode" ] && [ -s /usr/local/etc/machine-id ]: exit status 1)' stage name: Restore /etc/machine-id for systemd systems
2024-05-02T19:48:59Z INF Processing stage step 'Disable NetworkManager and wicked'. ( commands: 0, files: 0, ... )
2024-05-02T19:48:59Z WRN (conditional) Skip 'Skipping stage (if statement error: failed to run grep -q "kairos.remote_recovery_mode" /proc/cmdline && [ -f "/sbin/openrc" ]: exit status 1)' stage name: Starts kairos-recovery for openRC based systems
2024-05-02T19:48:59Z INF Processing stage step ''. ( commands: 0, files: 2, ... )
2024-05-02T19:48:59Z ERR 2 errors occurred:
* failed to run systemctl disable NetworkManager: exit status 1
* failed to run systemctl disable wicked: exit status 1
2024-05-02T19:48:59Z WRN (conditional) Skip 'Skipping stage (if statement error: failed to run [ ! -f "/run/cos/recovery_mode" ] && [ -f "/sbin/openrc" ]: exit status 1)' stage name: Restore /etc/machine-id for openrc systems
2024-05-02T19:48:59Z INF Processing stage step 'Enable systemd-network and systemd-resolved'. ( commands: 0, files: 0, ... )
2024-05-02T19:48:59Z WRN (conditional) Skip 'Skipping stage (if statement error: failed to run (grep -q "kairos.reset" /proc/cmdline || [ -f /run/cos/autoreset_mode ]) && \
( [ -e "/sbin/systemctl" ] || [ -e "/usr/bin/systemctl" ] || [ -e "/usr/sbin/systemctl" ] || [ -e "/usr/bin/systemctl" ] )
: exit status 1)' stage name: Starts kairos-reset for systemd based systems
2024-05-02T19:48:59Z INF Processing stage step 'Default systemd config'. ( commands: 1, files: 0, ... )
2024-05-02T19:48:59Z WRN (conditional) Skip 'Skipping stage (if statement error: failed to run (grep -qv "interactive-install" /proc/cmdline || grep -qv "install-mode-interactive" /proc/cmdline) && \
[ -f /run/cos/live_mode ] && \
[ -f "/sbin/openrc" ]
: exit status 1)' stage name: Autologin on livecd for OpenRC
2024-05-02T19:48:59Z INF Command output: Created symlink /etc/systemd/system/default.target → /usr/lib/systemd/system/multi-user.target.
2024-05-02T19:48:59Z ERR 5 errors occurred:
* failed to run systemctl enable systemd-timesyncd: exit status 1
* failed to run systemctl enable nohang: exit status 1
* failed to run systemctl enable nohang-desktop: exit status 1
* failed to run systemctl enable fail2ban: exit status 1
* failed to run systemctl enable logrotate.timer: exit status 1
2024-05-02T19:48:59Z INF Processing stage step 'Generate host keys'. ( commands: 1, files: 0, ... )
2024-05-02T19:48:59Z INF Processing stage step 'Link /etc/resolv.conf to systemd resolv.conf'. ( commands: 2, files: 0, ... )
2024-05-02T19:48:59Z WRN (conditional) Skip 'Skipping stage (if statement error: failed to run grep -q "kairos.reset" /proc/cmdline && [ -f "/sbin/openrc" ]: exit status 1)' stage name: Starts kairos-reset for openRC-based systems
2024-05-02T19:48:59Z WRN (conditional) Skip 'Skipping stage (if statement error: failed to run cat /proc/cmdline | grep "selinux=1"
: exit status 1)' stage name: Relabelling
2024-05-02T19:48:59Z INF Command output:
2024-05-02T19:48:59Z INF Command output:
2024-05-02T19:49:00Z INF Command output: ssh-keygen: generating new host keys: RSA DSA ECDSA ED25519
2024-05-02T19:49:00Z INF Processing stage step 'Create systemd services'. ( commands: 0, files: 5, ... )
2024-05-02T19:49:00Z INF Processing stage step ''. ( commands: 5, files: 0, ... )
2024-05-02T19:49:00Z INF Command output: Removed "/etc/systemd/system/getty.target.wants/[email protected]".
2024-05-02T19:49:00Z INF Command output: Running in chroot, ignoring command 'stop'
2024-05-02T19:49:00Z INF Command output: Created symlink /etc/systemd/system/[email protected] → /dev/null.
2024-05-02T19:49:00Z INF Command output: Created symlink /etc/systemd/system/multi-user.target.wants/kairos.service → /etc/systemd/system/kairos.service.
2024-05-02T19:49:00Z INF Command output: Created symlink /etc/systemd/system/multi-user.target.wants/kairos-webui.service → /etc/systemd/system/kairos-webui.service.
2024-05-02T19:49:00Z INF Processing stage step 'Enable systemd services'. ( commands: 4, files: 0, ... )
2024-05-02T19:49:00Z INF Command output:
2024-05-02T19:49:00Z WRN (conditional) Skip 'Skipping stage (if statement error: failed to run (grep -q "install-mode" /proc/cmdline || grep -q "nodepair.enable" /proc/cmdline ) && \
([ -f /run/cos/live_mode ] || [ -f /run/cos/uki_install_mode ]) && \
[ -f "/sbin/openrc" ]
: exit status 1)' stage name:
2024-05-02T19:49:00Z INF Command output:
2024-05-02T19:49:00Z INF Command output:
2024-05-02T19:49:00Z INF Command output:
2024-05-02T19:49:00Z INF Processing stage step 'Setup groups'. ( commands: 0, files: 0, ... )
2024-05-02T19:49:00Z WRN (conditional) Skip 'Skipping stage (if statement error: failed to run (grep -q "interactive-install" /proc/cmdline || grep -q "install-mode-interactive" /proc/cmdline) && \
([ -f /run/cos/live_mode ] || [ -f /run/cos/uki_install_mode ]) && \
( [ -e "/sbin/systemctl" ] || [ -e "/usr/bin/systemctl" ] || [ -e "/usr/sbin/systemctl" ] || [ -e "/usr/bin/systemctl" ] )
: exit status 1)' stage name:
2024-05-02T19:49:00Z INF Processing stage step 'Setup users'. ( commands: 0, files: 0, ... )
2024-05-02T19:49:00Z WRN (conditional) Skip 'Skipping stage (if statement error: failed to run (grep -q "interactive-install" /proc/cmdline || grep -q "install-mode-interactive" /proc/cmdline) && \
([ -f /run/cos/live_mode ] || [ -f /run/cos/uki_install_mode ]) && \
[ -f "/sbin/openrc" ]
: exit status 1)' stage name:
2024-05-02T19:49:00Z INF Processing stage step 'Set user password if running in live or uki'. ( commands: 0, files: 0, ... )
2024-05-02T19:49:00Z INF Processing stage step 'Setup sudo'. ( commands: 1, files: 1, ... )
2024-05-02T19:49:00Z INF Command output: Locking password for user root.
passwd: Success
2024-05-02T19:49:00Z INF Processing stage step 'Ensure runtime permission'. ( commands: 2, files: 0, ... )
2024-05-02T19:49:00Z INF Command output:
2024-05-02T19:49:00Z INF Command output:
2024-05-02T19:49:00Z INF Processing stage step ''. ( commands: 0, files: 0, ... )
2024-05-02T19:49:00Z WRN (conditional) Skip 'Skipping stage (if statement error: failed to run [ -e "/usr/local/cloud-config" ]: exit status 1)' stage name: Ensure runtime permission
2024-05-02T19:49:00Z WRN (conditional) Skip 'Skipping stage (if statement error: failed to run [ -f "/sys/firmware/devicetree/base/model" ] && grep -i jetson "/sys/firmware/devicetree/base/model"
: exit status 1)' stage name: Create files
2024-05-02T19:49:00Z INF Processing stage step ''. ( commands: 0, files: 0, ... )
2024-05-02T19:49:00Z INF Processing stage step 'Set hostname'. ( commands: 0, files: 0, ... )
2024-05-02T19:49:00Z INF Processing stage step 'Run commands'. ( commands: 1, files: 0, ... )
2024-05-02T19:49:00Z INF Command output: 2024-05-02 19:49:00 Add DHCP ClientIdentifier=mac to network config if not already present.
2024-05-02 19:49:00 Adding line [DHCP] to file /etc/systemd/network/20-dhcp.network
2024-05-02 19:49:00 Adding line ClientIdentifier=mac to file /etc/systemd/network/20-dhcp.network
2024-05-02 19:49:00 Adding line [DHCP] to file /etc/systemd/network/20-dhcp-legacy.network
2024-05-02 19:49:00 Adding line ClientIdentifier=mac to file /etc/systemd/network/20-dhcp-legacy.network
2024-05-02 19:49:00 Add ll to the root and Kairos .bashrc if not already present.
2024-05-02 19:49:00 Adding line alias ll="ls -alh" to file /root/.bashrc
2024-05-02 19:49:00 Creating new file /home/kairos/.bashrc with line alias ll="ls -alh"
2024-05-02 19:49:00 Creating new file /home/kairos/.profile with line alias ll="ls -alh"
2024-05-02 19:49:00 Add rke2 bin to the path.
2024-05-02 19:49:00 Adding line export PATH="${PATH}:/var/lib/rancher/rke2/bin/" to file /root/.bashrc
2024-05-02 19:49:00 Adding line export PATH="${PATH}:/var/lib/rancher/rke2/bin/" to file /home/kairos/.bashrc
2024-05-02 19:49:00 Adding line export PATH="${PATH}:/var/lib/rancher/rke2/bin/" to file /home/kairos/.profile
/bin/sh: line 1: [[: 0[0]: syntax error: invalid arithmetic operator (error token is "[0]")
2024-05-02T19:49:00Z INF Done executing stage 'initramfs'
2024-05-02T19:49:00Z INF Running stage: initramfs.after
2024-05-02T19:49:00Z WRN (conditional) Skip 'Skipping stage (if statement error: failed to run [ -e /sbin/openrc ]: exit status 1)' stage name: Enable serial login for alpine
2024-05-02T19:49:00Z WRN (conditional) Skip 'Skipping stage (if statement error: failed to run [[ $(kairos-agent state get kairos.flavor) =~ ^ubuntu ]]: exit status 1)' stage name: setupcon initramfs.after ubuntu
2024-05-02T19:49:00Z INF Done executing stage 'initramfs.after'
2024-05-02T19:49:00Z INF Running stage: initramfs.before
2024-05-02T19:49:00Z INF Done executing stage 'initramfs.before'
2024-05-02T19:49:00Z INF Running stage: initramfs
2024-05-02T19:49:00Z INF Done executing stage 'initramfs'
2024-05-02T19:49:00Z INF Running stage: initramfs.after
2024-05-02T19:49:00Z INF Done executing stage 'initramfs.after'
With `install.auto: true`, the installation starts automatically (netbooted a VM with virt-manager):
~/workspace/kairos/kairos (master)*$ cat config.yaml
#cloud-config
users:
- name: kairos
  passwd: kairos
install:
  auto: true
debug: true
~/workspace/kairos/kairos (master)*$ docker run --rm -ti -v /tmp/build -v /var/run/docker.sock:/var/run/docker.sock -v "$PWD"/config.yaml:/config.yaml --net host quay.io/kairos/auroraboot --set "container_image=docker://quay.io/kairos/debian:bookworm-slim-core-amd64-generic-v3.0.4-73-g8ddb9092-dirty" --cloud-config /config.yaml
With `install.auto: false`, the installation doesn't start.
It seems to work correctly. I'm not sure why I said that the installation indeed started in your case. I suspect I was confused and the installation had been triggered with `kairos-agent manual-install`. I don't see anything else in the logs indicating that it started when it shouldn't have.
Regarding config options: @sarg3nt, if you think the configuration doesn't get merged properly, set `debug: true` in the config and run the installation with `kairos-agent manual-install`, saving the logs (as you did in the original issue). The resulting config is then printed in the logs, so you can tell which options made it into the final one and which didn't.
@jimmykarily I think we might need some clarification here: during a manual install, the config sent by AuroraBoot is NOT auto-running (as it should be), but the config sent by vSphere via custom data IS running, when I think it should not be, because this is a manual install.
Indeed, there is some confusion (either on my side or yours :) ). Let me try to clarify.
The configs are not install recipes that are run (or not) as a whole. The YAML keys in each config are merged with those from all the other configs before the kairos-agent starts the installation. A component called the config collector gathers configs from various locations:
- specific directories
- a remote config defined in the kernel cmdline (e.g. in the case of netboot: `config_url=http://192.168.122.1:8090/_/file?name=other-1`)
- remote configs defined in config files
All these configs are merged into one config, which is used to install Kairos. In your case, the config from AuroraBoot and the config from vSphere will both be merged, potentially overwriting each other's keys if both specify the same keys.
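Applied to the reporter's setup, the merge can be sketched as below. This assumes maps are merged key by key, which is consistent with the merged `stages` map in the debug output further down this thread. The `bind_mounts` path is a hypothetical placeholder, and the vSphere file's `grub_options` are omitted because their contents are not shown in the thread.

```yaml
#cloud-config
# Source 1: config served by AuroraBoot (controls the installer).
install:
  auto: false
  reboot: false
---
#cloud-config
# Source 2: per-node config injected via vSphere guestinfo.userdata.
# Its install section has no "auto" key, so it does not override Source 1.
install:
  bind_mounts:
  - /var/lib/example   # hypothetical path
---
# Assumed merged result used by kairos-agent (key-by-key merge;
# the order between sources is not guaranteed):
install:
  auto: false
  reboot: false
  bind_mounts:
  - /var/lib/example
```

Note that this only describes the `install` keys; other top-level keys from either source (for example `users` or `stages`) end up in the same merged config.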
To demonstrate the above, I started AuroraBoot with this config:
#cloud-config
users:
- name: kairos
  passwd: kairos
install:
  auto: false
stages:
  dimitris-stage:
  - name: "Dimitris stage"
    commands:
    - echo "dimitris"
debug: true
and netbooted a VM. From within the VM (the installation didn't start automatically, because `install.auto` is `false`), I created this config file:
root@localhost:/home/kairos# cat c.yaml
#cloud-config
stages:
  local-config-stage:
  - name: "Local config stage"
    commands:
    - echo "from the local config"
When I run the installation with this command:
kairos-agent --debug manual-install c.yaml 2>&1 | tee out.log
the output log prints the final config, in which you can find these lines:
Config: collector.Config{
    "config_url": "http://192.168.122.1:8090/_/file?name=other-1",
    "debug": true,
    "install": collector.Config{
        "auto": false,
        "poweroff": false,
        "reboot": false,
    },
    "stages": collector.Config{
        "dimitris-stage": []interface {}{
            collector.Config{
                "commands": []interface {}{
                    "echo \"dimitris\"",
                },
                "name": "Dimitris stage",
            },
        },
        "local-config-stage": []interface {}{
            collector.Config{
                "commands": []interface {}{
                    "echo \"from the local config\"",
                },
                "name": "Local config stage",
            },
        },
    },
    "users": []interface {}{
        collector.Config{
            "name": "kairos",
            "passwd": "kairos",
        },
    },
},
See how both `dimitris-stage` (from AuroraBoot) and `local-config-stage` (from the local config file) are in the final config? If they defined the same keys (e.g. `debug`), whichever was merged last would define the value of that key (don't rely on this, there is no guaranteed order!).
So, to summarize: if configs are supplied from multiple sources, expect them all to be merged before the installation starts. Setting `install.auto: false` in one file and `install.auto: true` in another will leave only one of the two values in the final config (with no guaranteed order). The `install.auto` key doesn't refer to the file itself; it's an instruction to the kairos-agent on whether to start the installation automatically. This is possible because the agent runs in the background as a service, so it can start the installation on its own.
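A minimal sketch of the conflicting case described above, with two hypothetical sources that both set `install.auto`. Since only one value survives and the order is not guaranteed, a safer pattern (a suggestion, not something the thread prescribes) is to set `install.*` keys in a single source, such as the AuroraBoot config, and leave them out of per-node userdata.

```yaml
#cloud-config
# Source A (e.g. served by AuroraBoot)
install:
  auto: false
---
#cloud-config
# Source B (e.g. per-node userdata)
install:
  auto: true
# After merging, the final config holds exactly one of these two values,
# and either one may win because the merge order is not guaranteed.
```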
I hope this helps. Let me know if I'm not understanding the issue and explaining the wrong things.
@jimmykarily
Thanks for the explanation. I get that the config collector collects those configs and runs them during install.
What I'm saying is that something else is running the config coming from vSphere (and only this config) during startup, even if `install.auto` is set to `false`.
I can replicate this; it happens every time and is very obvious on our systems.
Does that make sense?
Here's the general workflow:
- Set `install.auto: false` and `install.reboot: false` in the config coming from AuroraBoot.
- The `cloud_init` being injected via vSphere userdata has an `install` section but no `auto` key. My understanding is that this should not be needed, but I don't know whether I tried setting it to `false` either. The only things in this `install` section are `bind_mounts` and `grub_options` (to turn on debug).
- Deploy a VM; AuroraBoot serves the file and the VM comes up halting at the Kairos shell.
- SSH to the machine.
- Check the VM: it has been configured with values from the `cloud_init.yaml` injected from vSphere. I have not run the install. No install should have happened.
Examples of things that have been configured that are absolutely not built into the base OS:
- `users`: the password has been set and the SSH key applied.
- `write_files`: all files in this section have been created.
- `stages.initramfs`: the hostname has been set and a script in `commands` has been run.
- `boot.systemctl`: service timers have been enabled and started.
NOTE: We are using systemd timers because otherwise the services would start here and break the install. The timers delay the services so they don't start during the install. This is hacky as heck and makes node creation take much longer than it should, but it's what we currently have to do.
Hey @jimmykarily any thoughts on the above? Thanks!
In the Kairos config, one can specify more than just installation options, for example `users:` and `stages.initramfs`. These can take effect even if the installation doesn't run, simply because they are parsed by `immucore` and the `kairos-agent` service. Some stages, for example, are run by immucore well before kairos-agent is even triggered, and thus have nothing to do with the installation.
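Because such stages run whether or not an installation happens, a step meant only for the installed system can be guarded with an `if` condition. This sketch assumes `/run/cos/live_mode` already exists when the `initramfs` stage runs on live media (the `initramfs_stage.log` earlier in this thread evaluates that same file during the `initramfs` stage). The hostname and script path are hypothetical placeholders.

```yaml
#cloud-config
stages:
  initramfs:
  - name: "Per-node setup (installed system only)"
    # Single quotes stop YAML from treating the leading "[" as a flow sequence.
    if: '[ ! -f /run/cos/live_mode ]'
    hostname: example-node-0        # hypothetical hostname
    commands:
    - /usr/local/bin/node-setup.sh  # hypothetical script
```

For UKI installs, the Kairos configs quoted later in this thread check `/run/cos/uki_install_mode` instead, so a guard may need to test both files.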
@jimmykarily Ahh, that makes sense. Thanks for the explanation.
Quick question for you. One of the problems we have is that if we use `boot.systemctl` to start services, they start when the Kairos installer starts, and basically none of the services should be started until after the install. My current solution is to use `service.timer` objects to delay starting the services, but this is super wacky and delays the install and restart of the machines quite a bit.
I'll go take a look to see if I can start services in another stage, but I think I've tried this already, and I'm wondering if you have any suggestions for a better fix than service timers.
You can add a guard so the services do NOT start when you are in the install phase!
if: '! [ -f /run/cos/live_mode ]'
See, for example, the installer service, which only starts if we are in livecd/UKI install mode AND on a systemd system:
- if: |
    ([ -f /run/cos/live_mode ] || [ -f /run/cos/uki_install_mode ]) && \
    ( [ -e "/sbin/systemctl" ] || [ -e "/usr/bin/systemctl" ] || [ -e "/usr/sbin/systemctl" ] || [ -e "/usr/bin/systemctl" ] )
  commands:
    - systemctl start kairos-webui
I assume here that you want those services on normal boot, not on live/installer media, right?
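Applied to the `boot` services described earlier in the thread, the guard could look like the sketch below. It uses a plain `commands` list calling `systemctl`, the same style as the installer snippet above; the unit name is a hypothetical placeholder.

```yaml
#cloud-config
stages:
  boot:
  - name: "Start workload services on the installed system only"
    # Skip this step on live/installer media. The value must be quoted:
    # an unquoted leading "!" would be parsed as a YAML tag, not a string.
    if: '! [ -f /run/cos/live_mode ]'
    commands:
    - systemctl enable --now example-agent.service   # hypothetical unit
```

With a guard like this, the services no longer need a timer to delay their start during the install.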
If you want them to run from the live CD, we can work around that as well by using the `after-install` stage to run things :D
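A minimal sketch of that alternative, assuming the `after-install` stage is executed on the live system once the installation has completed, before any reboot. The script path is a hypothetical placeholder.

```yaml
#cloud-config
stages:
  after-install:
  - name: "Run once the installation has finished"
    commands:
    - /usr/local/bin/post-install.sh   # hypothetical script
```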
@Itxaka, you are awesome. That fixed it for me. Our nodes now build and restart 2 minutes faster with no issues. Thank you!!!
I'm thinking you can probably close this now if you all are satisfied.
Thanks Dave!