
When launching on real hardware, Metal doesn't provide informative error messages

Open stockholmux opened this issue 2 years ago • 4 comments

Image I'm using:

Metal, 1.8.0, x86_64

What I expected to happen: When booting, I would expect an attached display to show messages that inform me of status - did it boot? am I missing something?

What actually happened:

The attached display will give status messages from some boot processes but then it just hangs.

How to reproduce the problem:

Take an image, put it on a device, and do not configure user-data.toml or net.toml.

Note: I think Bottlerocket doesn't like the lack of net.toml, and that, paired with the lack of tty configuration in user-data.toml, means the messages aren't getting to the display.

stockholmux, Jul 21 '22 14:07

Thanks for the report!

A few questions for my info:

  • How was the Bottlerocket image provisioned? Via EKS-A or by hand?
  • Could you provide any output of what you do actually see on the screen? There should be some output from the kernel at the very least.
  • It sounds like the image was booted without a net.toml or Boot configuration data (bootconfig.data). Is that correct?

Answers to the above questions will help, but I'm pretty certain about what's happening here. I'm guessing you're seeing kernel messages on the screen, but nothing from systemd, which is where the most informative messages are in the Bottlerocket boot process. Output from the kernel and systemd is handled via console devices, typically specified on the kernel command line (more on the Bottlerocket side of this below). That said, in the absence of console configuration, the kernel is smart enough to output to the first device capable of acting as a console (kernel docs). Often this is a local VGA device.

If no console device is specified, the first device found capable of
acting as a system console will be used. At this time, the system
first looks for a VGA card and then for a serial port. So if you don't
have a VGA card in your system the first serial port will automatically
become the console.

Unfortunately, systemd doesn't have that capability. By default, systemd writes its output to /dev/console, which uses the last console device specified on the kernel command line.

Bottlerocket images for metal don't configure any console devices for kernel or systemd output by default. The reason for this is that real hardware is so varied there isn't really a good default value we could provide. Console devices are configured via kernel command line; you can see the defaults for our AWS variants here. However, if no console devices are set, /dev/console will be bound to the same console device the kernel is using, meaning that systemd output should be seen on that output.

To configure console devices in Bottlerocket metal images, we use a cool kernel feature called Boot Configuration. This feature allows you to provide kernel command line arguments at runtime. The docs for this feature are in PROVISIONING.md, but the tl;dr is: create a file with your console devices specified, then run a Bottlerocket-provided Linux tool (bootconfig) against that file to create the Boot Configuration data, which you then provision to your machine alongside user-data.toml and net.toml.
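As a rough illustration of the first step, here is a minimal Python sketch that renders a Boot Configuration input file containing console devices. The `kernel { console = ... }` layout follows the kernel's Boot Configuration syntax; the function name `render_bootconfig` and the console values are assumptions chosen for illustration, and the output still has to be processed with the bootconfig tool as described in PROVISIONING.md.

```python
def render_bootconfig(consoles):
    """Render a Boot Configuration input file that sets kernel console devices.

    `consoles` is an ordered list such as ["tty0", "ttyS0,115200n8"].
    Order matters: /dev/console follows the last console listed.
    """
    if not consoles:
        raise ValueError("at least one console device is required")
    for value in consoles:
        if '"' in value:
            raise ValueError("double quotes are not supported in this sketch")
    # Quote every value so that commas inside a value (for example the
    # serial options in "ttyS0,115200n8") are not read as list separators.
    quoted = ", ".join('"%s"' % value for value in consoles)
    return "kernel {\n    console = %s\n}\n" % quoted


# Hypothetical example values: a local display plus a serial port.
print(render_bootconfig(["tty0", "ttyS0,115200n8"]), end="")
```

Listing the local display last instead (for example `["ttyS0,115200n8", "tty0"]`) would make it the device that systemd output follows.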

All that being said, we do want to make this experience easier and clearer for users. I'm currently investigating how we can get systemd output to the screen in the absence of a configured console device. (It's a common issue, see https://github.com/systemd/systemd/issues/9899).

zmrow, Jul 21 '22 16:07

I would also be curious to see what /proc/consoles contains on the machine in question... If it actually finished booting and a user could get to a configured admin container we could check. But that's quite a few ifs... :)
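If the machine can be reached, checking /proc/consoles could look something like the sketch below. It assumes the usual line layout of that file (device name, three operation flags, a parenthesized flag group, then major:minor), where `E` marks an enabled console and `C` marks the preferred console, i.e. the one behind /dev/console. The names `parse_consoles` and `preferred_console` and the sample text are made up for illustration.

```python
import re
from collections import namedtuple

Console = namedtuple("Console", "name ops flags")

# Device name, three operation characters (e.g. "-WU"), then flags in parens.
_LINE = re.compile(r"^(\S+)\s+(\S{3})\s+\(([^)]*)\)")


def parse_consoles(text):
    """Parse the contents of /proc/consoles into a list of Console tuples."""
    consoles = []
    for line in text.splitlines():
        match = _LINE.match(line.strip())
        if match:
            name, ops, flags = match.groups()
            # Unset flags are printed as spaces; drop them.
            consoles.append(Console(name, ops, flags.replace(" ", "")))
    return consoles


def preferred_console(consoles):
    """Return the name of the console flagged 'C' (preferred), or None."""
    for console in consoles:
        if "C" in console.flags:
            return console.name
    return None


# Hypothetical sample, not output from the machine in this issue.
SAMPLE = (
    "tty0                 -WU (EC p  )    4:1\n"
    "ttyS0                -W- (E  p  )    4:64\n"
)
print(preferred_console(parse_consoles(SAMPLE)))
```

On a running host, the same functions could be fed the real file contents, for example `parse_consoles(open("/proc/consoles").read())`.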

zmrow, Jul 21 '22 20:07

I provisioned from EKS-A without any customizations or added files. Here's the output I saw on screen.

I'm going to try again with the required config files, but this was my default experience.

[image]

rothgar, Jul 22 '22 16:07

Following up on this:

I found that EKS-A has default console settings: console=tty0 console=ttyS0,115200n8. That means that when you provisioned with EKS-A without any customization, all systemd output was going to ttyS0, not your local monitor.
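The "last console wins" rule can be checked against those settings with a small Python sketch. The helper names `console_devices` and `dev_console_target` are hypothetical, and the code only models the rule described in this thread (systemd writes to /dev/console, which follows the last `console=` argument); it does not query a real system.

```python
def console_devices(cmdline):
    """Return the device names from every console= argument, in order."""
    devices = []
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key == "console" and sep:
            # Drop options such as the "115200n8" in "ttyS0,115200n8".
            devices.append(value.split(",", 1)[0])
    return devices


def dev_console_target(cmdline):
    """Return the device /dev/console follows, or None if none is configured.

    None means no console= argument was given, in which case the kernel
    picks the first device capable of acting as a console.
    """
    devices = console_devices(cmdline)
    return devices[-1] if devices else None


eks_a_defaults = "console=tty0 console=ttyS0,115200n8"
print(console_devices(eks_a_defaults))
print(dev_console_target(eks_a_defaults))
```

With the EKS-A defaults, the kernel logs to both tty0 and ttyS0, while `dev_console_target` returns `ttyS0`, which matches systemd output landing on the serial port rather than the monitor.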

I've opened this PR which removes these default console settings. Additional information is provided in that PR, but by removing these console settings, you should by default get all kernel and systemd output to your local monitor. Additional configuration is possible, but if you're using a local monitor you won't need to explicitly configure anything!

zmrow, Aug 05 '22 21:08

Closing this out as the EKS-A PR has been merged!

The next EKS-A release should have these changes included.

zmrow, Aug 11 '22 19:08