
[bug] Talos configuration will apply the disks section before all devices are ready

Status: Open. Opened by krobertson (0 comments).

Bug Report

Description

Talos applies the machine.disks configuration on startup without first making sure that all disks on the system are ready and available. If the configuration is applied before every disk has been enumerated, the machine can fail to configure, reboot, and end up in a boot loop.

In my case, I had attached a 60-bay JBOD to a node. On a regular boot, Talos saw all of the disks, but initialization was slow while it enumerated them. Once I configured the disks as mounts in the Talos machine configuration, the machine started panicking and went into a reboot loop.

I have another box with 12 drives that I was able to configure and mount without any problems. Those drives were spread across 3 different HBAs and 6 separate channels (2 channels per card), whereas the 60-bay JBOD was connected entirely over a single channel to a single HBA.

Logs

(Log screenshot not included.)

Mentioned in Slack: https://taloscommunity.slack.com/archives/CMARMBC4E/p1711685935164829

Environment

  • Talos version: 1.6.7
  • Kubernetes version: 1.29.2
  • Platform: metal / amd64

Posted by krobertson on Mar 31 '24, 03:03.