
LVM activation is not entirely working

Open jfroy opened this issue 1 year ago • 6 comments

Bug Report

Description

Talos 1.8.1 improved LVM activation quite a bit, but I am still seeing some VGs missing when booting a node. These VGs are encrypted Ceph OSDs on NVMe drives. The machine has a fair number of I/O devices (10 NVMe drives, 15 SATA hard drives), so maybe it's a race or a timeout.

The issue manifests as missing /dev nodes for some VGs, usually the same two. Issuing a vgchange --refresh command in a node shell creates the missing nodes and allows the Ceph OSDs to start.
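
A rough sketch of the workaround, assuming a privileged node shell (or debug container) that sees the host's /dev and has the LVM tools available; the VG name is just one of the Ceph OSD VGs from the logs below:

# Recreate the missing device nodes for all activated VGs:
vgchange --refresh

# Or limit it to a single affected VG:
vgchange --refresh ceph-41785a27-5010-4418-becf-d6d52fd93f9c

After that the Ceph OSDs find their block devices and start.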

Logs

I see one such log entry for each of the 8 Ceph OSDs:

192.168.1.13: user: warning: [2024-10-15T23:59:03.258425978Z]: [talos] activating LVM volume {"component": "controller-runtime", "controller": "block.LVMActivationController", "name": "ceph-41785a27-5010-4418-becf-d6d52fd93f9c"}

So the controller is not missing any of them. There are no additional logs or differences in the log lines between the VGs that do get device nodes and those that don't.

Environment

  • Talos version:
    Client:
        Tag:         v1.6.6
        SHA:         7dceba06
        Built:       
        Go version:  go1.21.8 X:loopvar
        OS/Arch:     darwin/arm64
    Server:
        NODE:        192.168.1.13
        Tag:         v1.8.1-jfroy.2
        SHA:         e398d8a0
        Built:       
        Go version:  go1.22.8
        OS/Arch:     linux/amd64
        Enabled:     RBAC
    
  • Kubernetes version: 1.31.1
  • Platform: baremetal amd64

jfroy avatar Oct 16 '24 05:10 jfroy

See https://github.com/siderolabs/talos/pull/9422 and https://github.com/siderolabs/talos/pull/9333 for recent work in this area.

jfroy avatar Oct 16 '24 05:10 jfroy

@jfroy as you seem to be running with your changes on top, just to make sure: do you have $PKGS matching v1.8.1? There were several fixes in pkgs as well.

smira avatar Oct 16 '24 09:10 smira

> @jfroy as you seem to be running with your changes on top, just to make sure: do you have $PKGS matching v1.8.1? There were several fixes in pkgs as well.

I believe I am. I've rebased my release branches on yours (release-1.8) and rebuilt. But I do not build PKGS (with a few exceptions, e.g. base and kernel), so I assume that I'm picking up your upstream builds. Maybe the PKGS ref is stale in my forks; I will check.
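
Assuming the pkgs ref is still pinned the upstream way (a PKGS variable in the Makefile), checking it in my fork should be something like:

# In the Talos source tree of the fork:
grep '^PKGS ' Makefile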

But the behavior has improved, which suggests I do have the current fixes. I used to see LVM controller errors and none of the VGs would get activated. Now the controller seems to work and all the VGs get activated. It feels like this issue may be an interaction with udev: it is somehow missing or dropping events, or is not aware of all the activated VGs.

I've seen the big udev changes that are coming, and maybe those will help. I can try to rebase them on 1.8 and test, though I don't know how mature the patches are.

Or I could be wrong and this is something else. I also want to go see what that refresh command does. Maybe it will suggest some further line of investigation.

jfroy avatar Oct 16 '24 12:10 jfroy

I haven't seen the behavior you describe while debugging this issue; the usual problem is that the detection logic falsely reports the volume as activated.

smira avatar Oct 16 '24 13:10 smira

I can't reproduce the issue on either main or Talos 1.8.1. I tested with 4 NVMe drives and Ceph with encryption enabled. On every reboot, all 3 worker nodes came up with all LVM volumes activated.

172.20.0.7   runtime     DiscoveredVolume   dm-0      1         disk        11 GB    luks                                                  
172.20.0.7   runtime     DiscoveredVolume   dm-1      1         disk        11 GB    luks                                                  
172.20.0.7   runtime     DiscoveredVolume   dm-2      1         disk        11 GB    luks                                                  
172.20.0.7   runtime     DiscoveredVolume   dm-3      1         disk        11 GB    luks                                                  
172.20.0.7   runtime     DiscoveredVolume   dm-4      1         disk        11 GB    bluestore                                             
172.20.0.7   runtime     DiscoveredVolume   dm-5      1         disk        11 GB    bluestore                                             
172.20.0.7   runtime     DiscoveredVolume   dm-6      1         disk        11 GB    bluestore                                             
172.20.0.7   runtime     DiscoveredVolume   dm-7      1         disk        11 GB    bluestore                                             
172.20.0.7   runtime     DiscoveredVolume   loop0     1         disk        77 MB    squashfs                                              
172.20.0.7   runtime     DiscoveredVolume   nvme0n1   1         disk        11 GB    lvm2-pv      0PsH8G-afuZ-mS2J-A35D-FirN-Qpfd-OyTv2a   
172.20.0.7   runtime     DiscoveredVolume   nvme0n2   1         disk        11 GB    lvm2-pv      QFHt0k-ka8i-KX1t-kTGk-WWww-SGCK-GPpHZ8   
172.20.0.7   runtime     DiscoveredVolume   nvme0n3   1         disk        11 GB    lvm2-pv      K3nU4P-u2hf-6wAy-orfN-bSju-h0By-Upx64T   
172.20.0.7   runtime     DiscoveredVolume   nvme0n4   1         disk        11 GB    lvm2-pv      E64OOb-lK4c-Dzfi-xiGa-9bNC-vW4T-aeQhL7 

I wonder if we could dig up more from a talosctl support bundle.
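
Something along these lines should reproduce the listing above and collect a bundle to dig through (node IPs as used in this thread):

# The discovered volumes listing above:
talosctl -n 172.20.0.7 get discoveredvolumes

# A support bundle from the affected node:
talosctl -n 192.168.1.13 support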

smira avatar Oct 21 '24 14:10 smira

Thanks for looking into it some more. I will get a bundle done and also investigate more on my own. I can definitely repro on that node in my cluster, so hopefully I can find the root cause.

jfroy avatar Oct 22 '24 02:10 jfroy

OK, turns out the root cause was pretty straightforward. The LVM controller can start activating volumes before udevd is running. Those volumes will not have their /dev nodes created, because that is done by udevd in response to LVM activation events.

See the log below. Two activations happen before udevd is running; those are the two volumes missing their device nodes for that particular boot.

192.168.1.13: user: warning: [2024-10-23T21:58:15.064749698Z]: [talos] volume status {"component": "controller-runtime", "controller": "block.VolumeManagerController", "volume": "META", "phase": "waiting -> ready"}
192.168.1.13: user: warning: [2024-10-23T21:58:15.086447698Z]: [talos] task writeIMAPolicy (1/1): done, 166.680011ms
192.168.1.13: user: warning: [2024-10-23T21:58:15.090304698Z]: [talos] phase integrity (2/12): done, 183.238453ms
192.168.1.13: user: warning: [2024-10-23T21:58:15.094078698Z]: [talos] phase etc (3/12): 3 tasks(s)
192.168.1.13: user: warning: [2024-10-23T21:58:15.097845698Z]: [talos] task setUserEnvVars (3/3): starting
192.168.1.13: user: warning: [2024-10-23T21:58:15.101460698Z]: [talos] task setUserEnvVars (3/3): done, 3.630649ms
192.168.1.13: user: warning: [2024-10-23T21:58:15.104960698Z]: [talos] task CreateSystemCgroups (1/3): starting
192.168.1.13: user: warning: [2024-10-23T21:58:15.108488698Z]: [talos] task createOSReleaseFile (2/3): starting
192.168.1.13: user: warning: [2024-10-23T21:58:15.112089698Z]: [talos] task CreateSystemCgroups (1/3): using cgroups root: /
192.168.1.13: user: warning: [2024-10-23T21:58:15.115654698Z]: [talos] task createOSReleaseFile (2/3): done, 14.329314ms
192.168.1.13: user: warning: [2024-10-23T21:58:15.136053698Z]: [talos] activating LVM volume {"component": "controller-runtime", "controller": "block.LVMActivationController", "name": "ceph-c033ff07-8e3d-439b-94bf-2b8f94445df4"}
192.168.1.13: user: warning: [2024-10-23T21:58:15.156689698Z]: [talos] task CreateSystemCgroups (1/3): done, 58.826575ms
192.168.1.13: user: warning: [2024-10-23T21:58:15.160161698Z]: [talos] phase etc (3/12): done, 66.085972ms
192.168.1.13: user: warning: [2024-10-23T21:58:15.163461698Z]: [talos] phase earlyServices (4/12): 4 tasks(s)
192.168.1.13: user: warning: [2024-10-23T21:58:15.166679698Z]: [talos] task startContainerd (4/4): starting
192.168.1.13: user: warning: [2024-10-23T21:58:15.169868698Z]: [talos] service[containerd](Starting): Starting service
192.168.1.13: user: warning: [2024-10-23T21:58:15.172979698Z]: [talos] task startUdevd (1/4): starting
192.168.1.13: user: warning: [2024-10-23T21:58:15.176138698Z]: [talos] task startMachined (2/4): starting
192.168.1.13: user: warning: [2024-10-23T21:58:15.179168698Z]: [talos] task startSyslogd (3/4): starting
192.168.1.13: user: warning: [2024-10-23T21:58:15.182115698Z]: [talos] service[containerd](Preparing): Running pre state
192.168.1.13: user: warning: [2024-10-23T21:58:15.185049698Z]: [talos] volume status {"component": "controller-runtime", "controller": "block.VolumeManagerController", "volume": "STATE", "phase": "waiting -> failed", "error": "volume is encrypted, but no encryption config provided"}
192.168.1.13: user: warning: [2024-10-23T21:58:15.193783698Z]: [talos] service[containerd](Preparing): Creating service runner
192.168.1.13: user: warning: [2024-10-23T21:58:15.196659698Z]: [talos] service[udevd](Starting): Starting service
192.168.1.13: user: warning: [2024-10-23T21:58:15.199467698Z]: [talos] service[udevd](Preparing): Running pre state
192.168.1.13: user: warning: [2024-10-23T21:58:15.202199698Z]: [talos] service[syslogd](Starting): Starting service
192.168.1.13: user: warning: [2024-10-23T21:58:15.204872698Z]: [talos] service[containerd](Running): Process Process(["/bin/containerd" "--address" "/system/run/containerd/containerd.sock" "--state" "/system/run/containerd" "--root" "/system/var/lib/containerd"]) started with PID 1896
192.168.1.13: user: warning: [2024-10-23T21:58:15.212845698Z]: [talos] service[syslogd](Waiting): Waiting for service "machined" to be "up"
192.168.1.13: user: warning: [2024-10-23T21:58:15.215571698Z]: [talos] task startSyslogd (3/4): done, 38.144554ms
192.168.1.13: user: warning: [2024-10-23T21:58:15.218198698Z]: [talos] service[machined](Starting): Starting service
192.168.1.13: user: warning: [2024-10-23T21:58:15.220825698Z]: [talos] service[machined](Preparing): Running pre state
192.168.1.13: user: warning: [2024-10-23T21:58:15.223438698Z]: [talos] service[udevd](Preparing): Creating service runner
192.168.1.13: user: warning: [2024-10-23T21:58:15.226068698Z]: [talos] service[machined](Preparing): Creating service runner
192.168.1.13: user: warning: [2024-10-23T21:58:15.228763698Z]: [talos] service[udevd](Running): Process Process(["/sbin/udevd" "--resolve-names=never"]) started with PID 1902
192.168.1.13: user: warning: [2024-10-23T21:58:15.234298698Z]: [talos] service[machined](Running): Service started as goroutine
192.168.1.13: user: warning: [2024-10-23T21:58:15.296740698Z]: [talos] volume status {"component": "controller-runtime", "controller": "block.VolumeManagerController", "volume": "STATE", "phase": "failed -> failed", "error": "volume is encrypted, but no encryption config provided"}
192.168.1.13: user: warning: [2024-10-23T21:58:15.407836698Z]: [talos] volume status {"component": "controller-runtime", "controller": "block.VolumeManagerController", "volume": "STATE", "phase": "failed -> failed", "error": "volume is encrypted, but no encryption config provided"}
192.168.1.13: user: warning: [2024-10-23T21:58:15.454934698Z]: [talos] activating LVM volume {"component": "controller-runtime", "controller": "block.LVMActivationController", "name": "ceph-cd59b9e6-3943-4927-b7a5-c95d920a35ed"}
192.168.1.13: daemon:    info: [2024-10-23T21:58:15.513383698Z]: udevd[1902]: starting version 3.2.14
192.168.1.13: daemon:    info: [2024-10-23T21:58:15.520355698Z]: udevd[1902]: starting eudev-3.2.14
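
To confirm this from a node shell (rough commands, assuming the LVM tools see the host devices and the usual /dev/<vg>/<lv> layout; the VG names are the two affected ones from the log):

# LVM itself reports the LVs as active...
lvs -o vg_name,lv_name,lv_active

# ...but the /dev entries for the two affected VGs stay missing until udevd
# processes the activation events:
ls /dev/ceph-c033ff07-8e3d-439b-94bf-2b8f94445df4 2>/dev/null || echo missing
ls /dev/ceph-cd59b9e6-3943-4927-b7a5-c95d920a35ed 2>/dev/null || echo missing

A vgchange --refresh once udevd is up makes them appear, as noted at the top of the issue.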

jfroy avatar Oct 24 '24 00:10 jfroy

Oh, thanks for digging into this, this should be easy to fix!

smira avatar Oct 24 '24 10:10 smira

@jfroy not totally related, but what is the NVMe hardware in your box?

(or talosctl get pcidevices)

I wonder how they come into existence before udevd starts (as it should load the required modules)

smira avatar Oct 24 '24 12:10 smira

> @jfroy not totally related, but what is the NVMe hardware in your box?
>
> (or talosctl get pcidevices)
>
> I wonder how they come into existence before udevd starts (as it should load the required modules)

Ah, my kernel has most drivers built-in and not as modules. I arguably should modularize it a bit more, but the ordering issue should probably still be fixed.

I checked and indeed your kernel config has CONFIG_BLK_DEV_NVME=m and mine has CONFIG_BLK_DEV_NVME=y.
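
Side by side, the relevant kernel config fragments:

# Upstream Talos pkgs kernel: NVMe driver as a module, loaded once udevd is up:
CONFIG_BLK_DEV_NVME=m

# My kernel: NVMe driver built in, so the drives exist before udevd even starts:
CONFIG_BLK_DEV_NVME=y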

They are Micron 7300 Pro drives (8x).

jfroy avatar Oct 24 '24 14:10 jfroy

OK, that explains why I can't reproduce it, but we'll still get a fix in for the next 1.8.2, thank you!

smira avatar Oct 24 '24 15:10 smira