
Error upgrading from 1.9.5 to 1.9.6

Open jseely opened this issue 6 months ago • 4 comments

Bug Report

I'm running into an issue during the upgrade that causes the following error: failed to probe bootloader on upgrade: file does not exist

Description

See my current configurations here: https://github.com/jseely/talos-config
The machine in question is Talos2.

Logs

 user: warning: [2025-05-10T21:26:24.651751283Z]: [talos] task unmountSystemDiskBindMounts (1/1): starting                                                                                                                                                                                
 user: warning: [2025-05-10T21:26:24.651839283Z]: [talos] task unmountSystemDiskBindMounts (1/1): unmounting /system/state                                                                                                                                                                
 kern:  notice: [2025-05-10T21:26:24.651999283Z]: XFS (sdb5): Unmounting Filesystem 42fdfd20-7af3-4a1c-a83e-639df8846e8c                                                                                                                                                                  
 user: warning: [2025-05-10T21:26:24.674184283Z]: [talos] task unmountSystemDiskBindMounts (1/1): unmounting /var                                                                                                                                                                         
 kern:  notice: [2025-05-10T21:26:24.840656283Z]: XFS (sdb6): Unmounting Filesystem 6c3a0e60-1a9c-4e27-9e4c-44c7f3b6adb4                                                                                                                                                                  
 user: warning: [2025-05-10T21:26:24.865315283Z]: [talos] task unmountSystemDiskBindMounts (1/1): done, 213.561982ms                                                                                                                                                                      
 user: warning: [2025-05-10T21:26:24.865346283Z]: [talos] phase unmountBind (7/14): done, 213.618541ms                                                                                                                                                                                    
 user: warning: [2025-05-10T21:26:24.865358283Z]: [talos] phase unmountSystem (8/14): 2 tasks(s)                                                                                                                                                                                          
 user: warning: [2025-05-10T21:26:24.865377283Z]: [talos] task unmountStatePartition (2/2): starting                                                                                                                                                                                      
 user: warning: [2025-05-10T21:26:24.865415283Z]: [talos] task unmountEphemeralPartition (1/2): starting                                                                                                                                                                                  
 user: warning: [2025-05-10T21:26:24.865543283Z]: [talos] task unmountStatePartition (2/2): done, 165.911µs                                                                                                                                                                               
 user: warning: [2025-05-10T21:26:24.865575283Z]: [talos] task unmountEphemeralPartition (1/2): done, 169.595µs                                                                                                                                                                           
 user: warning: [2025-05-10T21:26:24.865601283Z]: [talos] phase unmountSystem (8/14): done, 244.182µs                                                                                                                                                                                     
 user: warning: [2025-05-10T21:26:24.865612283Z]: [talos] phase volumeFinalize (9/14): 1 tasks(s)                                                                                                                                                                                         
 user: warning: [2025-05-10T21:26:24.865631283Z]: [talos] task teardownLifecycle (1/1): starting                                                                                                                                                                                          
 user: warning: [2025-05-10T21:26:24.865977283Z]: [talos] volume status {"component": "controller-runtime", "controller": "block.VolumeManagerController", "volume": "STATE", "phase": "ready -> closed", "location": "/dev/sdb5", "parentLocation": "/dev/sdb"}                          
 user: warning: [2025-05-10T21:26:24.866006283Z]: [talos] volume status {"component": "controller-runtime", "controller": "block.VolumeManagerController", "volume": "EPHEMERAL", "phase": "ready -> closed", "location": "/dev/sdb6", "parentLocation": "/dev/sdb"}                      
 user: warning: [2025-05-10T21:26:24.866029283Z]: [talos] volume status {"component": "controller-runtime", "controller": "block.VolumeManagerController", "volume": "META", "phase": "ready -> closed", "location": "/dev/sdb4", "parentLocation": "/dev/sdb"}                           
 user: warning: [2025-05-10T21:26:24.866164283Z]: [talos] task teardownLifecycle (1/1): done, 531.599µs                                                                                                                                                                                   
 user: warning: [2025-05-10T21:26:24.866185283Z]: [talos] phase volumeFinalize (9/14): done, 573.766µs                                                                                                                                                                                    
 user: warning: [2025-05-10T21:26:24.866195283Z]: [talos] phase upgrade (10/14): 1 tasks(s)                                                                                                                                                                                               
 user: warning: [2025-05-10T21:26:24.866212283Z]: [talos] task upgrade (1/1): starting                                                                                                                                                                                                    
 user: warning: [2025-05-10T21:26:24.866232283Z]: [talos] task upgrade (1/1): performing upgrade via "factory.talos.dev/installer/60b42e4f2f1eaee545c2436154a21f67ad285e596c106a1fb8f827954a8ed391:v1.9.6"                                                                                
 user: warning: [2025-05-10T21:26:24.957519283Z]: 2025/05/10 21:26:28 running Talos installer v1.9.6                                                                                                                                                                                      
 user: warning: [2025-05-10T21:26:24.961308283Z]: 2025/05/10 21:26:28 system disk wipe on upgrade is not supported anymore, option ignored                                                                                                                                                
 user: warning: [2025-05-10T21:26:24.963150283Z]: 2025/05/10 21:26:28 running pre-flight checks                                                                                                                                                                                           
 user: warning: [2025-05-10T21:26:24.964415283Z]: 2025/05/10 21:26:28 host Talos version: v1.9.5                                                                                                                                                                                          
 user: warning: [2025-05-10T21:26:24.966966283Z]: 2025/05/10 21:26:28 host Kubernetes versions: kubelet: 1.32.3, kube-apiserver: 1.32.3, kube-scheduler: 1.32.3, kube-controller-manager: 1.32.3                                                                                          
 user: warning: [2025-05-10T21:26:24.966977283Z]: 2025/05/10 21:26:28 all pre-flight checks successful                                                                                                                                                                                    
 user: warning: [2025-05-10T21:26:24.989441283Z]: Error: failed to probe bootloader on upgrade: file does not exist

Environment

  • Talos version:
Client:
        Tag:         v1.9.5
        SHA:         undefined
        Built:       2025-03-12T13:12:47Z
        Go version:  go1.24.1
        OS/Arch:     linux/amd64
Server:
        NODE:        10.0.144.106
        Tag:         v1.9.5
        SHA:         d07f6daa
        Built:       
        Go version:  go1.23.7
        OS/Arch:     linux/amd64
        Enabled:     RBAC
  • Kubernetes version: 1.32.3
  • Platform: Bare metal

jseely, May 10 '25

It looks like Talos fails to find the bootloader. You should be using GRUB unless this is a Secure Boot system.

Can you please post the output of talosctl get dv for that machine?

smira, May 13 '25

➜  ~ talosctl get dv -n 10.0.144.106
NODE           NAMESPACE   TYPE               ID      VERSION   TYPE        SIZE     DISCOVERED   LABEL                                    PARTITIONLABEL
10.0.144.106   runtime     DiscoveredVolume   dm-0    1         disk        1.2 TB   luks                                                  
10.0.144.106   runtime     DiscoveredVolume   dm-1    1         disk        1.2 TB   luks                                                  
10.0.144.106   runtime     DiscoveredVolume   loop3   1         disk        74 MB    squashfs                                              
10.0.144.106   runtime     DiscoveredVolume   sda     1         disk        62 GB    iso9660      TALOS_V1_9_5                             
10.0.144.106   runtime     DiscoveredVolume   sdb     1         disk        299 GB   gpt                                                   
10.0.144.106   runtime     DiscoveredVolume   sdb1    1         partition   105 MB   vfat                                                  EFI
10.0.144.106   runtime     DiscoveredVolume   sdb2    1         partition   1.0 MB                                                         BIOS
10.0.144.106   runtime     DiscoveredVolume   sdb3    1         partition   1.0 GB                                                         BOOT
10.0.144.106   runtime     DiscoveredVolume   sdb4    1         partition   1.0 MB                                                         META
10.0.144.106   runtime     DiscoveredVolume   sdb5    1         partition   105 MB   xfs          STATE                                    STATE
10.0.144.106   runtime     DiscoveredVolume   sdb6    1         partition   298 GB   xfs          EPHEMERAL                                EPHEMERAL
10.0.144.106   runtime     DiscoveredVolume   sdc     1         disk        1.2 TB   lvm2-pv      4538c6-IEUU-lrn2-VOSJ-cXP0-s0RD-e9x0Ok   
10.0.144.106   runtime     DiscoveredVolume   sdd     1         disk        1.2 TB   lvm2-pv      mb1MFv-4ZLU-pY55-ttsM-muyk-vH5Z-do325z

Could it be mistakenly unmounting the wrong disk?

jseely, May 15 '25
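One way to check the "wrong disk" question against the upgrade log in the report is to pull location and parentLocation out of the volume status lines. The following is a minimal Python sketch, assuming the log text is copied verbatim; closed_volumes is a hypothetical helper, and it only shows which devices the upgrade closed, not how the installer looks for the bootloader.

```python
import json
import re

# The three "volume status" lines copied from the upgrade log in this issue.
LOG = """\
 user: warning: [2025-05-10T21:26:24.865977283Z]: [talos] volume status {"component": "controller-runtime", "controller": "block.VolumeManagerController", "volume": "STATE", "phase": "ready -> closed", "location": "/dev/sdb5", "parentLocation": "/dev/sdb"}
 user: warning: [2025-05-10T21:26:24.866006283Z]: [talos] volume status {"component": "controller-runtime", "controller": "block.VolumeManagerController", "volume": "EPHEMERAL", "phase": "ready -> closed", "location": "/dev/sdb6", "parentLocation": "/dev/sdb"}
 user: warning: [2025-05-10T21:26:24.866029283Z]: [talos] volume status {"component": "controller-runtime", "controller": "block.VolumeManagerController", "volume": "META", "phase": "ready -> closed", "location": "/dev/sdb4", "parentLocation": "/dev/sdb"}
"""

# Captures the JSON object that follows "volume status" at the end of a line.
STATUS_RE = re.compile(r"volume status (\{.*\})\s*$")


def closed_volumes(log_text):
    """Return (volume, location, parentLocation) for every volume status
    line whose phase ends in '-> closed'."""
    result = []
    for line in log_text.splitlines():
        match = STATUS_RE.search(line)
        if match is None:
            continue  # not a volume status line
        event = json.loads(match.group(1))
        if event.get("phase", "").endswith("-> closed"):
            result.append((event["volume"], event["location"], event["parentLocation"]))
    return result


if __name__ == "__main__":
    volumes = closed_volumes(LOG)
    for name, location, parent in volumes:
        print(f"{name:<10} {location:<10} on {parent}")
    print("parent disks:", sorted({parent for _, _, parent in volumes}))  # parent disks: ['/dev/sdb']
```

For the lines in the log above, all three closed volumes (STATE, EPHEMERAL, META) report parentLocation /dev/sdb, which is also the disk that holds the BOOT partition (sdb3) in the talosctl get dv output.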

What is strange is that the BOOT partition is not detected as xfs, although it should be with the default Talos install.

smira, May 15 '25
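An empty DISCOVERED cell presumably means that no known filesystem signature was found on that partition. Signature probers such as blkid recognise XFS by its superblock, whose first four bytes are the magic "XFSB". The following minimal Python sketch checks only that magic; looks_like_xfs is a hypothetical helper, it is not the prober Talos uses, and it skips the further superblock checks a real prober performs. Talos has no shell, so pointing it at a real partition (for example /dev/sdb3 on this machine) would assume a rescue environment or a copy of the partition.

```python
import os
import tempfile

# An XFS superblock begins with the 4-byte magic "XFSB" (0x58465342).
XFS_MAGIC = b"XFSB"


def looks_like_xfs(path: str) -> bool:
    """Return True if the block device or image file at `path` begins with
    the XFS superblock magic. Only the magic is checked."""
    with open(path, "rb") as f:
        return f.read(len(XFS_MAGIC)) == XFS_MAGIC


if __name__ == "__main__":
    # Demo on two throwaway image files instead of a real partition.
    with tempfile.TemporaryDirectory() as tmp:
        intact = os.path.join(tmp, "boot-intact.img")
        zeroed = os.path.join(tmp, "boot-zeroed.img")
        with open(intact, "wb") as f:
            f.write(XFS_MAGIC + bytes(508))  # magic followed by padding
        with open(zeroed, "wb") as f:
            f.write(bytes(512))  # no signature at all
        print(looks_like_xfs(intact), looks_like_xfs(zeroed))  # True False
```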

Yeah, it looks like the other node with similar hardware (10.0.144.105) detects the filesystem type of the BOOT partition properly. I'll try a fresh install on that node and see if it fixes it.

➜  ~ talosctl get dv -n 10.0.144.105
NODE           NAMESPACE   TYPE               ID      VERSION   TYPE        SIZE     DISCOVERED   LABEL                                    PARTITIONLABEL
10.0.144.105   runtime     DiscoveredVolume   dm-0    1         disk        1.2 TB                                                         
10.0.144.105   runtime     DiscoveredVolume   dm-1    1         disk        1.2 TB                                                         
10.0.144.105   runtime     DiscoveredVolume   loop3   1         disk        74 MB    squashfs                                              
10.0.144.105   runtime     DiscoveredVolume   sdb     1         disk        299 GB   gpt                                                   
10.0.144.105   runtime     DiscoveredVolume   sdb1    1         partition   105 MB   vfat         EFI                                      EFI
10.0.144.105   runtime     DiscoveredVolume   sdb2    1         partition   1.0 MB                                                         BIOS
10.0.144.105   runtime     DiscoveredVolume   sdb3    1         partition   1.0 GB   xfs          BOOT                                     BOOT
10.0.144.105   runtime     DiscoveredVolume   sdb4    1         partition   1.0 MB   talosmeta                                             META
10.0.144.105   runtime     DiscoveredVolume   sdb5    1         partition   105 MB   xfs          STATE                                    STATE
10.0.144.105   runtime     DiscoveredVolume   sdb6    1         partition   298 GB   xfs          EPHEMERAL                                EPHEMERAL
10.0.144.105   runtime     DiscoveredVolume   sdc     1         disk        299 GB                                                         
10.0.144.105   runtime     DiscoveredVolume   sdd     1         disk        1.2 TB   lvm2-pv      eQcIiv-xZT3-aRub-gJmI-HWCq-p2he-BrHP2F   
10.0.144.105   runtime     DiscoveredVolume   sde     1         disk        1.2 TB   lvm2-pv      BFOZAj-wXSA-vmze-89LK-RK2d-wCwc-eNL5Tl   
10.0.144.105   runtime     DiscoveredVolume   sdf     1         disk        1.2 TB

jseely, May 20 '25
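To line the two talosctl get dv outputs up partition by partition, a small parser for the fixed-width table helps. The following is a minimal Python sketch, assuming the column layout shown in this thread (each value starts under its header word); parse_table and filesystem_diff are hypothetical helpers, not part of talosctl. The sample strings are trimmed copies of the two outputs above, keeping only the sdb partitions and the ID, SIZE, DISCOVERED and PARTITIONLABEL columns.

```python
import re

# Trimmed from the output posted for 10.0.144.106.
NODE_106 = """\
ID    SIZE     DISCOVERED   PARTITIONLABEL
sdb1  105 MB   vfat         EFI
sdb2  1.0 MB                BIOS
sdb3  1.0 GB                BOOT
sdb4  1.0 MB                META
sdb5  105 MB   xfs          STATE
sdb6  298 GB   xfs          EPHEMERAL
"""

# Trimmed from the output posted for 10.0.144.105.
NODE_105 = """\
ID    SIZE     DISCOVERED   PARTITIONLABEL
sdb1  105 MB   vfat         EFI
sdb2  1.0 MB                BIOS
sdb3  1.0 GB   xfs          BOOT
sdb4  1.0 MB   talosmeta    META
sdb5  105 MB   xfs          STATE
sdb6  298 GB   xfs          EPHEMERAL
"""


def parse_table(text):
    """Parse fixed-width table output into a list of dicts.

    Each column starts where its header word starts, so an empty cell
    (such as an undetected filesystem) stays in its own column instead of
    shifting the cells to its right. Repeated header names (the full
    output has TYPE twice) get a numeric suffix: TYPE, TYPE2.
    """
    lines = [line.rstrip() for line in text.splitlines() if line.strip()]
    headers = list(re.finditer(r"\S+", lines[0]))
    starts = [m.start() for m in headers]
    ends = starts[1:] + [None]  # the last column runs to the end of the line
    keys, seen = [], {}
    for m in headers:
        name = m.group()
        seen[name] = seen.get(name, 0) + 1
        keys.append(name if seen[name] == 1 else f"{name}{seen[name]}")
    return [
        {k: line[s:e].strip() for k, s, e in zip(keys, starts, ends)}
        for line in lines[1:]
    ]


def filesystem_diff(rows_a, rows_b):
    """Return {partition label: (fs on a, fs on b)} for partitions whose
    DISCOVERED filesystem differs; '' means nothing was detected."""
    fs_a = {r["PARTITIONLABEL"]: r["DISCOVERED"] for r in rows_a if r["PARTITIONLABEL"]}
    fs_b = {r["PARTITIONLABEL"]: r["DISCOVERED"] for r in rows_b if r["PARTITIONLABEL"]}
    return {
        label: (fs_a.get(label, ""), fs_b.get(label, ""))
        for label in sorted(fs_a.keys() | fs_b.keys())
        if fs_a.get(label, "") != fs_b.get(label, "")
    }


if __name__ == "__main__":
    diff = filesystem_diff(parse_table(NODE_106), parse_table(NODE_105))
    for label, (a, b) in diff.items():
        print(f"{label}: 10.0.144.106={a or '-'} 10.0.144.105={b or '-'}")
```

On the outputs posted in this thread it flags exactly two partitions: BOOT (nothing detected on 10.0.144.106, xfs on 10.0.144.105) and META (nothing versus talosmeta). Slicing by column offset rather than splitting on whitespace matters here, because the interesting cells are the empty ones.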

This issue is stale because it has been open 180 days with no activity. Remove stale label or comment or this will be closed in 7 days.

github-actions[bot], Nov 17 '25