Nodes/Pods lost access to iSCSI-based PVCs during/after upgrade to 25.02.1
Describe the bug
Recently we performed a Trident upgrade from 24.10.1 to 25.02.1.
After the upgrade was done by the operator and the trident-node Pods were replaced, we faced severe consequences: Pods lost access to most of their volumes (but not all of them). The same thing happened when we upgraded from 24.10.0 to 24.10.1.
Before the upgrade the clusters were checked: all running Pods had access to all PVC volumes, both paths under 'multipath -ll' were healthy for all dm-xxx devices, and no orphaned/ghost block devices (/dev/sdXX) were present on the nodes.
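For reference, this is roughly the kind of pre-upgrade check we scripted (a minimal sketch assuming a standard Linux dm-multipath setup; the parsing of 'multipath -ll' and the sysfs heuristics are our own, not Trident code):

```python
#!/usr/bin/env python3
"""Rough pre-upgrade health check for iSCSI multipath devices.

Sketch only: assumes a standard Linux dm-multipath setup. The parsing of
'multipath -ll' and the sysfs heuristics are our own, not Trident code.
"""
import glob
import os
import subprocess


def suspect_multipath_paths():
    """Return 'multipath -ll' path lines that are not 'active ready'."""
    out = subprocess.run(["multipath", "-ll"],
                         capture_output=True, text=True).stdout
    # Heuristic: path lines reference an sdX device; anything not reporting
    # 'active ready' (e.g. 'failed faulty') is worth a look before upgrading.
    return [l.strip() for l in out.splitlines()
            if " sd" in l and "active ready" not in l]


def orphaned_iscsi_devices():
    """Return iSCSI-attached sdX devices not held by any device-mapper map."""
    orphans = []
    for dev in glob.glob("/sys/block/sd*"):
        # iSCSI devices resolve through an iSCSI session directory in sysfs.
        if "/session" not in os.path.realpath(dev):
            continue
        # An empty 'holders' directory means no dm map owns the device.
        if not os.listdir(os.path.join(dev, "holders")):
            orphans.append(os.path.basename(dev))
    return orphans


if __name__ == "__main__":
    for line in suspect_multipath_paths():
        print("suspect path:", line)
    for dev in orphaned_iscsi_devices():
        print("possible orphaned iSCSI device: /dev/" + dev)
```

Any iSCSI-attached sdX device not held by a dm-multipath map is what we mean by an 'orphaned/ghost' device above.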
What we observed was that after the upgrade the trident-node Pods were not able to start fully:
The main reason was that the driver-registrar container was not able to start fully:
It seems that, for an unknown reason, trident-main was not able to 'determine published paths for volumes':
It then tried to remove them from multipath and unmount their filesystems (active volumes of active Pods):
In the logs it complains that it was not able to flush the multipath maps and unmount the devices; technically it errors with 'map or partition in use', however the Pods lost access to the block devices anyway ...
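For context, 'map or partition in use' is what multipath reports when the device-mapper map still has a non-zero open count (for example, a filesystem on top of it is still mounted). A quick way to see which maps are still held open, as a hedged sketch using standard dmsetup column output (the parsing is ours):

```python
#!/usr/bin/env python3
"""List device-mapper maps that are still open.

Sketch only: uses the standard 'name' and 'open' column fields of
'dmsetup info --columns'; a non-zero open count is what makes
'multipath -f <map>' fail with 'map or partition in use'.
"""
import subprocess


def open_dm_maps():
    out = subprocess.run(
        ["dmsetup", "info", "--columns", "--noheadings",
         "--separator", ":", "-o", "name,open"],
        capture_output=True, text=True,
    ).stdout
    held = {}
    for line in out.splitlines():
        # Each line is "<map name>:<open count>".
        name, _, count = line.partition(":")
        if count.strip().isdigit() and int(count) > 0:
            held[name.strip()] = int(count)
    return held


if __name__ == "__main__":
    for name, count in open_dm_maps().items():
        print(f"{name}: open count {count} (flush would be refused)")
```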
Error from the console of one of the nodes:
A useful piece of information is that on the trident-controller side there were no 'Unpublishing volume from the node' messages, so the volumes were not removed at the igroup/NetApp level. Rather, it looks like a bug in the trident-node logic: it somehow decided that some volumes were not in use (???) and had to be unmounted/removed from the OS.
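For completeness, this is roughly how we checked for the absence of that message on the controller side (a sketch; the namespace, pod label selector and container name are assumptions about a default operator-based install, so adjust them for your deployment):

```python
#!/usr/bin/env python3
"""Grep the trident-controller logs for node-unpublish messages.

Sketch only: NAMESPACE, SELECTOR and CONTAINER are assumptions about a
default operator-based install; adjust them for your deployment.
"""
import subprocess

NAMESPACE = "trident"                                # assumed install namespace
SELECTOR = "app=controller.csi.trident.netapp.io"    # assumed controller pod label
CONTAINER = "trident-main"                           # assumed controller container name


def unpublish_log_lines(since="24h"):
    out = subprocess.run(
        ["kubectl", "logs", "-n", NAMESPACE, "-l", SELECTOR,
         "-c", CONTAINER, "--since", since],
        capture_output=True, text=True,
    ).stdout
    # The message quoted above; its absence suggests the node-side cleanup
    # was not driven by a controller-side unpublish.
    return [l for l in out.splitlines()
            if "Unpublishing volume from the node" in l]


if __name__ == "__main__":
    lines = unpublish_log_lines()
    if not lines:
        print("no 'Unpublishing volume from the node' messages found")
    for line in lines:
        print(line)
```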
Environment
- Trident version: upgrade from 24.10.1 to 25.02.1 (via operator)
- Container runtime: docker://26.1.0
- Kubernetes version: v1.30.6
- Kubernetes orchestrator: rancher 2.10.3
- OS: Flatcar Container Linux by Kinvolk 4081.2.0 (Oklo)
Hi @ptrkmkslv, we will need to collect additional logs to investigate this issue further. Can you please open a NetApp Support Case?
Hi @ptrkmkslv, has this issue been resolved?