[BUG]: Add NVMeTCP connection parameter ctrl-loss-tmo=-1 to implement PowerStore best practices
Bug Description
The Dell Linux host connectivity guide (page 214, https://elabnavigator.dell.com/vault/pdf/Linux.pdf?key=1725374107988) recommends the following:
By default, the Linux controller enters a reconnect state when it loses connection with the target. The default timeout for reconnecting is 10 minutes. However, a PowerStore node reboot may take more than 10 minutes. It is recommended to set ctrl-loss-tmo = -1 to keep the controller constantly reconnecting.
Per the SUSE documentation (https://documentation.suse.com/sles/15-SP5/html/SLES-all/cha-nvmeof.html): in case of a path loss, the NVMe subsystem tries to reconnect for a time period defined by the ctrl-loss-tmo option of the nvme connect command.
I'm concerned that ctrl-loss-tmo=-1 will be required for the NVMeTCP connection to reconnect to PowerStore nodes during a PowerStore NDU (non-disruptive code upgrade). During an NDU the PowerStore nodes reboot one at a time, and a node may well be unavailable for longer than the default reconnect timeout.
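To see which timeout existing controllers are actually using, something like the following could be run on a worker node. This is only a sketch: it assumes the kernel exposes a `ctrl_loss_tmo` attribute under `/sys/class/nvme/nvme*/` for fabrics controllers and reports `off` when reconnects never time out. The helper name `parseCtrlLossTmo` is made up for this example.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
	"strings"
)

// parseCtrlLossTmo interprets the contents of a ctrl_loss_tmo sysfs file.
// Assumption: the kernel writes "off" when the controller reconnects
// forever, and a number of seconds otherwise.
func parseCtrlLossTmo(raw string) (seconds int, infinite bool, err error) {
	v := strings.TrimSpace(raw)
	if v == "off" {
		return 0, true, nil
	}
	seconds, err = strconv.Atoi(v)
	return seconds, false, err
}

func main() {
	// Assumed sysfs location; controllers without the attribute are skipped.
	paths, _ := filepath.Glob("/sys/class/nvme/nvme*/ctrl_loss_tmo")
	for _, p := range paths {
		raw, err := os.ReadFile(p)
		if err != nil {
			fmt.Printf("%s: %v\n", p, err)
			continue
		}
		secs, infinite, err := parseCtrlLossTmo(string(raw))
		switch {
		case err != nil:
			fmt.Printf("%s: unexpected value %q\n", p, strings.TrimSpace(string(raw)))
		case infinite:
			fmt.Printf("%s: reconnects indefinitely\n", p)
		default:
			fmt.Printf("%s: gives up after %d seconds\n", p, secs)
		}
	}
}
```

A controller that reports a finite value would stop reconnecting if a node stays down longer than that during the NDU.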
My novice reading of the code: the nvmeTCPConnect function in gonvme_tcp_fc.go does not pass this parameter to nvme connect:
    if duplicateConnect {
        exe = nvme.buildNVMeCommand([]string{NVMeCommand, "connect", "-t", "tcp", "-n", target.TargetNqn, "-a", target.Portal, "-s", NVMePort, "-D"})
    } else {
        exe = nvme.buildNVMeCommand([]string{NVMeCommand, "connect", "-t", "tcp", "-n", target.TargetNqn, "-a", target.Portal, "-s", NVMePort})
    }
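As a rough illustration of the change I have in mind, the argument list could be built once and the timeout appended in both branches. This is a standalone sketch, not the gonvme code: `buildTCPConnectArgs` and its parameters are hypothetical, and it relies only on the `--ctrl-loss-tmo` option of `nvme connect` mentioned in the documentation above.

```go
package main

import (
	"fmt"
	"strings"
)

// buildTCPConnectArgs is a hypothetical helper, not part of gonvme.
// It assembles the arguments for `nvme connect` over TCP and always
// appends --ctrl-loss-tmo so the reconnect window is set explicitly.
func buildTCPConnectArgs(nqn, portal, port string, duplicateConnect bool, ctrlLossTmo int) []string {
	args := []string{"nvme", "connect", "-t", "tcp", "-n", nqn, "-a", portal, "-s", port}
	if duplicateConnect {
		args = append(args, "-D")
	}
	// A value of -1 keeps the controller reconnecting indefinitely,
	// as the Dell guide recommends for PowerStore.
	return append(args, fmt.Sprintf("--ctrl-loss-tmo=%d", ctrlLossTmo))
}

func main() {
	// Placeholder NQN and portal address, for illustration only.
	args := buildTCPConnectArgs("nqn.example:subsystem", "192.0.2.10", "4420", false, -1)
	fmt.Println(strings.Join(args, " "))
}
```

Whether the value should be hard-coded to -1 or made configurable is a design decision for the maintainers; the sketch takes it as a parameter to keep both options open.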
If a change is needed, I also request that currently supported CSI-powerstore driver builds be updated so that, for example, an OpenShift 4.14 environment using CSM-Operator 1.5.1 and CSI driver 2.10.1 can get this enhancement.
Logs
No logs available; see Dell SR 197072815.
Screenshots
No response
Additional Environment Information
No response
Steps to Reproduce
Perform a PowerStore code upgrade (NDU), for example from 3.6.0.0 to 3.6.1.2, with an OpenShift cluster attached and using PVs.
Expected Behavior
Hosts should be able to survive paths to storage going away and coming back during all normal data center operations
CSM Driver(s)
csi-powerstore 2.10.1
Installation Type
csm-operator 1.5.1
Container Storage Modules Enabled
No response
Container Orchestrator
OpenShift 4.14
Operating System
OpenShift Linux - RHCOS based on RHEL 9.2