Update SCSI scan command to improve compatibility

Open P0lskay opened this issue 11 months ago • 5 comments

Description:

When Trident scans for SCSI devices with `echo "0 0 -" > /sys/class/scsi_host/host0/scan`, it cannot be used alongside other provisioners that also use the scsi_host interface.

Problem:

The `0 0 -` scan string covers only channel (controller) 0 and target 0, across all LUNs. Devices that attach on any other channel or target are never scanned, and this is what breaks Trident when another provisioner uses the same SCSI host.

Proposal:

To resolve this issue, we propose changing the scan command to `echo "- - -" > /sys/class/scsi_host/host0/scan`. With `-` as a wildcard in every position, this scans all devices on the SCSI host regardless of channel and target number, which should allow Trident to coexist with other provisioners that use scsi_host. The IBM provisioner can serve as an example.
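To make the difference between the two scan strings concrete, here is a minimal sketch of a helper that writes a `<channel> <target> <lun>` string to a host's scan file. The names `rescan_host` and `SCSI_HOST_ROOT` are hypothetical and are not part of Trident; the root directory is overridable only so the function can be tried against a scratch directory instead of the real sysfs.

```shell
# Sketch only: rescan_host and SCSI_HOST_ROOT are illustrative names,
# not taken from Trident.
SCSI_HOST_ROOT="${SCSI_HOST_ROOT:-/sys/class/scsi_host}"

# rescan_host HOST CHANNEL TARGET LUN
# Writes "<channel> <target> <lun>" to the host's scan file.
# A "-" in any position acts as a wildcard for that field.
rescan_host() {
    host="$1"; channel="$2"; target="$3"; lun="$4"
    scan_file="$SCSI_HOST_ROOT/$host/scan"
    if [ ! -w "$scan_file" ]; then
        echo "cannot write $scan_file" >&2
        return 1
    fi
    printf '%s %s %s\n' "$channel" "$target" "$lun" > "$scan_file"
}

# Narrow scan (the current command): channel 0, target 0, any LUN.
#   rescan_host host0 0 0 -
# Wildcard scan (the proposed command): any channel, any target, any LUN.
#   rescan_host host0 - - -
```

Writing to the real scan file normally requires root. The wildcard form is broader, so it may take longer on hosts with many targets.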

P0lskay avatar Jan 24 '25 16:01 P0lskay

We are waiting for changes in the next release.

P0lskay avatar Jan 27 '25 04:01 P0lskay

@P0lskay

Could you elaborate on how `"0 0 -"` affects (a) Trident and (b) other provisioners?

For the Trident case, can you share more information? Are the controller number and target number different from 0?

In all the cases we have observed, both are 0, which is why we use this specific scan rather than a blanket scan (`- - -`). Could you also elaborate on how this affects Trident's coexistence with other provisioners? Would the other provisioners be affected if we switched to a blanket scan?

VinayKumarHavanur avatar Jan 27 '25 16:01 VinayKumarHavanur

@VinayKumarHavanur Right now, our clusters use ibm-block-csi-driver. When we install Trident and it tries to mount a volume, the new volume is not found during the SCSI scan. We found that the new device attaches to a non-zero target number on the SCSI host. After running `echo "- - -" > /sys/class/scsi_host/host3/scan`, our volume was found. Output of lsscsi after running the command:

`[3:0:0:5] disk IBM 2145 0000 /dev/sde
[3:0:0:23] disk IBM 2145 0000 /dev/sdf
[3:0:0:57] disk IBM 2145 0000 /dev/sdg
[3:0:0:76] disk IBM 2145 0000 /dev/sdh
[3:0:1:0] disk IBM 2145 0000 /dev/sdi
[3:0:1:4] disk IBM 2145 0000 /dev/sdj
[3:0:5:0] disk NETAPP 2145 0000 /dev/sdk`

You can see that the NetApp device has target number 5. I think we should just use a different scanCmd: `"- - {LUN ID}"` (the IBM solution does this, for example). In my PR #970, someone said that they were aware of this problem and were working on it, but they probably misunderstood me, so I'm reopening the issue and the PR. If you can, please approve my PR.
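Each lsscsi line starts with a `[host:channel:target:lun]` tuple, so the target number can be read straight from the output. A rough sketch follows, assuming the vendor name appears as the third whitespace-separated column, as it does in the output quoted above; `list_targets` is a hypothetical helper name.

```shell
# list_targets VENDOR
# Reads lsscsi-style lines on stdin and prints each distinct target number
# (third field of [H:C:T:L]) seen for the given vendor, in input order.
# Assumption: the vendor is the third whitespace-separated column.
list_targets() {
    awk -v vendor="$1" '
        $3 == vendor {
            tuple = $1
            gsub(/\[|\]/, "", tuple)   # strip the surrounding brackets
            split(tuple, f, ":")       # f[1]=host f[2]=channel f[3]=target f[4]=lun
            if (!(f[3] in seen)) {
                seen[f[3]] = 1
                print f[3]
            }
        }'
}

# Example: lsscsi | list_targets NETAPP
```

For the seven lines quoted above, `list_targets NETAPP` prints `5`, while `list_targets IBM` prints `0` and `1`, which is why a scan limited to target 0 misses the NetApp device.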

P0lskay avatar Jan 28 '25 18:01 P0lskay

@P0lskay

We are working on a change that scans with both the target ID and the LUN ID, for example `"- <tgt ID> <LUN ID>"`. This avoids scanning across all targets for a given LUN.
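As a sketch of what such a targeted scan string could look like (this is not the actual Trident change, and `targeted_scan_string` is a hypothetical name), the helper below builds `- <target> <lun>` and rejects non-numeric input so that a stray value cannot widen the scan.

```shell
# targeted_scan_string TARGET LUN
# Prints "- <target> <lun>": any channel, one specific target, one specific LUN.
# Both arguments must be non-negative integers.
targeted_scan_string() {
    for value in "$1" "$2"; do
        case "$value" in
            ''|*[!0-9]*)
                echo "target and LUN must be non-negative integers" >&2
                return 1
                ;;
        esac
    done
    printf '%s %s %s\n' - "$1" "$2"
}

# Example for the NetApp device at [3:0:5:0] from the earlier comment (needs root):
#   targeted_scan_string 5 0 > /sys/class/scsi_host/host3/scan
```

With target 5 and LUN 0 this produces `- 5 0`, which touches only the one device and leaves other provisioners' targets on the same host alone.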

VinayKumarHavanur avatar Jan 29 '25 16:01 VinayKumarHavanur

@VinayKumarHavanur Okay, we will wait for that. I hope you will take into account that other provisioners may be running in the cluster.

P0lskay avatar Jan 30 '25 15:01 P0lskay