
Inaccurate TargetPortal in describe pv output with ONTAP SAN serving multiple networks

Open scaleoutsean opened this issue 3 years ago • 1 comments

Describe the bug

describe pv can show an inaccurate TargetPortal with ONTAP SAN (iSCSI) when the SVM has data interfaces on both reachable and unreachable networks. For example, if iSCSI Data LIFs are visible on 103.0/24 and 105.0/24 and I log in from the latter network, describe pv shows that I'm using a Target from 103.0/24.

Maybe that's because Targets are sorted in ascending order and only the first Target is shown, but that's still wrong if I don't have any interfaces on that network.
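
The sorting guess above can be checked against the numbers in this report with a small Python sketch. It assumes a plain string sort and a hypothetical "first entry becomes TargetPortal" rule; neither is confirmed to be what Trident or Kubernetes actually does. The four IPs are the TargetPortal and Portals values from the describe pv output in this issue.

```python
# The four Data LIF IPs that appear in this issue's `describe pv` output
# (one as TargetPortal, three under Portals), in arbitrary order.
portals = [
    "192.168.105.59",
    "192.168.103.59",
    "192.168.105.159",
    "192.168.103.159",
]

# Assumption: a plain string sort, which compares character by character
# rather than octet by octet, so "...103.159" sorts before "...103.59".
ordered = sorted(portals)

# Assumption (hypothetical rule): the first entry is reported as
# TargetPortal and the remainder as Portals.
target_portal = ordered[0]
extra_portals = ordered[1:]

print("TargetPortal:", target_portal)
print("Portals:     ", extra_portals)
```

Under those assumptions the split matches the output in this report: 192.168.103.159 lands in TargetPortal and the other three in Portals. That is consistent with the guess, but it does not prove where the ordering comes from.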

Environment

  • Trident 21.01.1
  • CentOS 7.9
  • Container runtime: Docker 1.13.1
  • Kubernetes orchestrator: OpenShift v3.11 (free)

To Reproduce

  • Set up an SVM with iSCSI Data LIFs on 2 VLANs (I created 4 iSCSI Data LIFs, 1 per controller and VLAN, so I end up with 2 on network 103.0/24 and 2 on network 105.0/24)
  • Set up an SC for iSCSI and an ONTAP SAN backend
  • On a worker on one of these networks/VLANs (e.g. VLAN 105, iSCSI network 192.168.105.0/24), run iscsiadm discovery. ONTAP reports 4 targets; the iSCSI client attempts to log in to all 4 and succeeds with the two that it can reach (those on 105.0/24 in my case, as the worker is on that VLAN). I know this is a separate topic, so let's just ignore that 103.0/24 is not reachable from this worker. The worker connects to the two target IPs on network 105.0/24
  • Problem: describe pv shows inaccurate TargetPortal in output below:
$ oc describe pv default-postgresql-data-be191
Name:            default-postgresql-data-be191
Labels:          <none>
Annotations:     pv.kubernetes.io/provisioned-by=netapp.io/trident
                 volume.beta.kubernetes.io/storage-class=ontap-iscsi
Finalizers:      [kubernetes.io/pv-protection]
StorageClass:    ontap-iscsi
Status:          Bound
Claim:           default/postgresql-data
Reclaim Policy:  Delete
Access Modes:    RWO
Capacity:        1Gi
Node Affinity:   <none>
Message:         
Source:
    Type:               ISCSI (an ISCSI Disk resource that is attached to a kubelet's host machine and then exposed to the pod)
    TargetPortal:       192.168.103.159     <===================  THIS  =========================
    IQN:                iqn.1992-08.com.netapp:sn.cfbbbeea862911eb895f005056a99e8f:vs.15
    Lun:                2
    ISCSIInterface      default
    FSType:             xfs
    ReadOnly:           false
    Portals:            [192.168.103.59 192.168.105.159 192.168.105.59]
    DiscoveryCHAPAuth:  false
    SessionCHAPAuth:    false
    SecretRef:          <nil>
    InitiatorName:      <none>
Events:                 <none>
  • Not only am I not using that Portal, I don't even have any interfaces on that network. I'm connected from 105.0/24 and can't ping the Data LIFs on 103.0/24.
$ netstat -ant | grep 103
tcp        0      0 127.0.0.1:39103         0.0.0.0:*               LISTEN     

$ netstat -ant | grep 105
tcp        0      0 192.168.105.100:53      0.0.0.0:*               LISTEN     
tcp        0      0 192.168.105.100:60524   192.168.105.59:3260     ESTABLISHED
tcp        0      0 192.168.105.100:45898   192.168.105.159:3260    ESTABLISHED

$ ping 192.168.103.59
PING 192.168.103.59 (192.168.103.59) 56(84) bytes of data.
^C
--- 192.168.103.59 ping statistics ---
3 packets transmitted, 0 received, 100% packet loss, time 1999ms

  • I know that Portals in oc describe pv comes from iscsiadm. Confusingly, it shows only 3 of the 4 Target IPs, but at least that is correct, albeit incomplete. TargetPortal, however, could show info from iscsiadm -m session (output below), which is accurate and doesn't include the misleading network 103.0/24:
$ sudo iscsiadm -m session
tcp: [1] 192.168.105.59:3260,1039 iqn.1992-08.com.netapp:sn.cfbbbeea862911eb895f005056a99e8f:vs.15 (non-flash)
tcp: [2] 192.168.105.159:3260,1040 iqn.1992-08.com.netapp:sn.cfbbbeea862911eb895f005056a99e8f:vs.15 (non-flash)

I don't know where TargetPortal in oc describe pv output comes from (some generic OS or CSI API or Trident), but if it's up to Trident I hope that information can be accurate.
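
The mismatch can also be checked mechanically. Below is a minimal Python sketch that extracts portal IPs from `iscsiadm -m session` output and tests whether the PV's TargetPortal is among them. The function name `active_portals` and the regular expression are illustrative, not part of Trident or open-iscsi, and the pattern only handles IPv4 session lines of the form shown in this issue.

```python
import re

# Matches session lines such as:
#   tcp: [1] 192.168.105.59:3260,1039 iqn.... (non-flash)
# Assumption: IPv4 portals only, in the format shown in this issue.
SESSION_RE = re.compile(r"^\w+: \[\d+\] (?P<ip>[^:\s]+):(?P<port>\d+),\d+ (?P<iqn>\S+)")


def active_portals(session_output):
    """Return the portal IPs of all sessions listed in the given text."""
    portals = []
    for line in session_output.splitlines():
        match = SESSION_RE.match(line.strip())
        if match:
            portals.append(match.group("ip"))
    return portals


# Session output copied from this issue.
sample = """\
tcp: [1] 192.168.105.59:3260,1039 iqn.1992-08.com.netapp:sn.cfbbbeea862911eb895f005056a99e8f:vs.15 (non-flash)
tcp: [2] 192.168.105.159:3260,1040 iqn.1992-08.com.netapp:sn.cfbbbeea862911eb895f005056a99e8f:vs.15 (non-flash)
"""

# TargetPortal value reported by `oc describe pv` in this issue.
pv_target_portal = "192.168.103.159"

sessions = active_portals(sample)
print("active sessions:", sessions)
print("TargetPortal has a session:", pv_target_portal in sessions)
```

On a worker, the `sample` text would be replaced with the real output of `sudo iscsiadm -m session`.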

Expected behavior

TargetPortal in describe pv output for iSCSI should be accurate (at least showing the correct network, even if only one of possibly several IPs is shown).
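
As a rough illustration of "the correct network", the following Python sketch filters the portal list down to the addresses that fall inside the networks a node is attached to. The helper `reachable_portals` is hypothetical, and the /24 prefix is taken from the networks named in this issue. Whether Trident or Kubernetes can apply a per-node rule like this to a PV is not something this report establishes.

```python
import ipaddress


def reachable_portals(portals, node_networks):
    """Keep only the portals whose IP lies inside one of the node's networks."""
    networks = [ipaddress.ip_network(n) for n in node_networks]
    return [
        p for p in portals
        if any(ipaddress.ip_address(p) in net for net in networks)
    ]


# All four Data LIF IPs from this issue.
all_portals = [
    "192.168.103.159",
    "192.168.103.59",
    "192.168.105.159",
    "192.168.105.59",
]

# The worker in this issue is attached only to the 105.0/24 iSCSI network.
node_networks = ["192.168.105.0/24"]

candidates = reachable_portals(all_portals, node_networks)
print("portals on the node's network:", candidates)
```

Any entry of `candidates` would satisfy the expectation above, since each one is on the network the worker actually uses.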

scaleoutsean, Mar 23 '21 10:03