ibm-spectrum-scale-csi

PV was not deleted but fileset was deleted in error case if owning cluster is unhealthy

Open gandhisanjayv opened this issue 5 years ago • 4 comments

Describe the bug Deleted two PVCs while the owning cluster was unhealthy. The file system was unmounted because the cluster was not in quorum. The problem was fixed after a few minutes. The PVCs were deleted, but the PVs were not deleted, although I see that the filesets are deleted.

State before delete

NAME                   STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                     AGE
pvc-gpfstest-replica   Bound    pvc-a20008bc-8fa3-4095-9a27-3bde060c5d49   100Gi      RWX            ibm-spectrum-scale-csi-fileset   4d18h
pvc-stress-ng          Bound    pvc-0f4d734e-dda9-49d6-8706-1d7947c87426   100Gi      RWX            ibm-spectrum-scale-csi-fileset   4d17h

mmlsfileset fs1
Filesets in file system 'fs1':
Name                     Status    Path
root                     Linked    /var/gpfs/fs1
cnss-demo-fset1          Linked    /var/gpfs/fs1/cnss-demo-fset1
pvc-c498f87f-f3b7-4b9f-a559-0e07c9427284 Linked /var/gpfs/fs1/pvc-c498f87f-f3b7-4b9f-a559-0e07c9427284
pvc-b75d1da6-a2e8-4e11-9006-a6e3a9babfe1 Linked /var/gpfs/fs1/pvc-b75d1da6-a2e8-4e11-9006-a6e3a9babfe1
pvc-a20008bc-8fa3-4095-9a27-3bde060c5d49 Linked /var/gpfs/fs1/pvc-a20008bc-8fa3-4095-9a27-3bde060c5d49
pvc-0f4d734e-dda9-49d6-8706-1d7947c87426 Linked /var/gpfs/fs1/pvc-0f4d734e-dda9-49d6-8706-1d7947c87426

State after delete

 oc get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS     CLAIM                                                STORAGECLASS                     REASON   AGE
pvc-0f4d734e-dda9-49d6-8706-1d7947c87426   100Gi      RWX            Delete           Released   ibm-spectrum-scale-csi-driver/pvc-stress-ng          ibm-spectrum-scale-csi-fileset            4d18h
pvc-a20008bc-8fa3-4095-9a27-3bde060c5d49   100Gi      RWX            Delete           Released   ibm-spectrum-scale-csi-driver/pvc-gpfstest-replica   ibm-spectrum-scale-csi-fileset            4d19h
pvc-c498f87f-f3b7-4b9f-a559-0e07c9427284   100Gi      RWX            Delete           Released   ibm-spectrum-scale-csi-driver/scale-fset-pvc         ibm-spectrum-scale-csi-fileset            7d19h
registry-storage                           200Gi      RWX            Recycle          Bound      openshift-image-registry/image-registry-storage

mmlsfileset fs1
Filesets in file system 'fs1':
Name                     Status    Path
root                     Linked    /var/gpfs/fs1
cnss-demo-fset1          Linked    /var/gpfs/fs1/cnss-demo-fset1
pvc-c498f87f-f3b7-4b9f-a559-0e07c9427284 Linked /var/gpfs/fs1/pvc-c498f87f-f3b7-4b9f-a559-0e07c9427284
pvc-b75d1da6-a2e8-4e11-9006-a6e3a9babfe1 Linked /var/gpfs/fs1/pvc-b75d1da6-a2e8-4e11-9006-a6e3a9babfe1
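
To make the mismatch between the two listings explicit, here is a minimal Python sketch that cross-checks them. It assumes, as the output above suggests, that each dynamically provisioned fileset has the same name as its PV. The function names are hypothetical, and the sample text is copied from the "State after delete" output.

```python
# Sample text copied from the "State after delete" output in this issue.
AFTER_PV = """\
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS     CLAIM                                                STORAGECLASS                     REASON   AGE
pvc-0f4d734e-dda9-49d6-8706-1d7947c87426   100Gi      RWX            Delete           Released   ibm-spectrum-scale-csi-driver/pvc-stress-ng          ibm-spectrum-scale-csi-fileset            4d18h
pvc-a20008bc-8fa3-4095-9a27-3bde060c5d49   100Gi      RWX            Delete           Released   ibm-spectrum-scale-csi-driver/pvc-gpfstest-replica   ibm-spectrum-scale-csi-fileset            4d19h
pvc-c498f87f-f3b7-4b9f-a559-0e07c9427284   100Gi      RWX            Delete           Released   ibm-spectrum-scale-csi-driver/scale-fset-pvc         ibm-spectrum-scale-csi-fileset            7d19h
registry-storage                           200Gi      RWX            Recycle          Bound      openshift-image-registry/image-registry-storage
"""

AFTER_FILESETS = """\
Filesets in file system 'fs1':
Name                     Status    Path
root                     Linked    /var/gpfs/fs1
cnss-demo-fset1          Linked    /var/gpfs/fs1/cnss-demo-fset1
pvc-c498f87f-f3b7-4b9f-a559-0e07c9427284 Linked /var/gpfs/fs1/pvc-c498f87f-f3b7-4b9f-a559-0e07c9427284
pvc-b75d1da6-a2e8-4e11-9006-a6e3a9babfe1 Linked /var/gpfs/fs1/pvc-b75d1da6-a2e8-4e11-9006-a6e3a9babfe1
"""


def fileset_names(mmlsfileset_output):
    """Return the fileset names found in `mmlsfileset <fs>` text output."""
    names = set()
    for line in mmlsfileset_output.splitlines():
        parts = line.split()
        # Data rows have three columns (name, status, path) and an absolute path.
        if len(parts) == 3 and parts[2].startswith("/"):
            names.add(parts[0])
    return names


def released_pvs(oc_get_pv_output):
    """Return the names of PVs whose STATUS column reads Released."""
    pvs = []
    for line in oc_get_pv_output.splitlines():
        parts = line.split()
        # In data rows the fifth whitespace-separated field is STATUS.
        if len(parts) >= 5 and parts[4] == "Released":
            pvs.append(parts[0])
    return pvs


def orphaned_pvs(oc_get_pv_output, mmlsfileset_output):
    """Return Released PVs that no longer have a fileset of the same name."""
    filesets = fileset_names(mmlsfileset_output)
    return [pv for pv in released_pvs(oc_get_pv_output) if pv not in filesets]


if __name__ == "__main__":
    for pv in orphaned_pvs(AFTER_PV, AFTER_FILESETS):
        print(pv)
```

Run against the sample text, this lists the two PVs from the deleted PVCs, which are still in Released state with no backing fileset.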

To Reproduce Steps to reproduce the behavior:

  1. Create dynamic fileset PVCs on a remote cluster.
  2. Inject an error: keep the GUI node up in the remote cluster, but shut down a majority of the quorum nodes so that the file system is unmounted.
  3. Delete the PVCs.
  4. Fix the remote cluster issue after a few minutes by starting all quorum nodes.

Expected behavior PVs should be deleted if the fileset is deleted.
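
One way to get this behavior is an idempotent delete, which the CSI specification requires of `DeleteVolume`: if the volume no longer exists, the call still reports success. The Python sketch below illustrates only that idea. It is not the driver's code, and this issue does not confirm that a failing retry is the cause here. `FilesetNotFound`, `FakeBackend`, and `delete_volume` are hypothetical names.

```python
class FilesetNotFound(Exception):
    """Hypothetical error raised when the backing fileset does not exist."""


class FakeBackend:
    """Hypothetical in-memory stand-in for the storage cluster."""

    def __init__(self, filesets):
        self.filesets = set(filesets)

    def delete_fileset(self, name):
        if name not in self.filesets:
            raise FilesetNotFound(name)
        self.filesets.remove(name)


def delete_volume(backend, name):
    """Return True once the volume is gone, whether deleted now or earlier."""
    try:
        backend.delete_fileset(name)
    except FilesetNotFound:
        # An earlier attempt already removed the fileset. Treat this as
        # success so the caller can go on to remove the PV object.
        pass
    return True


if __name__ == "__main__":
    backend = FakeBackend(["pvc-0f4d734e-dda9-49d6-8706-1d7947c87426"])
    # The first call deletes the fileset; the retry finds nothing and still succeeds.
    print(delete_volume(backend, "pvc-0f4d734e-dda9-49d6-8706-1d7947c87426"))
    print(delete_volume(backend, "pvc-0f4d734e-dda9-49d6-8706-1d7947c87426"))
```

With this pattern, a delete that is retried after the cluster recovers succeeds even though the fileset was already removed by an earlier attempt.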

Environment Please run the following and paste your output here:

 oc version
Client Version: 4.5.4
Server Version: 4.5.4
Kubernetes Version: v1.18.3+012b3ec


GPFS version of remote cluster
 mmdiag
Current GPFS build: "5.0.5.2 ".
Built on Aug  3 2020 at 21:11:03
Running 1 day 19 hours 13 minutes 35 secs, pid 3367

Container ID:  cri-o://59c926379040b71f6aec5ef5ee9316bbb79d361a499f263ca65e45618b0bc161
    Image:         quay.io/ibm-spectrum-scale-dev/ibm-spectrum-scale-csi-driver:dev

  csi-provisioner:
    Container ID:  cri-o://d2bcebce1dfb75cead653512cc3d7a4a452d17c1b4aadadd57455334d0449ff3
    Image:         quay.io/k8scsi/csi-provisioner:v1.5.0
    Image ID:      quay.io/k8scsi/csi-provisioner@sha256:e10aab64506dd46

# Deployment


gandhisanjayv Aug 19 '20 17:08

Debug data is in /u/DUMPS/git-csi-issue296/

gandhisanjayv Aug 19 '20 17:08

@gandhisanjayv do the logs get deleted after some time? The directory /u/DUMPS/git-csi-issue296/ does not seem to exist on glogin10.

deeghuge Feb 15 '21 08:02

@gandhisanjayv is this issue still reproducible?

Jainbrt Mar 11 '22 08:03

@gandhisanjayv could you please add the Customer Impact and Customer Probability labels to the issue?

Jainbrt Apr 05 '22 17:04