linode-blockstorage-csi-driver

Kubernetes and Linode disagree about volume state

Open bhechinger opened this issue 6 years ago • 5 comments

Bug Reporting

I'm running a Kubernetes cluster with persistent storage, and the volume state Kubernetes reports is inconsistent with the console/CLI output for the same volumes. I have three volumes that Kubernetes shows as Bound and Linode shows as Unattached.

Expected Behavior

For kubectl get pv and linode-cli volumes list to agree.

Actual Behavior

Some volumes which are in a Bound state in Kubernetes show as Unattached in Linode.

Steps to Reproduce the Problem

  1. Create Kubernetes Cluster
  2. Deploy pods that use PV/PVC
  3. Hope it happens

Environment Specifications

Kubernetes Version: v1.15.3
CRICTL Version: v1.15.0
CNI Version: v0.8.2

Screenshots, Code Blocks, and Logs

$ k -n data-lake get pvc
NAME                                           STATUS   VOLUME                 CAPACITY   ACCESS MODES   STORAGECLASS           AGE
data-data-lake-consul-shared-consul-server-0   Bound    pvc-2a75786321a54dc7   10Gi       RWO            linode-block-storage   5d1h
data-data-lake-consul-shared-consul-server-1   Bound    pvc-c527988751224331   10Gi       RWO            linode-block-storage   5d1h
data-data-lake-consul-shared-consul-server-2   Bound    pvc-07349048faf54fc2   10Gi       RWO            linode-block-storage   5d1h
data-redis-shared-redis-ha-server-0            Bound    pvc-e7d2504da0174c4e   10Gi       RWO            linode-block-storage   4d5h
data-redis-shared-redis-ha-server-1            Bound    pvc-2ff4f633fd044b7f   10Gi       RWO            linode-block-storage   4d5h
data-redis-shared-redis-ha-server-2            Bound    pvc-7a2a92483c804a47   10Gi       RWO            linode-block-storage   4d5h
datadir-cockroachdb-shared-cockroachdb-0       Bound    pvc-6835ba194ace47f8   10Gi       RWO            linode-block-storage   4d5h
datadir-cockroachdb-shared-cockroachdb-1       Bound    pvc-71ad3db47d4d4b25   10Gi       RWO            linode-block-storage   4d5h
datadir-cockroachdb-shared-cockroachdb-2       Bound    pvc-306dce8bcd7a4c0b   10Gi       RWO            linode-block-storage   4d5h
datadir-consul-dev-0                           Bound    pvc-c7f54907a7ee4bd2   10Gi       RWO            linode-block-storage   5d1h
datadir-consul-dev-1                           Bound    pvc-7a9423257d7b4e71   10Gi       RWO            linode-block-storage   5d1h
datadir-consul-dev-2                           Bound    pvc-d9043b0bb18a45c5   10Gi       RWO            linode-block-storage   5d1h
$ k -n data-lake get pv
NAME                   CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                                                    STORAGECLASS           REASON   AGE
pvc-07349048faf54fc2   10Gi       RWO            Delete           Bound    data-lake/data-data-lake-consul-shared-consul-server-2   linode-block-storage            5d1h
pvc-2a75786321a54dc7   10Gi       RWO            Delete           Bound    data-lake/data-data-lake-consul-shared-consul-server-0   linode-block-storage            5d1h
pvc-2ff4f633fd044b7f   10Gi       RWO            Delete           Bound    data-lake/data-redis-shared-redis-ha-server-1            linode-block-storage            4d5h
pvc-306dce8bcd7a4c0b   10Gi       RWO            Delete           Bound    data-lake/datadir-cockroachdb-shared-cockroachdb-2       linode-block-storage            4d5h
pvc-6835ba194ace47f8   10Gi       RWO            Delete           Bound    data-lake/datadir-cockroachdb-shared-cockroachdb-0       linode-block-storage            4d5h
pvc-71ad3db47d4d4b25   10Gi       RWO            Delete           Bound    data-lake/datadir-cockroachdb-shared-cockroachdb-1       linode-block-storage            4d5h
pvc-7a2a92483c804a47   10Gi       RWO            Delete           Bound    data-lake/data-redis-shared-redis-ha-server-2            linode-block-storage            4d5h
pvc-7a9423257d7b4e71   10Gi       RWO            Delete           Bound    data-lake/datadir-consul-dev-1                           linode-block-storage            5d1h
pvc-c527988751224331   10Gi       RWO            Delete           Bound    data-lake/data-data-lake-consul-shared-consul-server-1   linode-block-storage            5d1h
pvc-c7f54907a7ee4bd2   10Gi       RWO            Delete           Bound    data-lake/datadir-consul-dev-0                           linode-block-storage            5d1h
pvc-d9043b0bb18a45c5   10Gi       RWO            Delete           Bound    data-lake/datadir-consul-dev-2                           linode-block-storage            5d1h
pvc-e7d2504da0174c4e   10Gi       RWO            Delete           Bound    data-lake/data-redis-shared-redis-ha-server-0            linode-block-storage            4d5h
$ linode-cli volumes list
┌───────┬─────────────────────┬────────┬──────┬────────────┬───────────┐
│ id    │ label               │ status │ size │ region     │ linode_id │
├───────┼─────────────────────┼────────┼──────┼────────────┼───────────┤
│ 42891 │ pvcc7f54907a7ee4bd2 │ active │ 10   │ us-central │           │
│ 42893 │ pvc7a9423257d7b4e71 │ active │ 10   │ us-central │           │
│ 42894 │ pvcd9043b0bb18a45c5 │ active │ 10   │ us-central │           │
│ 42895 │ pvc2a75786321a54dc7 │ active │ 10   │ us-central │ 15796976  │
│ 42896 │ pvc07349048faf54fc2 │ active │ 10   │ us-central │ 15796979  │
│ 42897 │ pvcc527988751224331 │ active │ 10   │ us-central │ 15796975  │
│ 42983 │ pvc682b97ea2b94467e │ active │ 10   │ us-central │           │
│ 42992 │ pvce7d2504da0174c4e │ active │ 10   │ us-central │ 15796976  │
│ 42993 │ pvc2ff4f633fd044b7f │ active │ 10   │ us-central │ 15796975  │
│ 42994 │ pvc7a2a92483c804a47 │ active │ 10   │ us-central │ 15796979  │
│ 42999 │ pvc6835ba194ace47f8 │ active │ 10   │ us-central │ 15796976  │
│ 43000 │ pvc71ad3db47d4d4b25 │ active │ 10   │ us-central │ 15796979  │
│ 43001 │ pvc306dce8bcd7a4c0b │ active │ 10   │ us-central │ 15796975  │
└───────┴─────────────────────┴────────┴──────┴────────────┴───────────┘

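The ones without a linode_id show up as Unattached in the web console. One of them (42983, pvc682b97ea2b94467e) has no matching PV and isn't actually attached to anything, but the other three (42891, 42893, 42894) back the datadir-consul-dev PVs that Kubernetes reports as Bound.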
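For anyone who wants to check their own cluster for the same mismatch, here is a minimal sketch of the comparison in Python. It assumes the PV names and the Linode label/linode_id pairs have already been pulled out of the two listings by hand or by some other means. The function name `compare_volume_state` is hypothetical, and the rule that a Linode label is the PV name with hyphens removed is only inferred from the output above, not taken from the driver's code.

```python
from typing import Dict, List, Optional, Tuple


def compare_volume_state(
    bound_pvs: List[str],
    linode_volumes: Dict[str, Optional[int]],
) -> Tuple[List[str], List[str]]:
    """Return (Bound PVs whose Linode volume has no linode_id,
    Linode volume labels that match no PV)."""
    # Assumption inferred from the listings: label == PV name without hyphens.
    label_to_pv = {pv.replace("-", ""): pv for pv in bound_pvs}

    # Bound in Kubernetes, but Linode reports no attached instance.
    unattached_bound = sorted(
        pv
        for label, pv in label_to_pv.items()
        if label in linode_volumes and linode_volumes[label] is None
    )
    # Present in Linode, but unknown to Kubernetes.
    orphaned = sorted(
        label for label in linode_volumes if label not in label_to_pv
    )
    return unattached_bound, orphaned


# A subset of the listings above, copied by hand.
bound_pvs = [
    "pvc-c7f54907a7ee4bd2",
    "pvc-7a9423257d7b4e71",
    "pvc-d9043b0bb18a45c5",
    "pvc-2a75786321a54dc7",
]
linode_volumes = {
    "pvcc7f54907a7ee4bd2": None,
    "pvc7a9423257d7b4e71": None,
    "pvcd9043b0bb18a45c5": None,
    "pvc2a75786321a54dc7": 15796976,
    "pvc682b97ea2b94467e": None,
}

unattached_bound, orphaned = compare_volume_state(bound_pvs, linode_volumes)
print("Bound but unattached:", unattached_bound)
print("No matching PV:", orphaned)
```

With the subset above, the three datadir-consul-dev PVs land in the first list and pvc682b97ea2b94467e in the second, which matches what I see in the web console.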


bhechinger avatar Sep 03 '19 22:09 bhechinger

Thanks for reporting this. I saw a similar issue to this recently where volumes fail to get detached from a node and reattached to a target node when a pod changes nodes, causing downtime while the pod waits to come back up. Usually when this happens I get a flood of events that a volume has been detached, but only one saying that it's been attached. I'll try and look into this.

jnschaeffer avatar Sep 04 '19 17:09 jnschaeffer

I have yet to determine if they are actually properly mounted on the Kubernetes nodes but none of my apps seem to be mad at me so I'm assuming they are actually attached. I can verify on Friday when I have a minute.

bhechinger avatar Sep 04 '19 17:09 bhechinger

In my experience, the volumes do eventually get mounted where they should be, but a failure of a volume to hop nodes is less worrying than an outright mismatch between observed and expected state...hopefully it'll be clear once I actually start looking at the code.

jnschaeffer avatar Sep 04 '19 17:09 jnschaeffer

Is there a discrepancy in how persist_across_boot attached volumes are reported in volume lists?

displague avatar Sep 04 '19 19:09 displague

~the problem could be here in NodeUnstageVolume. It looks like we're attempting to unmount the volume again instead of detach. I'm going to work on a PR for this~ After some deeper investigation this doesn't appear to be the issue; the volume is detached by the controllerserver

thorn3r avatar Oct 04 '19 17:10 thorn3r

Sorry for this issue having sat here for so long, unattended.

If this issue is still valid, please feel free to re-open the issue.

nesv avatar Jul 11 '24 15:07 nesv