
PVC stuck in pending when statefulset replicas greater than 1 on multinode MicroK8s

Open • dick-twocows opened this issue 5 years ago • 1 comment

This does not happen on a single-node MicroK8s.

Deploying a StatefulSet with more than 1 replica results in the second PVC being stuck in Pending. The nodes have shared read-write access to an OCFS2 cluster, and every node has a /data mount.

Before the logs of what is happening, two questions:

  • Is it possible to turn off node affinity for a PV when using the local path provisioner? With an OCFS2 cluster the /data mount is available on all nodes, so the pod can be scheduled anywhere and will always have access to its data.

  • Is it the local path provisioner that adds the node affinity to the PV, or is this done by the K8s control plane?
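One way to see what node affinity a given PV actually carries is to print that field directly. This is a minimal sketch, not part of the original report: the function name `pv_node_affinity` is hypothetical, and it assumes `kubectl` is on the PATH (on MicroK8s this may be `microk8s kubectl` or an alias such as the `k` used below).

```shell
#!/bin/sh
# Print the spec.nodeAffinity stanza of a PersistentVolume.
# Prints nothing if the PV has no node affinity set.
# Assumption: kubectl is available and points at the affected cluster.
pv_node_affinity() {
    kubectl get pv "$1" -o jsonpath='{.spec.nodeAffinity}'
}
```

For example, `pv_node_affinity pvc-e17b2ec4-4d2a-4f1d-b294-08897412e678` would show the affinity of the PV bound to data-volume-mongodb-0.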

The first PVC is provisioned successfully:

I0915 15:35:52.498970       1 controller.go:1202] provision "default/data-volume-mongodb-0" class "local-path": started
time="2020-09-15T15:35:52Z" level=debug msg="config doesn't contain node 192.168.108.180, use DEFAULT_PATH_FOR_NON_LISTED_NODES instead" 
time="2020-09-15T15:35:52Z" level=info msg="Creating volume pvc-724b3ec5-3a25-4729-9908-0d7c4c242d42 at 192.168.108.180:/data/local-path-provisioner/pvc-724b3ec5-3a25-4729-9908-0d7c4c242d42_default_data-volume-mongodb-0" 
time="2020-09-15T15:35:52Z" level=info msg="create the helper pod create-pvc-724b3ec5-3a25-4729-9908-0d7c4c242d42 into local-path-storage" 
I0915 15:35:52.506694       1 event.go:281] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"data-volume-mongodb-0", UID:"724b3ec5-3a25-4729-9908-0d7c4c242d42", APIVersion:"v1", ResourceVersion:"1583121", FieldPath:""}): type: 'Normal' reason: 'Provisioning' External provisioner is provisioning volume for claim "default/data-volume-mongodb-0"
time="2020-09-15T15:35:54Z" level=info msg="Volume pvc-724b3ec5-3a25-4729-9908-0d7c4c242d42 has been created on 192.168.108.180:/data/local-path-provisioner/pvc-724b3ec5-3a25-4729-9908-0d7c4c242d42_default_data-volume-mongodb-0" 
I0915 15:35:54.527321       1 controller.go:1284] provision "default/data-volume-mongodb-0" class "local-path": volume "pvc-724b3ec5-3a25-4729-9908-0d7c4c242d42" provisioned
I0915 15:35:54.527348       1 controller.go:1301] provision "default/data-volume-mongodb-0" class "local-path": succeeded
I0915 15:35:54.527354       1 volume_store.go:212] Trying to save persistentvolume "pvc-724b3ec5-3a25-4729-9908-0d7c4c242d42"
I0915 15:35:54.532183       1 volume_store.go:219] persistentvolume "pvc-724b3ec5-3a25-4729-9908-0d7c4c242d42" saved
I0915 15:35:54.532635       1 event.go:281] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"data-volume-mongodb-0", UID:"724b3ec5-3a25-4729-9908-0d7c4c242d42", APIVersion:"v1", ResourceVersion:"1583121", FieldPath:""}): type: 'Normal' reason: 'ProvisioningSucceeded' Successfully provisioned volume pvc-724b3ec5-3a25-4729-9908-0d7c4c242d42

But the second PVC stays stuck in Pending:

ubuntu@test-1-1:~$ k get pvc
NAME                    STATUS    VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
...
data-volume-mongodb-0   Bound     pvc-e17b2ec4-4d2a-4f1d-b294-08897412e678   10G        RWO            local-path     13h
data-volume-mongodb-1   Pending                                                                        local-path     13h
...

The provisioner pod logs show:

I0915 15:36:57.434165       1 controller.go:1202] provision "default/data-volume-mongodb-1" class "local-path": started
time="2020-09-15T15:36:57Z" level=debug msg="config doesn't contain node test-1-1, use DEFAULT_PATH_FOR_NON_LISTED_NODES instead" 
time="2020-09-15T15:36:57Z" level=info msg="Creating volume pvc-6c37d4ee-2684-402b-aad8-f59d240e49a3 at test-1-1:/data/local-path-provisioner/pvc-6c37d4ee-2684-402b-aad8-f59d240e49a3_default_data-volume-mongodb-1" 
time="2020-09-15T15:36:57Z" level=info msg="create the helper pod create-pvc-6c37d4ee-2684-402b-aad8-f59d240e49a3 into local-path-storage" 
I0915 15:36:57.439943       1 event.go:281] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"data-volume-mongodb-1", UID:"6c37d4ee-2684-402b-aad8-f59d240e49a3", APIVersion:"v1", ResourceVersion:"1583367", FieldPath:""}): type: 'Normal' reason: 'Provisioning' External provisioner is provisioning volume for claim "default/data-volume-mongodb-1"
W0915 15:38:57.728718       1 controller.go:893] Retrying syncing claim "6c37d4ee-2684-402b-aad8-f59d240e49a3" because failures 0 < threshold 15
E0915 15:38:57.728753       1 controller.go:913] error syncing claim "6c37d4ee-2684-402b-aad8-f59d240e49a3": failed to provision volume with StorageClass "local-path": failed to create volume pvc-6c37d4ee-2684-402b-aad8-f59d240e49a3: create process timeout after 120 seconds
I0915 15:38:57.728999       1 event.go:281] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"data-volume-mongodb-1", UID:"6c37d4ee-2684-402b-aad8-f59d240e49a3", APIVersion:"v1", ResourceVersion:"1583367", FieldPath:""}): type: 'Warning' reason: 'ProvisioningFailed' failed to provision volume with StorageClass "local-path": failed to create volume pvc-6c37d4ee-2684-402b-aad8-f59d240e49a3: create process timeout after 120 seconds
...
I0915 21:46:06.630545       1 controller.go:1202] provision "default/data-volume-mongodb-1" class "local-path": started
time="2020-09-15T21:46:06Z" level=debug msg="config doesn't contain node test-1-1, use DEFAULT_PATH_FOR_NON_LISTED_NODES instead" 
time="2020-09-15T21:46:06Z" level=info msg="Creating volume pvc-28ef4f45-f9ca-4c5b-9ff2-6ca116942892 at test-1-1:/data/local-path-provisioner/pvc-28ef4f45-f9ca-4c5b-9ff2-6ca116942892_default_data-volume-mongodb-1" 
time="2020-09-15T21:46:06Z" level=info msg="create the helper pod create-pvc-28ef4f45-f9ca-4c5b-9ff2-6ca116942892 into local-path-storage" 
I0915 21:46:06.633905       1 event.go:281] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"data-volume-mongodb-1", UID:"28ef4f45-f9ca-4c5b-9ff2-6ca116942892", APIVersion:"v1", ResourceVersion:"1614976", FieldPath:""}): type: 'Normal' reason: 'Provisioning' External provisioner is provisioning volume for claim "default/data-volume-mongodb-1"
E0915 21:48:06.890961       1 controller.go:896] Giving up syncing claim "28ef4f45-f9ca-4c5b-9ff2-6ca116942892" because failures 15 >= threshold 15
E0915 21:48:06.891048       1 controller.go:913] error syncing claim "28ef4f45-f9ca-4c5b-9ff2-6ca116942892": failed to provision volume with StorageClass "local-path": failed to create volume pvc-28ef4f45-f9ca-4c5b-9ff2-6ca116942892: create process timeout after 120 seconds
I0915 21:48:06.891406       1 event.go:281] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"data-volume-mongodb-1", UID:"28ef4f45-f9ca-4c5b-9ff2-6ca116942892", APIVersion:"v1", ResourceVersion:"1614976", FieldPath:""}): type: 'Warning' reason: 'ProvisioningFailed' failed to provision volume with StorageClass "local-path": failed to create volume pvc-28ef4f45-f9ca-4c5b-9ff2-6ca116942892: create process timeout after 120 seconds

ubuntu@test-1-1:~$ k get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                           STORAGECLASS   REASON   AGE
pvc-1a9680ed-1793-4e27-aa46-07bdb27d2938   10G        RWO            Retain           Bound    default/data-zookeeper-0        local-path              5d23h
pvc-21befd0d-f207-4b0d-a724-3fa4e7ff88c2   10G        RWO            Retain           Bound    default/data-mysql-0            local-path              6d19h
pvc-2ccb9c5c-7cd3-4a84-9b3c-4889abee3084   10G        RWO            Retain           Bound    default/index-loki-0            local-path              5d13h
pvc-4663d494-4fe4-453d-8021-8c5a1d049535   10G        RWO            Retain           Bound    default/data-artemis-0          local-path              6d13h
pvc-9295f75d-2004-4849-9231-02f400d85c74   10G        RWO            Retain           Bound    default/datalog-zookeeper-0     local-path              5d23h
pvc-ad758220-2899-4d10-a5b9-5936da656264   10G        RWO            Retain           Bound    default/data-kafka-0            local-path              4d23h
pvc-b09301a5-b5a7-4bad-8731-646ed8db251d   10G        RWO            Retain           Bound    default/data-loki-0             local-path              5d18h
pvc-e17b2ec4-4d2a-4f1d-b294-08897412e678   10G        RWO            Retain           Bound    default/data-volume-mongodb-0   local-path              13h
pvc-ef22789b-8dd0-46dc-a708-81f1689e8c19   10G        RWO            Retain           Bound    default/chunks-loki-0           local-path              5d13h

dick-twocows avatar Sep 16 '20 08:09 dick-twocows

@dick-twocows It's just shown as a timeout from the daemon side. While provisioning is in progress, can you check the helper pod's status and why it failed? You should be able to see it in the local-path-storage namespace.

yasker avatar Oct 09 '20 23:10 yasker
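The check suggested in the comment above could be sketched as follows. This is an illustration under assumptions rather than an official procedure: the function names are hypothetical, `kubectl` is assumed to be on the PATH, and the helper pod name is taken from the pattern visible in the provisioner log ("create-" followed by the volume name).

```shell
#!/bin/sh
# Build the helper pod name from a volume name, following the pattern
# seen in the log lines ("create the helper pod create-pvc-... into
# local-path-storage").
helper_pod_name() {
    printf 'create-%s\n' "$1"
}

# Show the status, node placement, and events of the helper pod for a volume.
# Assumption: the helper pod lives in the local-path-storage namespace,
# as the log lines and the comment above indicate.
inspect_helper_pod() {
    ns="local-path-storage"
    pod=$(helper_pod_name "$1")
    kubectl get pod "$pod" -n "$ns" -o wide
    kubectl describe pod "$pod" -n "$ns"
}
```

For example, `inspect_helper_pod pvc-6c37d4ee-2684-402b-aad8-f59d240e49a3` would need to be run while the provisioner is still waiting on that volume, since the helper pod may no longer exist once the 120-second timeout has passed.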