local-path-provisioner
provisioner doesn't like when nodes go away, VolumeFailedDelete
I'm running on some bare-metal servers, and if one of them goes away (effectively permanently), PVs and PVCs don't get reaped, so the pods (created as a StatefulSet) can't recover.
This may be okay. Let me know if you'd like a reproducible example, or if it's a conceptual thing.
Here's an example I just ran across. It's an STS of Elasticsearch pods. Having persistent data is great, but if a server goes away, the pod just sits in purgatory.
$ kubectl describe pod esdata-4 -n kube-logging
...
Status:         Pending
...
  Mounts:
    /var/lib/elasticsearch from esdata-data (rw,path="esdata_data")
Conditions:
  Type           Status
  PodScheduled   False
Volumes:
  esdata-data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  esdata-data-esdata-4
    ReadOnly:   false
...
Events:
  Type     Reason            Age                   From               Message
  ----     ------            ----                  ----               -------
  Warning  FailedScheduling  3m43s (x45 over 39m)  default-scheduler  0/5 nodes are available: 5 node(s) had volume node affinity conflict.
$ kubectl describe pv pvc-3628fa90-9e11-11e9-83ca-d4bed9ad776a
...
Annotations:      pv.kubernetes.io/provisioned-by: rancher.io/local-path
Finalizers:       [kubernetes.io/pv-protection]
StorageClass:     local-path
Status:           Released
Claim:            kube-logging/esdata-data-esdata-4
Reclaim Policy:   Delete
Node Affinity:
  Required Terms:
    Term 0:  kubernetes.io/hostname in [DEADHOST]
...
Events:
  Type     Reason              Age                  From                                                                                              Message
  ----     ------              ----                 ----                                                                                              -------
  Warning  VolumeFailedDelete  37s (x4 over 2m57s)  rancher.io/local-path_local-path-provisioner-f7986dc46-cg8nl_7ff46af3-9e4f-11e9-a883-fa15f9dfdfe0  failed to delete volume pvc-3628fa90-9e11-11e9-83ca-d4bed9ad776a: failed to delete volume pvc-3628fa90-9e11-11e9-83ca-d4bed9ad776a: pods "delete-pvc-3628fa90-9e11-11e9-83ca-d4bed9ad776a" not found
I can delete it manually with kubectl delete pv [pvid], but then I also have to recreate the PV and PVC by hand before the pod is happy. I assumed there'd be a timeout for reaping PVCs from dead nodes.
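For anyone else hitting this, the manual cleanup is roughly the following. This is only a sketch using the names from the output above; whether the replacement PVC comes back automatically (via the volumeClaimTemplate) or has to be recreated by hand seems to depend on the setup:

$ # delete the Released PV that is pinned to the dead node
$ kubectl delete pv pvc-3628fa90-9e11-11e9-83ca-d4bed9ad776a
$ # delete the orphaned claim that still points at it
$ kubectl delete pvc esdata-data-esdata-4 -n kube-logging
$ # delete the Pending pod so the StatefulSet controller recreates it
$ # and requests a fresh PVC via its volumeClaimTemplate
$ kubectl delete pod esdata-4 -n kube-logging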
cc @tamsky in case he's come across this, as I see he's been around this repo.
Hi, we are struggling with the same problem.
How can I remove a specific node when I deploy stateful sets with this provisioner?
Also, scaling our databases down from 3 replicas to 1 is not really possible: I can only scale down onto the node where the pod named *-0 runs.
Is there a way to manually move pv(c)s to other nodes?
How do you deal with node removal?
Moving a PVC is not possible with this provisioner, since the whole concept of the local path provisioner is that the storage always stays on the node.
After the node is removed, you need to scale down the workload, delete the related PVC and PV, then scale it back up to create a new replacement PVC/PV. That's assuming your application can rebuild the replica that was lost with the node.
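A rough sketch of that sequence, reusing the esdata example from above; the StatefulSet name, namespace, and replica count are assumptions you would swap for your own:

$ # scale the workload down so nothing references the orphaned claim
$ kubectl scale statefulset esdata -n kube-logging --replicas=0
$ # remove the PVC and the Released PV that are tied to the dead node
$ kubectl delete pvc esdata-data-esdata-4 -n kube-logging
$ kubectl delete pv pvc-3628fa90-9e11-11e9-83ca-d4bed9ad776a
$ # scale back to the original replica count; the volumeClaimTemplate
$ # requests a replacement PVC/PV and the application rebuilds the replica
$ kubectl scale statefulset esdata -n kube-logging --replicas=5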
If you want to move the data manually, you can copy the data out, recreate the PVC and PV with the same name, then use another pod to copy the data back, and finally start the workload. But this provisioner is not designed to be used in this way, since it mostly relies on the application's replication function to keep the storage distributed across different nodes. It doesn't provide this function on its own.
If you want a solution that provides highly available persistent storage to Kubernetes, you can take a look at Longhorn.
Thanks for the clarification @yasker. I understand that this relies on the application to replicate the data, and that makes sense for our use case; we're using it for our distributed databases. It would just be nice to have a little bit more flexibility.
If I understood everything correctly, to scale down to a specific node I could:
- scale down to 0
- copy the data out of the volume(s)
- delete the pvcs
- set node labels & a node selector so that the single remaining StatefulSet pod is scheduled to my preferred node (see the sketch after this list)
- scale up to 1 again
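For the node label / node selector step, a minimal sketch might look like this; the node name GOODHOST, the label key preferred-db-node, and the StatefulSet name esdata are placeholders for illustration:

$ # label the node that should host the single remaining replica
$ kubectl label node GOODHOST preferred-db-node=true
$ # constrain the StatefulSet's pod template to that label
$ kubectl patch statefulset esdata -n kube-logging --type merge \
    -p '{"spec":{"template":{"spec":{"nodeSelector":{"preferred-db-node":"true"}}}}}'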
And you need to copy the data back once you scale up to 1. In fact, at the scale-up step you also need to (a sketch of the commands follows the list):
- (following your steps) scale up to 1 again; this step will create a PVC for us to copy data to
- scale down to 0
- Create another pod to use the same PVC
- Copy data into it
- Delete the pod but retain PVC
- Scale up the stateful set back to 1
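A sketch of those steps, sticking with the esdata example from earlier in the thread; the helper pod, the busybox image, and the local backup directory are made up for illustration, and the PVC name follows the usual <claim>-<statefulset>-<ordinal> naming:

$ # with the StatefulSet scaled back down to 0, mount the freshly created PVC in a throwaway pod
$ kubectl apply -n kube-logging -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: pvc-copy
spec:
  restartPolicy: Never
  containers:
  - name: copy
    image: busybox
    command: ["sleep", "3600"]
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: esdata-data-esdata-0
EOF
$ # copy the previously saved data into the volume (adjust paths to your layout)
$ kubectl cp ./esdata-backup kube-logging/pvc-copy:/data
$ # delete the helper pod but keep the PVC, then bring the StatefulSet back to 1
$ kubectl delete pod pvc-copy -n kube-logging
$ kubectl scale statefulset esdata -n kube-logging --replicas=1

Because the PV created by this provisioner carries node affinity, the helper pod will be scheduled onto whichever node the new volume was provisioned on, so the data ends up where the StatefulSet pod will later run.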
Hi, can we auto-assign each cluster node to the pods when we scale up the StatefulSet, instead of setting up a node selector?