sig-storage-local-static-provisioner

Recover workloads using local PVs from data loss in the cloud

Open cofyc opened this issue 4 years ago • 18 comments

This is the tracking issue for several related problems in the cloud when a node is permanently deleted.

I'm thinking about writing a cloud controller to automate the recovery process. Here is the proposal: https://docs.google.com/document/d/1SA9epEwA3jPwibRV0ccQwJ2UfZXoeUYKyNxNegt0vn4

related issues:

  • https://github.com/kubernetes/kubernetes/issues/78756
  • https://github.com/kubernetes-sigs/sig-storage-local-static-provisioner/issues/201
  • https://github.com/kubernetes-sigs/sig-storage-local-static-provisioner/issues/181

cofyc avatar Jun 19 '20 03:06 cofyc

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle stale

fejta-bot avatar Sep 17 '20 04:09 fejta-bot

/remove-lifecycle stale

Bessonov avatar Sep 28 '20 09:09 Bessonov


/remove-lifecycle stale

Bessonov avatar Dec 27 '20 10:12 Bessonov

@cofyc any further work past the initial proposal? /help /kind feature

cdenneen avatar Mar 03 '21 02:03 cdenneen

@cdenneen: This request has been marked as needing help from a contributor.

Please ensure the request meets the requirements listed here.

If this request no longer meets these requirements, the label can be removed by commenting with the /remove-help command.

In response to this:

@cofyc any further work past the initial proposal? /help /kind feature

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Mar 03 '21 02:03 k8s-ci-robot

Wouldn't it already be enough, as a first approximation, to simply respect --pod-eviction-timeout as passed to kube-controller-manager (which is 5 minutes by default)?

Just as pods get evicted once --pod-eviction-timeout elapses after a node goes Ready=Unknown or Ready=False, we could evict the PVC if the node is down that long.

If that timeout is good enough to try to evict the pod, then it's good enough to try to evict the PVC, is it not?

It does not handle the case of node reboots losing data (e.g. in GKE and EKS), but it does handle the "node permanently gone" case, both on-prem and on managed services where each newly spawned node gets a new name (e.g. EKS).
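The timeout check described above could be sketched roughly as follows. This is a hypothetical helper, not kube-controller-manager's actual code: it only decides, from a node's Ready condition and its last transition time, whether the eviction timeout has elapsed.

```python
from datetime import datetime, timedelta, timezone

# Mirrors kube-controller-manager's --pod-eviction-timeout default of 5 minutes.
POD_EVICTION_TIMEOUT = timedelta(minutes=5)

def pvc_eviction_due(node_ready_status: str,
                     last_transition: datetime,
                     now: datetime,
                     timeout: timedelta = POD_EVICTION_TIMEOUT) -> bool:
    """Return True if the node has been NotReady/Unknown for at least `timeout`.

    node_ready_status is the Ready condition's status string:
    "True", "False", or "Unknown".
    """
    if node_ready_status == "True":
        # Node is healthy; nothing to evict.
        return False
    return (now - last_transition) >= timeout

now = datetime(2021, 4, 14, 8, 30, tzinfo=timezone.utc)
down_since = now - timedelta(minutes=7)
print(pvc_eviction_due("Unknown", down_since, now))  # node down 7m > 5m -> True
```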

Edit: Ah, wait, this will not work, as Kubernetes doesn't delete pods when the node is unreachable; it only does so when the node is manually deleted or becomes reachable again.

In that case, I guess a first approximation that works both in the cloud and on-prem is to auto-delete PVs that are bound to nodes that no longer exist.
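The auto-delete idea above could be sketched as pure selection logic. This is an illustrative sketch, not the provisioner's actual code: local PVs pin themselves to one node via a required nodeAffinity term on the kubernetes.io/hostname label, so a cleanup controller could flag PVs whose node name is no longer in the cluster. The PV dicts below are trimmed-down stand-ins for V1PersistentVolume objects.

```python
def orphaned_local_pvs(pvs, existing_nodes):
    """Return names of local PVs pinned to nodes that no longer exist.

    pvs: list of dicts shaped like trimmed-down PersistentVolume objects.
    existing_nodes: set of node names currently present in the cluster.
    """
    orphaned = []
    for pv in pvs:
        terms = pv["spec"]["nodeAffinity"]["required"]["nodeSelectorTerms"]
        for term in terms:
            for expr in term["matchExpressions"]:
                if expr["key"] == "kubernetes.io/hostname":
                    # If none of the pinned hostnames still exist, the PV
                    # (and its data) is gone for good: mark it for deletion.
                    if not set(expr["values"]) & existing_nodes:
                        orphaned.append(pv["metadata"]["name"])
    return orphaned

pvs = [
    {"metadata": {"name": "local-pv-a"},
     "spec": {"nodeAffinity": {"required": {"nodeSelectorTerms": [
         {"matchExpressions": [{"key": "kubernetes.io/hostname",
                                "values": ["node-1"]}]}]}}}},
    {"metadata": {"name": "local-pv-b"},
     "spec": {"nodeAffinity": {"required": {"nodeSelectorTerms": [
         {"matchExpressions": [{"key": "kubernetes.io/hostname",
                                "values": ["node-gone"]}]}]}}}},
]
print(orphaned_local_pvs(pvs, {"node-1", "node-2"}))  # ['local-pv-b']
```

A real controller would then delete the orphaned PVs (and release their PVCs) via the Kubernetes API, ideally after a grace period to tolerate transient node-object deletion.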

arianvp avatar Apr 14 '21 08:04 arianvp

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale

fejta-bot avatar Jul 13 '21 08:07 fejta-bot

/remove-lifecycle stale

Bessonov avatar Jul 13 '21 17:07 Bessonov

Edit: Ah, wait, this will not work, as Kubernetes doesn't delete pods when the node is unreachable; it only does so when the node is manually deleted or becomes reachable again.

In that case, I guess a first approximation that works both in the cloud and on-prem is to auto-delete PVs that are bound to nodes that no longer exist.

Exactly!

cdenneen avatar Aug 27 '21 18:08 cdenneen

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Nov 25 '21 18:11 k8s-triage-robot

/remove-lifecycle stale

Bessonov avatar Nov 25 '21 19:11 Bessonov


/remove-lifecycle stale

Bessonov avatar Feb 23 '22 20:02 Bessonov


/remove-lifecycle stale

Bessonov avatar May 25 '22 12:05 Bessonov


/remove-lifecycle stale

Bessonov avatar Aug 24 '22 11:08 Bessonov