sig-storage-local-static-provisioner

Recover workloads using local PVs from data loss in the cloud

Open cofyc opened this issue 4 years ago • 18 comments

This is the tracking issue for several related problems in the cloud when a node is permanently deleted.

I'm thinking about writing a cloud controller to automate the recovery process. Here is the proposal: https://docs.google.com/document/d/1SA9epEwA3jPwibRV0ccQwJ2UfZXoeUYKyNxNegt0vn4

related issues:

  • https://github.com/kubernetes/kubernetes/issues/78756
  • https://github.com/kubernetes-sigs/sig-storage-local-static-provisioner/issues/201
  • https://github.com/kubernetes-sigs/sig-storage-local-static-provisioner/issues/181

cofyc avatar Jun 19 '20 03:06 cofyc

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle stale

fejta-bot avatar Sep 17 '20 04:09 fejta-bot

/remove-lifecycle stale

Bessonov avatar Sep 28 '20 09:09 Bessonov


/remove-lifecycle stale

Bessonov avatar Dec 27 '20 10:12 Bessonov

@cofyc any further work past the initial proposal? /help /kind feature

cdenneen avatar Mar 03 '21 02:03 cdenneen

@cdenneen: This request has been marked as needing help from a contributor.

Please ensure the request meets the requirements listed here.

If this request no longer meets these requirements, the label can be removed by commenting with the /remove-help command.

In response to this:

@cofyc any further work past the initial proposal? /help /kind feature

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Mar 03 '21 02:03 k8s-ci-robot

Wouldn't it already be enough, as a first approximation, to simply respect --pod-eviction-timeout as passed to kube-controller-manager (which is 5 minutes by default)?

Just as pods get evicted once --pod-eviction-timeout elapses after a node goes Ready=Unknown or Ready=False, we could evict the PVC if the node is down that long.

If that timeout is good enough to try to evict the pod, then it's good enough to try to evict the PVC, is it not?

It does not handle the case of node reboots losing data (e.g. in GKE and EKS), but it does handle the "node permanently gone" case, both on-prem and on managed services where each newly spawned node gets a new name (e.g. EKS).
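The timeout check described above could be sketched roughly as follows. This is a hypothetical helper, not kube-controller-manager's actual code: it only decides, from a node's Ready condition and its last transition time, whether the eviction timeout has elapsed.

```python
from datetime import datetime, timedelta, timezone

# Mirrors kube-controller-manager's --pod-eviction-timeout default of 5 minutes.
POD_EVICTION_TIMEOUT = timedelta(minutes=5)

def pvc_eviction_due(node_ready_status: str,
                     last_transition: datetime,
                     now: datetime,
                     timeout: timedelta = POD_EVICTION_TIMEOUT) -> bool:
    """Return True if the node has been NotReady/Unknown for at least `timeout`.

    node_ready_status is the Ready condition's status string:
    "True", "False", or "Unknown".
    """
    if node_ready_status == "True":
        # Node is healthy; nothing to evict.
        return False
    return (now - last_transition) >= timeout

now = datetime(2021, 4, 14, 8, 30, tzinfo=timezone.utc)
down_since = now - timedelta(minutes=7)
print(pvc_eviction_due("Unknown", down_since, now))  # node down 7m > 5m -> True
```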

Edit: Ah, wait, this will not work, as Kubernetes doesn't delete pods when the node is unreachable; it only does so when the node is manually deleted or becomes reachable again.

In that case, I guess a first approximation that works both in the cloud and on-prem is to auto-delete PVs that are bound to nodes that no longer exist.
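The auto-delete idea above could be sketched as pure selection logic. This is an illustrative sketch, not the provisioner's actual code: local PVs pin themselves to one node via a required nodeAffinity term on the kubernetes.io/hostname label, so a cleanup controller could flag PVs whose node name is no longer in the cluster. The PV dicts below are trimmed-down stand-ins for V1PersistentVolume objects.

```python
def orphaned_local_pvs(pvs, existing_nodes):
    """Return names of local PVs pinned to nodes that no longer exist.

    pvs: list of dicts shaped like trimmed-down PersistentVolume objects.
    existing_nodes: set of node names currently present in the cluster.
    """
    orphaned = []
    for pv in pvs:
        terms = pv["spec"]["nodeAffinity"]["required"]["nodeSelectorTerms"]
        for term in terms:
            for expr in term["matchExpressions"]:
                if expr["key"] == "kubernetes.io/hostname":
                    # If none of the pinned hostnames still exist, the PV
                    # (and its data) is gone for good: mark it for deletion.
                    if not set(expr["values"]) & existing_nodes:
                        orphaned.append(pv["metadata"]["name"])
    return orphaned

pvs = [
    {"metadata": {"name": "local-pv-a"},
     "spec": {"nodeAffinity": {"required": {"nodeSelectorTerms": [
         {"matchExpressions": [{"key": "kubernetes.io/hostname",
                                "values": ["node-1"]}]}]}}}},
    {"metadata": {"name": "local-pv-b"},
     "spec": {"nodeAffinity": {"required": {"nodeSelectorTerms": [
         {"matchExpressions": [{"key": "kubernetes.io/hostname",
                                "values": ["node-gone"]}]}]}}}},
]
print(orphaned_local_pvs(pvs, {"node-1", "node-2"}))  # ['local-pv-b']
```

A real controller would then delete the orphaned PVs (and release their PVCs) via the Kubernetes API, ideally after a grace period to tolerate transient node-object deletion.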

arianvp avatar Apr 14 '21 08:04 arianvp

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale

fejta-bot avatar Jul 13 '21 08:07 fejta-bot

/remove-lifecycle stale

Bessonov avatar Jul 13 '21 17:07 Bessonov

Edit: Ah, wait, this will not work, as Kubernetes doesn't delete pods when the node is unreachable; it only does so when the node is manually deleted or becomes reachable again.

In that case, I guess a first approximation that works both in the cloud and on-prem is to auto-delete PVs that are bound to nodes that no longer exist.

Exactly!

cdenneen avatar Aug 27 '21 18:08 cdenneen

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Nov 25 '21 18:11 k8s-triage-robot

/remove-lifecycle stale

Bessonov avatar Nov 25 '21 19:11 Bessonov


/remove-lifecycle stale

Bessonov avatar Feb 23 '22 20:02 Bessonov


/remove-lifecycle stale

Bessonov avatar May 25 '22 12:05 Bessonov


/remove-lifecycle stale

Bessonov avatar Aug 24 '22 11:08 Bessonov