sig-storage-lib-external-provisioner
Provisioner fails with "error syncing claim: node not found" after "final error received, removing pvc"
This is a follow-on from https://github.com/kubernetes-sigs/sig-storage-lib-external-provisioner/issues/121.
We are still seeing "error syncing claim: node not found" together with "final error received, removing pvc x from claims in progress". @songsunny made a fix for this; however, the fix does not handle the final/non-final error distinction, so it can cause the PD to leak in this scenario: https://github.com/kubernetes-sigs/sig-storage-lib-external-provisioner/pull/139#discussion_r1131778146
/sig storage
/kind bug
Let's continue the conversation here. @msau42 @jsafrane
If the provisioning has started, shouldn't the state change to `ProvisioningInBackground`?
If it is `ProvisioningInBackground` and the node is not found, we do not go down this code path:

```go
return ctrl.provisionVolumeErrorHandling(ctx, ProvisioningReschedule, err, claim, operation)
```

Instead we continue to this code:

```go
err = fmt.Errorf("failed to get target node: %v", err)
ctrl.eventRecorder.Event(claim, v1.EventTypeWarning, "ProvisioningFailed", err.Error())
return ProvisioningNoChange, err
```
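For context, here is a paraphrased sketch of the branch being described, with the decision factored into a standalone helper. The helper and its `provisioningInBackground` flag are illustrative; only the `ProvisioningState` values mirror the library's names:

```go
package sketch

import (
	"fmt"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
)

// ProvisioningState stands in for the library's type of the same name so
// the sketch compiles on its own.
type ProvisioningState string

const (
	ProvisioningNoChange   ProvisioningState = "NoChange"
	ProvisioningReschedule ProvisioningState = "Reschedule"
)

// handleNodeLookupError sketches the branch above: "node not found" only
// triggers rescheduling when no CreateVolume call is already in flight,
// i.e. the claim is not in the ProvisioningInBackground state.
func handleNodeLookupError(err error, provisioningInBackground bool) (ProvisioningState, error) {
	if apierrors.IsNotFound(err) && !provisioningInBackground {
		// Nothing has started on the backend yet; it is safe to let
		// the scheduler pick a new node for the claim.
		return ProvisioningReschedule, err
	}
	// Provisioning may already be running on the backend: retry in place
	// until the plugin returns a final success or error.
	return ProvisioningNoChange, fmt.Errorf("failed to get target node: %v", err)
}
```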
@sunnylovestiramisu that will work only when we expect that the missing node comes back online. Is this the case we're trying to solve here? Because if the node was permanently deleted, then the provisioner will end in an endless loop.
To fix the issue for all possible library users, the library should save node somewhere (PVC annotation?!) and give it to the provisioner, so it can retry until it gets a final success / error without getting the Node from the API server.
But a whole node in PVC annotations is really ugly. The CSI provisioner would need just the node name from it, to get the CSINode and the driver's topology labels*. So... should there be an extra call, say `ExtractNodeInfo`, where the provisioner would get whatever it wants from the SelectedNode (and perhaps even CSINode) and return a shorter string that the library would save in the PVC? Then the library could give the string to every `Provision()` call until a final success/error. This is a change of the library API. Is it worth the effort?
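To make the proposal concrete, a minimal sketch of what such an optional interface might look like; `ExtractNodeInfo` is the hypothetical call floated above, not part of the current library API:

```go
package sketch

import (
	"context"

	v1 "k8s.io/api/core/v1"
)

// NodeInfoExtractor is a hypothetical optional interface a provisioner
// could implement alongside Provision().
type NodeInfoExtractor interface {
	// ExtractNodeInfo distills whatever the provisioner needs from the
	// selected node (for the CSI provisioner: the node name, or the
	// driver's topology labels taken from CSINode) into a short string.
	// The library would persist that string on the PVC and pass it to
	// every subsequent Provision() call until the operation ends with
	// a final success or error.
	ExtractNodeInfo(ctx context.Context, selectedNode *v1.Node) (string, error)
}
```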
*) There is another question: what should happen when CSINode is missing here: https://github.com/kubernetes-csi/external-provisioner/blob/3739170578f68aaf0594f631cae6d270bbfdc83e/pkg/controller/topology.go#L273
@jsafrane we are trying to solve the case that the provisioning did start already on the backend but then the node got deleted afterwards.
Question is, do you expect the node to come back and optimize for this case? That's manageable.
A node with the same name will not come back if it gets deleted, but a node with the same name and a different UID may come back. Or there may be other cases that I'm missing.
There are two main reasons why a Node object may disappear:
- The node got autoscaled down. In this case the node is not expected to come back.
- The node got preempted and recreated. The node may come back, and often it comes back quickly.
The original motivation for https://github.com/kubernetes-sigs/sig-storage-lib-external-provisioner/issues/121 was for the autoscaling case where the node never comes back.
> The original motivation for https://github.com/kubernetes-sigs/sig-storage-lib-external-provisioner/issues/121 was for the autoscaling case where the node never comes back
That's the hard case. The provisioner lib then needs to store the whole node somewhere, so it can reconstruct it and give it to the provisioner in the next `Provision()` calls, until it succeeds / fails with a final error.
Since storing the whole node is ugly, we could extend the provisioner API so the provisioner can distill a shorter string from the Node; the library would then store it (in PVC annotations?) and give it to the provisioner in all subsequent `Provision()` calls.
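On the library side, the store-and-replay part could be as small as the following sketch. The annotation key is an assumption (the thread only says "in PVC annotations?"), and nothing here is existing library code:

```go
package sketch

import v1 "k8s.io/api/core/v1"

// annSelectedNodeInfo is a made-up annotation key for the distilled string.
const annSelectedNodeInfo = "volume.kubernetes.io/selected-node-info"

// saveNodeInfo records the distilled string on the claim before the first
// Provision() call (the actual PVC update via the API server is omitted).
func saveNodeInfo(claim *v1.PersistentVolumeClaim, info string) {
	if claim.Annotations == nil {
		claim.Annotations = map[string]string{}
	}
	claim.Annotations[annSelectedNodeInfo] = info
}

// savedNodeInfo reads the string back on retries, so Provision() can keep
// using the originally selected node even after the Node object is gone.
func savedNodeInfo(claim *v1.PersistentVolumeClaim) (string, bool) {
	info, ok := claim.Annotations[annSelectedNodeInfo]
	return info, ok
}
```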
What happens if a node with the exact name but different uid comes back? Does this shorter string provide enough information for the provisioner to distinguish the node and decide what action to take?
The question is whether the `selectedNode` annotation is still valid when someone deletes a node and creates a new one with the same name. Assuming yes, the new node is still the right one, then I think the shorter string could be used only as a fallback, when the Node object is not in the API server (or in the informer).
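Combining that with the annotation sketch above, the fallback could look roughly like this; `nodeGetter` and the `extract` callback are illustrative stand-ins for the informer lookup and the provisioner hook:

```go
package sketch

import (
	v1 "k8s.io/api/core/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
)

// nodeGetter abstracts the informer/API-server lookup so the sketch stays
// self-contained.
type nodeGetter func(name string) (*v1.Node, error)

// nodeInfoForProvision prefers the live Node object and only falls back to
// the string previously saved on the claim when the Node is missing from
// the API server (or informer). savedNodeInfo is the helper from the
// previous sketch.
func nodeInfoForProvision(get nodeGetter, claim *v1.PersistentVolumeClaim,
	nodeName string, extract func(*v1.Node) (string, error)) (string, error) {
	node, err := get(nodeName)
	if err == nil {
		return extract(node)
	}
	if apierrors.IsNotFound(err) {
		if info, ok := savedNodeInfo(claim); ok {
			// Node is gone but provisioning may already have started:
			// keep using the distilled info so the operation can finish.
			return info, nil
		}
	}
	return "", err
}
```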
But how do we know if a node will come back or not? How do we tell from the symptoms whether it will be recreated? Let's say we store the nodeName in annotations.
- The node got autoscaled down. In this case the node is not expected to come back. <-- We will see `apierrs.IsNotFound(err)` be true always, and we still have the nodeName in annotations. What do we do next?
- The node got preempted and recreated. The node may come back, and often it comes back quickly. <-- We will see `apierrs.IsNotFound(err)` be true for a short period of time, and we still have the nodeName in annotations. Do we then use that nodeName to skip `ctrl.provisionVolumeErrorHandling(ctx, ProvisioningReschedule, err, claim, operation)`?
I don't think it matters whether the node will come back or not. What we return depends on whether the operation returned a final or non-final error. Whether or not the node exists determines how we retry in the non-final case.
And so the question is, for the non-final case: should we retry the `CreateVolume()` call with the old node (Jan's proposal), or get a new node from the scheduler (what the fix in #139 was trying to do)?
I think the challenge with getting a new node is: what happens if the topology also changes? Then a fix would involve:
- The driver being able to handle a change in requested topology for an existing operation
- provisioner being able to detect that the returned topology doesn't match the request and coordinating a deletion + retry (a rough sketch of the detection step follows below)
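For the second bullet, a rough sketch of the detection step, using the CSI spec's topology types. The matching here is simplified to exact segment equality against the requisite list, which is an assumption rather than the spec's full accessibility semantics:

```go
package sketch

import csi "github.com/container-storage-interface/spec/lib/go/csi"

// segmentsEqual reports whether two topology segment maps are identical.
func segmentsEqual(a, b map[string]string) bool {
	if len(a) != len(b) {
		return false
	}
	for k, v := range a {
		if b[k] != v {
			return false
		}
	}
	return true
}

// satisfiesRequisite checks whether the topology returned by CreateVolume
// matches at least one requested requisite topology. If it does not, the
// provisioner would have to coordinate a deletion + retry.
func satisfiesRequisite(req *csi.TopologyRequirement, returned []*csi.Topology) bool {
	if req == nil || len(req.GetRequisite()) == 0 {
		return true // nothing was requested, so anything matches
	}
	for _, ret := range returned {
		for _, want := range req.GetRequisite() {
			if segmentsEqual(want.GetSegments(), ret.GetSegments()) {
				return true
			}
		}
	}
	return false
}
```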
> And so the question is, for the non-final case: should we retry the CreateVolume() call with the old node (Jan's proposal), or get a new node from the scheduler (what the fix in https://github.com/kubernetes-sigs/sig-storage-lib-external-provisioner/pull/139 was trying to do)?
What if there is no node? Someone has deleted the Pod and the PVC while provisioning is in progress. The provisioning should continue and eventually succeed or fail, but it needs some node to continue.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:
- Mark this issue as fresh with `/remove-lifecycle stale`
- Close this issue with `/close`
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:
- Mark this issue as fresh with `/remove-lifecycle rotten`
- Close this issue with `/close`
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
/remove-lifecycle rotten