external-resizer icon indicating copy to clipboard operation
external-resizer copied to clipboard

distributed resizing

Open pohly opened this issue 3 years ago • 20 comments

For local storage (e.g. LVM disks) it makes sense to support a deployment model where the driver only runs locally on nodes together with sidecars. This is already support for provisioning (kubernetes-csi/external-provisioner#524, https://github.com/kubernetes-csi/external-provisioner#deployment-on-each-node).

pohly avatar Mar 22 '21 15:03 pohly

In the resizer, the sidecar probably needs to check whether the volume is local before invoking the CSI driver. This can be achieved via the existing topology field. https://github.com/kubernetes-csi/external-provisioner/issues/592 might help to reduce traffic by configuring the informer so that it filters by label and thus only receives PVs that are of interest. See https://github.com/kubernetes-csi/external-provisioner/pull/583 for code which does such server-side filtering for CSIStorageCapacity objects.

pohly avatar Mar 22 '21 15:03 pohly

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale

fejta-bot avatar Jun 20 '21 16:06 fejta-bot

Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community. /lifecycle rotten

fejta-bot avatar Jul 20 '21 17:07 fejta-bot

/remove-lifecycle rotten /lifecycle frozen

pohly avatar Aug 02 '21 08:08 pohly

How much effort is required for this?

travisghansen avatar Feb 22 '22 02:02 travisghansen

I suspect it'll be easier than distributed provisioning because the sidecar can already determine from the PV whether it is responsible for the operation. See https://github.com/kubernetes-csi/external-snapshotter/pull/585 for the corresponding feature in the snapshotter.

pohly avatar Feb 22 '22 08:02 pohly

I am not convinced that we need distributed resizing in external-resizer. I see a clear use case of why distributed provisioning is needed for local volumes but I don't see a similar use case for resizing. Compared to provisioning - resizing is a two step process and NodeExpandVolume always runs on the node (called by kubelet), so if overall capacity supports volume expansion then expansion can happen on the node without supporting distributed resizing.

Is there a use case, where both ControllerExpandVolume and NodeExpandVolume must happen on the node where volume is available - because there is no visibility of volume capacity at cluster level? If so - then does distributed resizing solves it? It seems like - if I expand a local volume to 100TB and capacity is not available on the node, I will be stuck anyways distributed resizing or not.

gnufied avatar Mar 01 '22 16:03 gnufied

A fair point.

My sense is that controllers generally have no knowledge or understanding of how the device is being used and is just expanding raw capacity while node operations (when necessary) are meant to deal with the specifics of ensuring the volume (generally a block device if node resizing is involved) capacity is readily accessible to the workload (generally resizing the fs as necessary). I prefer to keep these logical separations.

With that in mind, I have several common code paths that are used by both distributed and centralized deployments (my project currently has 17 distinct 'drivers') so I'd love to be able to keep those logically in the same buckets.

From a broader (than k8s) perspective, I think the 'deploy the controller alongside the node' workflow is clearly a goal of csi so, regardless of any perceived value/arguments/points I make above, making assumptions about what makes sense or not probably doesn't serve the interests of the spec in the long-term. The spec is interested in facilitating node+controller deployments jointly and part of controller deployments is resizing...by extension csi compliant COs should be capable of invoking resize operations on controllers deployed in such a fashion.

In short, IMO regardless of specific merits for my use-case, it seems a supported csi deployment style that should 'just work' with k8s and/or any other COs.

travisghansen avatar Mar 01 '22 17:03 travisghansen

The spec is interested in facilitating node+controller deployments jointly and part of controller deployments is resizing...by extension csi compliant COs should be capable of invoking resize operations on controllers deployed in such a fashion.

The spec does not say anything about node+controller deployment on same nodes. I am still unsure, what problems distributed resizing will solve that is not already solved by calling NodeExpandVolume on the node.

gnufied avatar Mar 01 '22 18:03 gnufied

The spec does not say anything about node+controller deployment on same nodes. I am still unsure, what problems distributed resizing will solve that is not already solved by calling NodeExpandVolume on the node.

Isn't that figure 2 and 3 here?: https://github.com/container-storage-interface/spec/blob/master/spec.md

Again, regardless of that, IMO the provisioner and snapshotter both support node+controller deployments. The fact that k8s has various clients/code-bases to invoke the grpc methods is an implementation details. If k8s supports node+controller it should support it fully and not selectively decide which grpc methods it will or will not invoke based on supposition about the merits/requirements of the feature by csi driver implementations from specific SPs.

travisghansen avatar Mar 01 '22 19:03 travisghansen

@gnufied: if the CSI driver only runs on the nodes, where would you run external-resizer?

If you only run one instance of it per cluster, then it has no CSI driver instance that it can call. Or is the expectation that a CSI driver then has an implementation that runs on the node and another implementation that runs alongside the external-resizer, with the second implementation not doing any actual work?

pohly avatar Mar 02 '22 08:03 pohly

Currently - if a CSI driver supports node-only volume expansion, the external-resizer does provide a no-op trivial resizer which is used for updating PV (kubelet can't update PVs) and rest of the volume expansion happens on kubelet via NodeExpandVolume.

gnufied avatar Mar 02 '22 14:03 gnufied

I agree that this seems to be sufficient for node-local storage. Is this documented sufficiently so that CSI driver developers can find out about it?

pohly avatar Mar 03 '22 10:03 pohly

While I agree it can be done (at least for my use-cases), I disagree with what I consider divergence from the spec. I don’t think it’s appropriate to selectively decide which controller methods will/will not be invoked in node+controller setups based on supposition.

Moving on, I’m happy to test the deployment scenario but I’m unsure what it would look like. Is this available with current releases or is some change still required to support the method? Do I deploy resizer with nodes or just 1 instance? Which args should I be using? etc

travisghansen avatar Mar 03 '22 14:03 travisghansen

I get the point about deploying controller side of the plugin on the node and not having any explicit control-plane associated with CSI driver. I need to think some more about it. I have scheduled this item for discussion on Monday's CSI implementation meeting. If you can manage, please join so as we can talk about it.

gnufied avatar Mar 03 '22 16:03 gnufied

Of course! Send over meeting details here or on slack.

travisghansen avatar Mar 03 '22 16:03 travisghansen

I don’t think it’s appropriate to selectively decide which controller methods will/will not be invoked in node+controller setups based on supposition.

My understanding is that the controller side will only be skipped if a CSI driver developer explicitly asks for it by deploying the CSI driver in a suitable way. That seems okay to me. How that works specifically I don't know (hence my question about documentation).

pohly avatar Mar 03 '22 19:03 pohly

https://kubernetes-csi.github.io/docs/volume-expansion.html documents the deployment workflow.

Esp. following bits:

The Kubernetes CSI development team maintains external-resizer Kubernetes CSI Sidecar Containers. This sidecar container implements the logic for watching the Kubernetes API for Persistent Volume claim edits and issuing ControllerExpandVolume RPC call against a CSI endpoint and updating PersistentVolume object to reflect new size.

This sidecar is needed even if CSI driver does not have EXPAND_VOLUME controller capability, in this case it performs a NO-OP expansion and updates PersistentVolume object. NodeExpandVolume is always called by Kubelet on the node.

gnufied avatar Mar 03 '22 20:03 gnufied

OK thanks for pointing out the doc!

IMO this furthers my points about the spec. Stating the perhaps obvious, the sidecar (to my knowledge) must be deployed as..a sidecar. It cannot be deployed standalone without the/a controller deployed along with it. That deployment story does not match any of the architectures from the spec. It's a sort of mashing together of 1+2 and/or 1+3 where 'node' hosts are running controller+node plugins and then a 'master' host is running a controller plugin. Or said differently, 'controller' plugin is running both on 'nodes' and 'master' hosts.

travisghansen avatar Mar 03 '22 21:03 travisghansen

This sidecar is needed even if CSI driver does not have EXPAND_VOLUME controller capability, in this case it performs a NO-OP expansion and updates PersistentVolume object.

In case when there's an error during expanding, shouldn't this cause update to fail? Controller deployed as deployment, can't check whether there's enough space to expand on different node. Leaving PV in expanded state even though expansion didn't happen is wrong in my view.

CSI Spec also requires specific errors to be returned when Controller isn't able to expand the volume, so no-op expansion is not in sync with the spec.

zimnx avatar Jun 01 '23 20:06 zimnx