kubernetes-csi-addons icon indicating copy to clipboard operation
kubernetes-csi-addons copied to clipboard

docs: healer controller design proposal

Open pkalever opened this issue 3 years ago • 6 comments

Signed-off-by: Prasanna Kumar Kalever <[email protected]>

pkalever avatar Jan 19 '22 13:01 pkalever

As discussed, had opened https://github.com/ceph/ceph-csi/issues/2794 for the VOLUME_CONDITION capability. Please comment below if we can split this into more granular parts.

pkalever avatar Jan 20 '22 06:01 pkalever

We still need to have a description of the procedures that a CSI-driver should implement for this functionality. That description needs to get reviewed and merged in the csi-addons/spec repository before a Kubernetes implementation can be proposed.

I had already opened https://github.com/ceph/ceph-csi/issues/2794, and will update the implementation details there soon.

I will also send the spec changes after some more clarity on the implementation side details(more on the internal structures). Maybe in parallel to the implementation PR.

This is just the overall design idea, for everyone's reference.

CSI-Addons can not send CSI commands, those are reserved for the CO.

I'm a bit surprised to hear this @nixpanic , is it restricted anywhere? or at least documented? I definitely thought the sidecar can call NodeStageVolume() RPC in the nodeplugin pod.

CSI-drivers can decide on their own implementation of healing, maybe they want to call their own NodeStageVolume procedure. Don't mix standard CSI procedures with CSI-Addons procedures.

Yes, definitely the CSI-Addons can implement its own RPC, just wanted to avoid another copy-paste.

pkalever avatar Jan 21 '22 09:01 pkalever

CSI-Addons can not send CSI commands, those are reserved for the CO.

I'm a bit surprised to hear this @nixpanic , is it restricted anywhere? or at least documented? I definitely thought the sidecar can call NodeStageVolume() RPC in the nodeplugin pod.

Drivers (at least Ceph-CSI) offer different gRPC servers for different protocols, so it is technically prevented. Mixing protocols does not look like a good practice, or very clean.

If both the CO and CSI-Addons Controller send CSI operations, we break the spec https://github.com/container-storage-interface/spec/blob/master/spec.md#concurrency

A CSI-driver will need to take care that healing does not interfere with actions the CO does. The only practical way to do so, is by using different operations for CSI and CSI-Addons that can coordinate (block/abort/..) concurrent access.

nixpanic avatar Jan 21 '22 12:01 nixpanic

CSI-Addons can not send CSI commands, those are reserved for the CO.

I'm a bit surprised to hear this @nixpanic , is it restricted anywhere? or at least documented? I definitely thought the sidecar can call NodeStageVolume() RPC in the nodeplugin pod.

Drivers (at least Ceph-CSI) offer different gRPC servers for different protocols, so it is technically prevented. Mixing protocols does not look like a good practice, or very clean.

If both the CO and CSI-Addons Controller send CSI operations, we break the spec https://github.com/container-storage-interface/spec/blob/master/spec.md#concurrency

A CSI-driver will need to take care that healing does not interfere with actions the CO does. The only practical way to do so, is by using different operations for CSI and CSI-Addons that can coordinate (block/abort/..) concurrent access.

@nixpanic IIUC, this means even sidecars calling NodeGetVolumeStats is bad?

pkalever avatar Jan 31 '22 11:01 pkalever

@Madhu-1 @nixpanic @Rakshith-R Can we finalize this, please? Let me know if we are missing any major calrifications in the design. If not, I hope we can discuss and address any nitty-gritty details in the implementation PR.

Thanks!

pkalever avatar Feb 02 '22 08:02 pkalever

@nixpanic IIUC, this means even sidecars calling NodeGetVolumeStats is bad?

Yes. CSI-Addons should not mix calling CSI operations. A CSI-driver will need to provide dedicated CSI-Addons operations that can be called instead. The CSI-driver is then in charge of making sure no conflicting calls from the CO are being processed.

nixpanic avatar Feb 02 '22 09:02 nixpanic

closing this as stale, feel free to reopen when required.

Madhu-1 avatar Sep 03 '24 09:09 Madhu-1