
Secrets field addition to NodeGetVolumeStats Spec

Open humblec opened this issue 5 years ago • 14 comments

At present, there is no secrets field on the NodeGetVolumeStats RPC; however, it is at times required for specific storage vendors to map the volumeID to the targetPath.

As per the spec:

Condition | gRPC Code | Description | Recovery Behavior
-- | -- | -- | --
Volume does not exist | 5 NOT_FOUND | Indicates that a volume corresponding to the specified volume_id does not exist on specified volume_path. | Caller MUST verify that the volume_id is correct and that the volume is accessible on specified volume_path and has not been deleted before retrying with exponential back off.

The targetPath contains the PVC name, while the volumeID is generated by the plugin. Mapping the targetPath to the volumeID may require a secret/credentials. For example, in Ceph, we need a secret to connect to the Ceph cluster to get the volumeID from the volume name.
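To make the mapping problem concrete, below is a minimal Go sketch of the validation a plugin would have to do. It is only an illustration: the `backendClient` interface, the `newBackendClient` helper, and the secret keys are assumptions standing in for whatever cluster API and credentials a driver such as ceph-csi would really use, and the secrets themselves are exactly what NodeGetVolumeStatsRequest cannot carry today.

```go
package nodestats

import (
	"context"
	"fmt"
)

// backendClient abstracts whatever storage-cluster API a driver would use
// to resolve a mounted volume back to its backend volume ID.
type backendClient interface {
	VolumeIDForPath(ctx context.Context, volumePath string) (string, error)
}

// newBackendClient would authenticate to the cluster using per-volume secrets;
// today NodeGetVolumeStatsRequest carries no such secrets, which is the gap
// this issue asks about. The secret keys below are illustrative only.
func newBackendClient(secrets map[string]string) (backendClient, error) {
	if secrets["userID"] == "" || secrets["userKey"] == "" {
		return nil, fmt.Errorf("missing credentials to reach the storage cluster")
	}
	return nil, fmt.Errorf("illustrative stub: a real driver would authenticate here")
}

// validateVolumeAtPath returns nil only if volumePath is a mount of volumeID.
func validateVolumeAtPath(ctx context.Context, volumeID, volumePath string, secrets map[string]string) error {
	client, err := newBackendClient(secrets)
	if err != nil {
		return err
	}
	gotID, err := client.VolumeIDForPath(ctx, volumePath)
	if err != nil {
		return err
	}
	if gotID != volumeID {
		// A mismatch maps to the NOT_FOUND condition in the spec table above.
		return fmt.Errorf("volume %q is not mounted at %q", volumeID, volumePath)
	}
	return nil
}
```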

humblec avatar Jul 19 '19 07:07 humblec

/assign @gnufied

humblec avatar Jul 19 '19 07:07 humblec

@humblec if we have a valid targetPath - do we still need to verify that the targetPath belongs to the given volume ID? Does this apply to something like NodeUnpublishVolume too - which also does not have secrets in the request?

IIRC - @j-griffith was looking into a similar issue.

gnufied avatar Jul 23 '19 19:07 gnufied

@humblec could you please provide more detailed and specific information regarding this use case? It's not entirely clear what the issue is here or that it can't be solved without secrets (or why).

j-griffith avatar Aug 07 '19 17:08 j-griffith

To summarize, we'd like to understand:

  • Does Ceph actually require a backend call (which would require a secret) for NodeGetVolumeStats?

saad-ali avatar Aug 07 '19 17:08 saad-ali

@humblec could you please provide more detailed and specific information regarding this use case? It's not entirely clear what the issue is here or that it can't be solved without secrets (or why).

I was under the assumption that this is the same problem that has been discussed here: https://github.com/container-storage-interface/spec/issues/370#issuecomment-519248111

j-griffith avatar Aug 07 '19 20:08 j-griffith

@humblec could you please provide more detailed and specific information regarding this use case? It's not entirely clear what the issue is here or that it can't be solved without secrets (or why).

I was under the assumption that this is the same problem that has been discussed here: #370 (comment)

The problem here is different: the issue is how to validate that the given volume_id is the one mapped to volume_path, and hence either return stats or return a NOT_FOUND error.

Given a volume_id, we carry enough information encoded in it to detect which cluster it belongs to, so we do not need parameters when a volume_id is passed (IOW, the "where" information in parameters is not needed when we have a volume_id). That is the difference between this and the ListVolumes/Snapshot requests. (As an aside, we still need secrets if we need to reach the storage backend for information.)

Coming back to the secrets request here, we thought we needed it to validate parts of the volume_id and ensure that volume_path contains a mount of that exact volume_id. I think we now have an alternative: we can store enough information about the backend volume being mounted when we execute NodeStageVolume to match the volume_id, and use the local mount tools to match the volume_path to the same. IOW, we can create lookaside metadata to ensure we have the right mapping. This also saves a network round trip to the storage server when validating.
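For illustration, a minimal Go sketch of the lookaside-metadata idea described above; the file name, JSON fields, and helper names are assumptions, not what ceph-csi actually stores:

```go
package nodestats

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
)

// stageMetadata records the mapping established at staging time.
type stageMetadata struct {
	VolumeID   string `json:"volumeID"`
	BackendImg string `json:"backendImage"` // e.g. pool/image for an RBD volume
}

const metadataFile = ".stage-metadata.json" // hypothetical file name

// writeStageMetadata is called from NodeStageVolume after the mount succeeds.
func writeStageMetadata(stagingPath string, md stageMetadata) error {
	data, err := json.Marshal(md)
	if err != nil {
		return err
	}
	return os.WriteFile(filepath.Join(stagingPath, metadataFile), data, 0o600)
}

// validateWithMetadata is called from NodeGetVolumeStats: no secrets and no
// network round trip are needed, only the locally stored mapping.
func validateWithMetadata(stagingPath, volumeID string) error {
	data, err := os.ReadFile(filepath.Join(stagingPath, metadataFile))
	if err != nil {
		return fmt.Errorf("no stage metadata for %q: %w", stagingPath, err)
	}
	var md stageMetadata
	if err := json.Unmarshal(data, &md); err != nil {
		return err
	}
	if md.VolumeID != volumeID {
		return fmt.Errorf("volume %q is not staged at %q", volumeID, stagingPath)
	}
	return nil
}
```

With something along these lines, NodeGetVolumeStats can satisfy the NOT_FOUND requirement using purely local information.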

We (as in the ceph-csi folks) would like to revisit this request after hashing out our alternatives as above. @humblec if you agree, we can take this off the list and come back later with the requirement, or state that we do not require it. Thoughts?

ShyamsundarR avatar Aug 07 '19 21:08 ShyamsundarR

I'd recommend this issue be closed in that case. Can reopen if it's determined there's actually a problem to solve.

j-griffith avatar Aug 07 '19 21:08 j-griffith

@humblec if we have a valid targetPath - do we still need to verify that the targetPath belongs to the given volume ID? Does this apply to something like NodeUnpublishVolume too - which also does not have secrets in the request?

@gnufied as per the spec, when the RPC call comes in, we have to validate the targetPath against the volumeID. To me this looks like a very corner-case scenario, but I opened this issue because it is in the spec.

To summarize, we'd like to understand: Does Ceph actually require a backend call (which would require a secret) for NodeGetVolumeStats?

@saad-ali we actually do need it. IOW, this may soon become a requirement for other storage vendors too, when they implement this RPC call. The justification I see for this requirement is below.

The targetPath is composed by the CO and passed to the SP/plugin. The targetPath contains PVC/pod information; in short, it has no direct reference/mapping to the volumeID (which the plugin created). As per the spec, we need to do the validation. This mapping or validation can be done in many ways, or we could hack/work around it. However, the first/obvious solution is to figure it out from the storage backend using the information available in the targetPath; but if we want to connect to the storage cluster and do that validation, it requires the credentials/secret. I think this is going to be a general requirement for other storage vendors too; I mentioned Ceph as an example. We were also discussing other alternatives (in Ceph), such as storing metadata in the targetPath somehow and using it for validation. But that looks like a workaround/hack, needed only because we do not have a secrets option in the RPC call today.

So, I still think this is a requirement.

I was under the assumption that this is the same problem that has been discussed here: #370 (comment)

@j-griffith not really; this is different and specific to this RPC call :).

humblec avatar Aug 08 '19 06:08 humblec

There is an additional use-case for this, where having the secrets to connect to the storage backend is useful.

With Ceph-CSI we can create RBD images for volumeMode: Block. The block-device gets attached to the node, which makes it possible to check the size of the block-device. So far, so good.

However, RBD images are thin-provisioned by default, so not all of the space has been allocated on the Ceph cluster. It would be useful to return the size of the image (not everything is allocated) and also to check in the Ceph cluster (secrets required) how much has actually been allocated. These details can give an administrator of the Ceph cluster an idea of how much over-provisioning is done.
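For illustration, a sketch of what such a response could look like if the allocated size could be fetched from the backend. `csi.VolumeUsage` and `csi.VolumeUsage_BYTES` come from the CSI Go bindings; `allocatedBytesFromBackend` is a hypothetical stand-in for the credential-requiring query (for RBD, something that reports the image's actual usage):

```go
package nodestats

import (
	"context"

	"github.com/container-storage-interface/spec/lib/go/csi"
)

// allocatedBytesFromBackend would need the secrets this issue asks for.
func allocatedBytesFromBackend(ctx context.Context, volumeID string, secrets map[string]string) (int64, error) {
	// Illustrative stub: a real driver would authenticate and query the cluster.
	return 0, nil
}

// blockVolumeStats reports provisioned vs. actually allocated space for a
// thin-provisioned block volume.
func blockVolumeStats(ctx context.Context, volumeID string, provisionedBytes int64, secrets map[string]string) (*csi.NodeGetVolumeStatsResponse, error) {
	used, err := allocatedBytesFromBackend(ctx, volumeID, secrets)
	if err != nil {
		return nil, err
	}
	return &csi.NodeGetVolumeStatsResponse{
		Usage: []*csi.VolumeUsage{{
			Unit:      csi.VolumeUsage_BYTES,
			Total:     provisionedBytes,        // size of the (thin-provisioned) image
			Used:      used,                    // bytes actually allocated on the cluster
			Available: provisionedBytes - used, // remaining thin-provisioned headroom
		}},
	}, nil
}
```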

nixpanic avatar Jan 08 '21 15:01 nixpanic

I am sort of becoming convinced that we may have to add secrets here. The other case, I think, is that this call is also being used to report volume health, and hence it may be useful in many cases to have a secret present so that volume health can be queried accurately.

Having said that - we also have to figure out how to carry this information to the node side in kubelet. We started with just a few secret fields in the PV and now it looks like it is kind of exploding. :/
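On the volume-health point: NodeGetVolumeStatsResponse can carry a VolumeCondition (behind the VOLUME_CONDITION node capability), and an accurate answer may need a credentialed call to the backend. A hedged sketch, where `probeBackendHealth` is a hypothetical stand-in for that call:

```go
package nodestats

import (
	"context"

	"github.com/container-storage-interface/spec/lib/go/csi"
)

// probeBackendHealth would need credentials to ask the cluster whether the
// image/share behind volumeID is degraded, lost, etc. Illustrative stub only.
func probeBackendHealth(ctx context.Context, volumeID string, secrets map[string]string) (healthy bool, msg string, err error) {
	return true, "volume is healthy", nil
}

// volumeCondition builds the health portion of a NodeGetVolumeStats response.
func volumeCondition(ctx context.Context, volumeID string, secrets map[string]string) (*csi.VolumeCondition, error) {
	healthy, msg, err := probeBackendHealth(ctx, volumeID, secrets)
	if err != nil {
		return nil, err
	}
	return &csi.VolumeCondition{
		Abnormal: !healthy,
		Message:  msg,
	}, nil
}
```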

gnufied avatar Jan 08 '21 16:01 gnufied

I am sort of becoming convinced that we may have to add secrets here. The other case, I think, is that this call is also being used to report volume health, and hence it may be useful in many cases to have a secret present so that volume health can be queried accurately.

Having said that - we also have to figure out how to carry this information to the node side in kubelet. We started with just a few secret fields in the PV and now it looks like it is kind of exploding. :/

True, the number of secrets is getting too high; we already have 5 or 6 secrets in place, including controller, node, and provisioner secrets. I would like to explore whether there is an idea that could consolidate this large number of secrets in the spec. It would be difficult to introduce such a mechanism considering we have to maintain backward compatibility; however, maybe we can handle it internally, or at least think about a new way to store additionally introduced secrets.

humblec avatar Mar 17 '22 09:03 humblec

It looks like more CSI RPC calls require/want a secrets field (e.g. https://github.com/container-storage-interface/spec/pull/462, #515, etc.). As mentioned in the previous comment, it may be a good time to rethink how to accommodate such requests without adding more fields to the existing PV spec and so on. Inviting thoughts on this, so that we can draw the line for addressing such requirements in the future in a better way.

Cc @gnufied @jsafrane @bswartz @xing-yang @msau42

humblec avatar Jun 16 '22 09:06 humblec

Yes we should rethink secrets. The existing interface is too flexible and underspecified. The fact that secrets can vary per-namespace, per-volume, and per-RPC means that driver implementers have expansive flexibility, but it creates a burden that we have to store more secrets per volume as we add more RPCs, and it makes it unclear what to do for any RPC that involves more than 1 volume (like ListVolumes).

Given that we're stuck with the existing model for existing RPCs, probably the best we can do is to find out what exactly driver authors are doing with secrets and see if we can make the interface less flexible. If per-volume or per-RPC secrets are not really required, maybe we can change the approach for new RPCs at least, update existing RPCs to do the same, and eventually deprecate (but not remove) the current way of handling secrets.

We also may want to do something to separate operational secrets from data encryption secrets. Right now the two use cases are both served by the same secrets field, but this creates awkward problems when you need both admin-supplied secrets for the SP to authenticate to the storage device and user-supplied secrets to manage data encryption.

bswartz avatar Jun 16 '22 13:06 bswartz

Yes we should rethink secrets.

Indeed!

Given that we're stuck with the existing model for existing RPCs, probably the best we can do is to find out what exactly driver authors are doing with secrets and see if we can make the interface less flexible. If per-volume or per-RPC secrets are not really required, maybe we can change the approach for new RPCs at least, update existing RPCs to do the same, and eventually deprecate (but not remove) the current way of handling secrets.

Most of the time, or if we go with the CSI service model, RPCs having an option for a secret looks to be a need, though different storage systems may consume it differently. However, fetching or having a secret per RPC is also a burden if all these secrets find space in the PV spec. Considering the CSI spec is laid out such that operations are targeted at three main services, I feel secret fields could also be grouped from the CO's operation point of view (and thus be fewer in the SC, PV spec, etc.): say one secret for the controller and one for the node. Then all operations on a service (e.g. the controller) would use the same secret, and likewise for the node. Having the flexibility of specifying separate secrets per operation (provisioner, expansion {controller and node}, snapshotter, volume stats {controller and node}, etc.) looks like overhead to me. Additionally, I am not sure there is really a use case for supporting separate secrets for each and every operation or RPC.

We also may want to do something to separate operational secrets from data encryption secrets. Right now the two use cases are both served by the same secrets field, but this creates awkward problems when you need both admin-supplied secrets for the SP to authenticate to the storage device and user-supplied secrets to manage data encryption.

I like the idea of separating operational secrets from data path secrets. :+1:

However, I haven't come across a situation that requires a user-supplied secret; that said, in our case even data encryption has been managed by the admin persona and the CSI driver internally, without giving control to the user. But there could be a requirement for user-supplied secrets too.

humblec avatar Jun 20 '22 12:06 humblec