azurefile-csi-driver
driver matches private storage accounts without networkEndpointType set
When there is an existing storage account in Azure that is configured to use private endpoints and disallow public network access, the driver can match this storage account even though the networkEndpointType StorageClass parameter is unset. However, it will not configure a private endpoint for the worker subnet because networkEndpointType is not set to privateendpoint, which causes volume mounts to fail on the worker nodes.
This looks like a bug: the driver probably should not match "private" storage accounts when the networkEndpointType parameter is omitted or set to "". Or is there some other parameter for this scenario that we missed?
Driver version: v1.30.3
Reproducer:
- A prerequisite is an existing storage account with private endpoints and disallowed public network access (in the example below it is imageregistryrbednah6mbs).
The StorageClass has networkEndpointType: "" (or omitted):
$ oc get sc/azurefile-csi-pub -o yaml
allowVolumeExpansion: true
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  creationTimestamp: "2024-08-30T11:35:13Z"
  name: azurefile-csi-pub
  resourceVersion: "55979"
  uid: 0e152336-614b-41fc-aafc-3c0ffefd787a
mountOptions:
- mfsymlinks
- cache=strict
- nosharesock
- actimeo=30
parameters:
  networkEndpointType: ""
  skuName: Standard_LRS
provisioner: file.csi.azure.com
reclaimPolicy: Delete
volumeBindingMode: Immediate
Driver matches an existing storage account:
I0830 11:43:02.734943 1 utils.go:101] GRPC call: /csi.v1.Controller/CreateVolume
I0830 11:43:02.734967 1 utils.go:102] GRPC request: {"capacity_range":{"required_bytes":1073741824},"name":"pvc-f3da48fa-7952-4918-aedd-67c1c814b6e2","parameters":{"csi.storage.k8s.io/pv/name":"pvc-f3da48fa-7952-4918-aedd-67c1c814b6e2","csi.storage.k8s.io/pvc/name":"pvc-1","csi.storage.k8s.io/pvc/namespace":"default","networkEndpointType":"","skuName":"Standard_LRS"},"volume_capabilities":[{"AccessType":{"Mount":{"mount_flags":["mfsymlinks","cache=strict","nosharesock","actimeo=30"]}},"access_mode":{"mode":7}}]}
I0830 11:43:03.097163 1 controllerserver.go:561] begin to create file share(pvc-f3da48fa-7952-4918-aedd-67c1c814b6e2) on account(imageregistryrbednah6mbs) type(Standard_LRS) subID() rg(rbednar-az-01-8z785-rg) location() size(1) protocol(SMB)
I0830 11:43:03.383519 1 controllerserver.go:593] create file share pvc-f3da48fa-7952-4918-aedd-67c1c814b6e2 on storage account imageregistryrbednah6mbs successfully
I0830 11:43:03.399781 1 controllerserver.go:639] store account key to k8s secret(azure-storage-account-imageregistryrbednah6mbs-secret) in default namespace
I0830 11:43:03.399821 1 utils.go:108] GRPC response: {"volume":{"capacity_bytes":1073741824,"volume_context":{"csi.storage.k8s.io/pv/name":"pvc-f3da48fa-7952-4918-aedd-67c1c814b6e2","csi.storage.k8s.io/pvc/name":"pvc-1","csi.storage.k8s.io/pvc/namespace":"default","networkEndpointType":"","secretnamespace":"default","skuName":"Standard_LRS"},"volume_id":"rbednar-az-01-8z785-rg#imageregistryrbednah6mbs#pvc-f3da48fa-7952-4918-aedd-67c1c814b6e2###default"}}
The storage account matched is private:
$ az storage account show --name imageregistryrbednah6mbs --resource-group rbednar-az-01-8z785-rg --query "publicNetworkAccess" --output json
"Disabled"
With this configuration, worker nodes cannot mount any newly provisioned volumes, as there is no private endpoint configured for the worker subnet:
I0830 11:51:25.588693 1 nodeserver.go:326] cifsMountPath(/var/lib/kubelet/plugins/kubernetes.io/csi/file.csi.azure.com/ff63a5ad356eed4e6c55d99793032daf932997f18546669ac219a9df4571a26f/globalmount) fstype() volumeID(rbednar-az-01-8z785-rg#imageregistryrbednah6mbs#pvc-f3da48fa-7952-4918-aedd-67c1c814b6e2###default) context(map[csi.storage.k8s.io/pv/name:pvc-f3da48fa-7952-4918-aedd-67c1c814b6e2 csi.storage.k8s.io/pvc/name:pvc-1 csi.storage.k8s.io/pvc/namespace:default networkEndpointType: secretnamespace:default skuName:Standard_LRS storage.kubernetes.io/csiProvisionerIdentity:1725012839930-7153-file.csi.azure.com]) mountflags([mfsymlinks cache=strict nosharesock actimeo=30]) mountOptions([mfsymlinks cache=strict nosharesock actimeo=30 file_mode=0777 dir_mode=0777]) volumeMountGroup()
E0830 11:51:25.606841 1 mount_linux.go:230] Mount failed: exit status 32
Mounting command: mount
Mounting arguments: -t cifs -o mfsymlinks,cache=strict,nosharesock,actimeo=30,file_mode=0777,dir_mode=0777,<masked> //imageregistryrbednah6mbs.file.core.windows.net/pvc-f3da48fa-7952-4918-aedd-67c1c814b6e2 /var/lib/kubelet/plugins/kubernetes.io/csi/file.csi.azure.com/ff63a5ad356eed4e6c55d99793032daf932997f18546669ac219a9df4571a26f/globalmount
Output: mount error(13): Permission denied
Refer to the mount.cifs(8) manual page (e.g. man mount.cifs) and kernel log messages (dmesg)
E0830 11:51:25.606924 1 utils.go:106] GRPC error: rpc error: code = Internal desc = volume(rbednar-az-01-8z785-rg#imageregistryrbednah6mbs#pvc-f3da48fa-7952-4918-aedd-67c1c814b6e2###default) mount //imageregistryrbednah6mbs.file.core.windows.net/pvc-f3da48fa-7952-4918-aedd-67c1c814b6e2 on /var/lib/kubelet/plugins/kubernetes.io/csi/file.csi.azure.com/ff63a5ad356eed4e6c55d99793032daf932997f18546669ac219a9df4571a26f/globalmount failed with mount failed: exit status 32
Mounting command: mount
Mounting arguments: -t cifs -o mfsymlinks,cache=strict,nosharesock,actimeo=30,file_mode=0777,dir_mode=0777,<masked> //imageregistryrbednah6mbs.file.core.windows.net/pvc-f3da48fa-7952-4918-aedd-67c1c814b6e2 /var/lib/kubelet/plugins/kubernetes.io/csi/file.csi.azure.com/ff63a5ad356eed4e6c55d99793032daf932997f18546669ac219a9df4571a26f/globalmount
Output: mount error(13): Permission denied
Refer to the mount.cifs(8) manual page (e.g. man mount.cifs) and kernel log messages (dmesg)
Please refer to http://aka.ms/filemounterror for possible causes and solutions for mount errors.
NOTE: If I create a StorageClass with networkEndpointType=privateendpoint and provision a volume with it, the driver will set up the private endpoint correctly on the existing storage account, which fixes the mount issue for any subsequent mounts that reuse this storage account. This can be a temporary workaround; a sketch of such a StorageClass follows.
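A minimal sketch of that workaround StorageClass (the class name is illustrative and mount options are omitted; only the networkEndpointType parameter matters here):

allowVolumeExpansion: true
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: azurefile-csi-private          # illustrative name
parameters:
  networkEndpointType: privateendpoint # driver sets up a private endpoint for the account it creates or matches
  skuName: Standard_LRS
provisioner: file.csi.azure.com
reclaimPolicy: Delete
volumeBindingMode: Immediate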
Followup question: We have the above configuration in OpenShift private clusters, where the registry operator creates its own storage resources, including a (private) storage account, and CSI provisioning happens after that. Even if this issue is resolved, the driver can still reuse "foreign" storage accounts in similar scenarios, and we are not sure that is good practice. We would like to introduce a fix for this later as well. Is there a recommended way to ensure the driver will never reuse a storage account it did not create? Could the storageaccount or createaccount parameter be a sufficient and safe solution (see the sketch below)?
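For illustration, a sketch of what we have in mind, assuming the storageAccount StorageClass parameter pins provisioning to a single named account (the account name is a placeholder, and whether this fully prevents reuse of foreign accounts is exactly the question above):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: azurefile-csi-pinned                # illustrative name
parameters:
  storageAccount: csidriverowneddedicated   # placeholder account name reserved for the driver
  skuName: Standard_LRS
provisioner: file.csi.azure.com
reclaimPolicy: Delete
volumeBindingMode: Immediate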
if networkEndpointType is unset, that means the csi driver would match any account with or without networkEndpointType; if you only want to match networkEndpointType: privateEndpoint accounts, then you need to specify this parameter in the storage class. @RomanBednar
if networkEndpointType is unset, that means the csi driver would match any account with or without networkEndpointType
@andyzhangx This is the actual issue, simplified. With this behavior we cannot tell the driver to "avoid" private-endpoint accounts. Either we request a private one specifically (predictable) or leave the parameter unset, which means the driver can match both private and public accounts (unpredictable). What we need is a way to reliably not match private-endpoint accounts, making the result predictable.
@andyzhangx Do you think this could be considered a bug? We're looking for a fix that we could ship soon and it depends on the decision made for this issue.
I've found a fix that might work for now: if we set the parameters to add a tag and require matching the tag, the driver will always create its own storage account when the first volume is provisioned and then reuse it (unless there is already an account with a matching tag, which is unlikely yet still possible). With this we should be almost certain we don't match "foreign" storage accounts, right? Or is it a bad idea? A sketch of such a StorageClass is below.
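A sketch of the tag-based approach, assuming the tags and matchTags parameters behave as documented for the driver (the tag key/value and class name are arbitrary placeholders):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: azurefile-csi-tagged                # illustrative name
parameters:
  tags: created-for=csi-driver-volumes      # placeholder tag stamped on accounts the driver creates
  matchTags: "true"                         # only reuse accounts whose tags match
  skuName: Standard_LRS
provisioner: file.csi.azure.com
reclaimPolicy: Delete
volumeBindingMode: Immediate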
if networkEndpointType is unset, that means the csi driver would match any account with or without networkEndpointType
I think that's a bug. It should match a usable account, not just any account. A private account + StorageClass with networkEndpointType: "" will lead to a PV that cannot be mounted today. Maybe there is another storage account that does not require private endpoints and would fit that CreateVolume call better.
if networkEndpointType is unset, that means the csi driver would match any account with or without networkEndpointType
I think that's a bug. It should match a usable account, not just any account. A private account + StorageClass with networkEndpointType: "" will lead to a PV that cannot be mounted today. Maybe there is another storage account that does not require private endpoints and would fit that CreateVolume call better.
@jsafrane The CSI driver does not know whether the account is usable or not; that depends on the application pods and whether they run on a node in an allowed vnet. There is a parameter matching rule: in general, the driver only checks the properties that correspond to parameters you actually specify.
@RomanBednar I think the best way to solve this issue is specifying networkEndpointType: privateEndpoint in the StorageClass, though a tag also works.
@andyzhangx I've tested that, and it rather masks the problem while introducing a few others: if we set networkEndpointType: privateEndpoint, the driver will always configure a private endpoint on a new storage account. And if there already is an account that matches, the driver will update it with a private endpoint, but we don't know which account gets updated.
We also still need "public" storage accounts for the CSI driver to exist alongside arbitrary "private" ones (which is supported), and with this solution that would not be possible.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle rotten
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Reopen this issue with /reopen
- Mark this issue as fresh with /remove-lifecycle rotten
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.