blob-csi-driver
CSI in-line volume setup intermittently fails with `config error in azstorage [account name not provided]`
What happened: When starting a pod with an in-line CSI volume, the mount intermittently fails with the error:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 59s default-scheduler Successfully assigned default/jabba-image-vision-example-train-1710958708-tzr76 to aks-cpu-34807744-vmss000001
Warning FailedMount 24s (x7 over 59s) kubelet MountVolume.SetUp failed for volume "build-data" : rpc error: code = Internal desc = Mount failed with error: rpc error: code = Unknown desc = exit status 1 *** blobfuse2: A new version [2.2.1] is available. Consider upgrading to latest version for bug-fixes & new features. ***
Error: failed to initialize new pipeline [config error in azstorage [account name not provided]]
, output:
Please refer to http://aka.ms/blobmounterror for possible causes and solutions for mount errors.
After a few kubelet backoffs (no spec changes), it succeeds.
What you expected to happen: Volume setup should succeed the first time.
How to reproduce it: Create a deployment with a large number of pods (perhaps 20+) to increase the chance of hitting the failure. Use something similar to this:
csi:
  driver: blob.csi.azure.com
  volumeAttributes:
    azureStorageAuthType: MSI
    azureStorageIdentityClientID: <clientID_here>
    storageAccountName: mystorageaccount
    containerName: mycontainer
    protocol: fuse
    mountOptions: -o allow_other --file-cache-timeout-in-seconds=120 --log-level=LOG_DEBUG --virtual-directory=false --streaming=true
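For context, the fragment above could sit inside a Deployment roughly as follows. This is only a sketch of a reproduction manifest, not the reporter's actual spec: the Deployment name, labels, container image, command, and mount path are all assumptions made for illustration. The volume name `build-data` and the volume attributes are taken from the report.

```yaml
# Sketch only: name, labels, image, command, and mountPath are illustrative assumptions.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: blob-inline-repro              # hypothetical name
spec:
  replicas: 20                         # the report suggests "perhaps 20+" pods
  selector:
    matchLabels:
      app: blob-inline-repro
  template:
    metadata:
      labels:
        app: blob-inline-repro
    spec:
      containers:
        - name: main
          image: busybox               # placeholder image, not from the report
          command: ["sleep", "3600"]
          volumeMounts:
            - name: build-data
              mountPath: /mnt/build-data   # hypothetical mount path
      volumes:
        - name: build-data             # volume name seen in the FailedMount event
          csi:
            driver: blob.csi.azure.com
            volumeAttributes:
              azureStorageAuthType: MSI
              azureStorageIdentityClientID: "<clientID_here>"
              storageAccountName: mystorageaccount
              containerName: mycontainer
              protocol: fuse
              mountOptions: "-o allow_other --file-cache-timeout-in-seconds=120 --log-level=LOG_DEBUG --virtual-directory=false --streaming=true"
```

Since the failure is intermittent, a higher replica count only raises the odds that some pods log `FailedMount` on their first attempt.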
Anything else we need to know?:
This is highly intermittent. For a large number of pods in a deployment, most of them succeed the first time. Others can take a few retries.
Environment:
- CSI Driver version: mcr.microsoft.com/oss/kubernetes-csi/blob-csi:v1.21.7
- Kubernetes version (use `kubectl version`): v1.26.12
- OS (e.g. from /etc/os-release): Ubuntu 22.04.4 LTS
- Kernel (e.g. `uname -a`): 6.2.0-1019-azure
- Install tools:
- Others:
Can you use `storageAccount` instead of `storageAccountName` in `volumeAttributes`? @technicianted
@andyzhangx With `storageAccount` instead of `storageAccountName`, all pods fail to set up the volume. According to the source code, it should be `storageAccountName`:
Warning FailedMount 19s (x8 over 86s) kubelet MountVolume.SetUp failed for volume "output" : rpc error: code = Internal desc = Mount failed with error: rpc error: code = Unknown desc = exit status 1 Error: failed to initialize new pipeline [config error in azstorage [account name not provided]]
, output:
Please refer to http://aka.ms/blobmounterror for possible causes and solutions for mount errors.
That's actually the same; you need to specify the account name in a secret:
kubectl create secret generic azure-secret --from-literal=azurestorageaccountname="xxx" -n pod-namespace
and then specify `secretName: azure-secret`. That's a tricky part of pod inline volumes:
volumeAttributes:
  azureStorageAuthType: MSI
  azureStorageIdentityClientID: <clientID_here>
  storageAccountName: mystorageaccount
  containerName: mycontainer
  secretName: azure-secret
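For anyone managing manifests declaratively, the `kubectl create secret generic` command above corresponds roughly to the Secret below. This is a sketch under the same placeholders used in the thread (`azure-secret`, `pod-namespace`, `"xxx"`); the only key it sets is `azurestorageaccountname`, as the maintainer describes.

```yaml
# Declarative equivalent of the `kubectl create secret generic azure-secret ...` command.
# All values are the placeholders from the thread, not real settings.
apiVersion: v1
kind: Secret
metadata:
  name: azure-secret                   # referenced by secretName in volumeAttributes
  namespace: pod-namespace             # placeholder; the command above uses the pod's namespace
type: Opaque
stringData:
  azurestorageaccountname: "xxx"       # storage account name placeholder
```

Using `stringData` lets the account name be written in plain text; Kubernetes stores it base64-encoded under `data`.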
MSI does not need a secret. Note that it is intermittent, not consistently failing. Out of 200 pods, about 10 suffer from this problem. After a few backoffs they mostly succeed.
If a secret were required, it would have failed consistently.
No, you only need to specify `azurestorageaccountname` in the secret. That's how a pod inline volume gets the account name, for the sake of security.
That seems to have fixed it.
Few clarifying questions:
- Documentation clearly says that secret is not mandatory, and is only used to store the account secret. Please update documentation accordingly.
- The fact that it still works 90% of the time when no `secretName` is specified probably indicates a race condition bug in the volume setup. Adding `secretName` seems to just work around this potential bug.
Thanks for your help.
This problem is still happening with Kubernetes secrets but at a much lower rate. About once every 200 times. Code is still racy.
what's current error msg?
Same: [account name not provided]
@technicianted pls follow this guide to provide csi driver logs on the node: https://github.com/kubernetes-sigs/blob-csi-driver/blob/master/docs/csi-debug.md#case2-volume-mountunmount-failed, and what's current pod config?