
CSI in-line volume setup intermittently fails with `config error in azstorage [account name not provided]`

Open technicianted opened this issue 3 months ago • 13 comments

What happened: When starting a pod with an in-line CSI volume, the volume intermittently fails to mount with this error:

Events:
  Type     Reason       Age                From               Message
  ----     ------       ----               ----               -------
  Normal   Scheduled    59s                default-scheduler  Successfully assigned default/jabba-image-vision-example-train-1710958708-tzr76 to aks-cpu-34807744-vmss000001
  Warning  FailedMount  24s (x7 over 59s)  kubelet            MountVolume.SetUp failed for volume "build-data" : rpc error: code = Internal desc = Mount failed with error: rpc error: code = Unknown desc = exit status 1 *** blobfuse2: A new version [2.2.1] is available. Consider upgrading to latest version for bug-fixes & new features. ***
Error: failed to initialize new pipeline [config error in azstorage [account name not provided]]
, output: 
Please refer to http://aka.ms/blobmounterror for possible causes and solutions for mount errors.

After a few kubelet backoffs (no spec changes), it succeeds.

What you expected to happen: Volume setup should succeed the first time.

How to reproduce it: Create a deployment with a large number of pods (perhaps 20+) to increase the chance of it happening. Use something similar to this:

  csi:
    driver: blob.csi.azure.com
    volumeAttributes:
      azureStorageAuthType: MSI
      azureStorageIdentityClientID: <clientID_here>
      storageAccountName: mystorageaccount
      containerName: mycontainer
      protocol: fuse
      mountOptions: -o allow_other --file-cache-timeout-in-seconds=120 --log-level=LOG_DEBUG --virtual-directory=false --streaming=true
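
For reference, a fuller reproduction might look like the Deployment below. This is only a sketch: the Deployment name, labels, image, command, and mount path are placeholders that do not come from the report. The volume name build-data is taken from the event output above, and the csi section repeats the fragment as given.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: blob-inline-repro          # hypothetical name
spec:
  replicas: 20                     # "perhaps 20+" per the report
  selector:
    matchLabels:
      app: blob-inline-repro       # hypothetical label
  template:
    metadata:
      labels:
        app: blob-inline-repro
    spec:
      containers:
        - name: main               # placeholder container
          image: busybox:1.36      # placeholder image
          command: ["sleep", "3600"]
          volumeMounts:
            - name: build-data
              mountPath: /mnt/build-data   # placeholder path
      volumes:
        - name: build-data
          csi:                     # in-line (ephemeral) CSI volume
            driver: blob.csi.azure.com
            volumeAttributes:
              azureStorageAuthType: MSI
              azureStorageIdentityClientID: <clientID_here>
              storageAccountName: mystorageaccount
              containerName: mycontainer
              protocol: fuse
              mountOptions: -o allow_other --file-cache-timeout-in-seconds=120 --log-level=LOG_DEBUG --virtual-directory=false --streaming=true
```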

Anything else we need to know?:

This is highly intermittent. For a large number of pods in a deployment, most of them succeed the first time. Others can take a few retries.

Environment:

  • CSI Driver version: mcr.microsoft.com/oss/kubernetes-csi/blob-csi:v1.21.7
  • Kubernetes version (use kubectl version): v1.26.12
  • OS (e.g. from /etc/os-release): Ubuntu 22.04.4 LTS
  • Kernel (e.g. uname -a): 6.2.0-1019-azure
  • Install tools:
  • Others:

technicianted avatar Apr 08 '24 21:04 technicianted

Can you use storageAccount instead of storageAccountName in volumeAttributes? @technicianted

andyzhangx avatar Apr 09 '24 13:04 andyzhangx

@andyzhangx with storageAccount instead of storageAccountName, all pods fail to set up the volume. According to the source code, it should be storageAccountName:

  Warning  FailedMount  19s (x8 over 86s)  kubelet  MountVolume.SetUp failed for volume "output" : rpc error: code = Internal desc = Mount failed with error: rpc error: code = Unknown desc = exit status 1 Error: failed to initialize new pipeline [config error in azstorage [account name not provided]]
, output: 
Please refer to http://aka.ms/blobmounterror for possible causes and solutions for mount errors.

technicianted avatar Apr 09 '24 14:04 technicianted

That's actually the same. You need to specify the account name in a secret:

kubectl create secret generic azure-secret --from-literal=azurestorageaccountname="xxx" -n pod-namespace

and then specify secretName: azure-secret. That's the tricky part of pod inline volumes:

  volumeAttributes:
    azureStorageAuthType: MSI
    azureStorageIdentityClientID: <clientID_here>
    storageAccountName: mystorageaccount
    containerName: mycontainer
    secretName: azure-secret
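
The secret itself can also be written as a manifest instead of being created with kubectl. A minimal sketch, assuming the same secret name, namespace, and placeholder account name as in the command above:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: azure-secret               # must match secretName in volumeAttributes
  namespace: pod-namespace         # same namespace as the pod
type: Opaque
stringData:
  azurestorageaccountname: xxx     # placeholder, replace with the real account name
```

With stringData the value is given in plain text and the API server stores it base64-encoded, which should match what --from-literal produces.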

andyzhangx avatar Apr 09 '24 14:04 andyzhangx

MSI does not need a secret. Note that it is intermittent, not consistently failing. Out of 200 pods, about 10 suffer from this problem. After a few backoffs they mostly succeed.

If a secret were required, it would have failed consistently.

technicianted avatar Apr 09 '24 14:04 technicianted

No, you only need to specify azurestorageaccountname in the secret. That's how a pod inline volume gets the account name, for the sake of security.

andyzhangx avatar Apr 09 '24 14:04 andyzhangx

That seems to have fixed it.

A few clarifying points:

  1. The documentation clearly says that the secret is not mandatory and is only used to store the account secret. Please update the documentation accordingly.
  2. The fact that when no secretName is specified it still works 90% of the time probably indicates a race condition bug in the volume setup. Adding secretName seems to just work around this potential bug.

Thanks for your help.

technicianted avatar Apr 10 '24 17:04 technicianted

This problem still happens with Kubernetes secrets, but at a much lower rate: about once every 200 times. The code is still racy.

technicianted avatar Apr 30 '24 20:04 technicianted

What's the current error message?

andyzhangx avatar May 01 '24 03:05 andyzhangx

Same: [account name not provided]

technicianted avatar May 01 '24 03:05 technicianted

@technicianted please follow this guide to provide the CSI driver logs from the node: https://github.com/kubernetes-sigs/blob-csi-driver/blob/master/docs/csi-debug.md#case2-volume-mountunmount-failed. Also, what's the current pod config?

andyzhangx avatar May 01 '24 03:05 andyzhangx