
Logs for blobfuse2 diverge from its source documentation

Open fabio-s-franco opened this issue 3 months ago • 12 comments

What happened: I had an issue (https://github.com/Azure/azure-storage-fuse/issues/1376) with an HNS-enabled, blobfuse2-based mount when the storage account is only accessible via a private endpoint (public network access disabled). Working out the cause was quite difficult because there is no log file at the location documented by the blobfuse2 project, /var/log/blobfuse2.log.

In fact, the only way I could find any logs at all was to shell into the relevant node agent (csi-blob-node daemonset) and execute a mount command by hand. Other than that, there are no blobfuse2.log files available from within the blob container within the pod.
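
To confirm that the log file is missing, the documented path can be checked directly inside the node agent container. The following is a minimal sketch, not a procedure from either project's documentation. The container name `blob` and the log path come from the description above; the `kube-system` namespace and the pod name `csi-blob-node-xxxxx` are assumptions and placeholders to be replaced with the values from your own cluster.

```python
import subprocess

LOG_PATH = "/var/log/blobfuse2.log"  # location documented by the blobfuse2 project


def build_log_check_cmd(pod, namespace="kube-system", container="blob", path=LOG_PATH):
    """Build a kubectl command that lists the log file inside the node agent container.

    The default namespace is an assumption; adjust it to wherever csi-blob-node runs.
    """
    return ["kubectl", "exec", "-n", namespace, pod, "-c", container, "--", "ls", "-l", path]


def log_file_exists(pod, **kwargs):
    """Return True if the `ls` inside the container exits with status 0."""
    result = subprocess.run(build_log_check_cmd(pod, **kwargs), capture_output=True, text=True)
    return result.returncode == 0


if __name__ == "__main__":
    # "csi-blob-node-xxxxx" is a placeholder pod name, not a real one.
    print(" ".join(build_log_check_cmd("csi-blob-node-xxxxx")))
```

A `False` result from `log_file_exists` can also mean that kubectl could not reach the pod, so it is worth looking at the command's stderr before concluding that the file is absent.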

What you expected to happen: Find log files to debug issues at the documented location, /var/log/blobfuse2.log.

How to reproduce it: Enable the managed driver via azure-cli or any other means in an AKS cluster. The version installed on my cluster has blobfuse2 2.1.2.

  • AKS cluster must be private
  • Storage Account needs a private endpoint set up with its subresource set to blob (Premium_ZRS SKU)
  • Set up a storage class with the following mount options:
    "--use-adls=true",
    "-o allow_other",
    "-o attr_timeout=120",
    "--file-cache-timeout-in-seconds=120",
    "--use-attr-cache=true",
    "--cancel-list-on-mount-seconds=10",
    "-o entry_timeout=120",
    "-o negative_timeout=120",
    "--virtual-directory=true"

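For reference, the mount options above could be placed in a StorageClass manifest along these lines. This is only a sketch under assumptions, not the exact manifest from my cluster: the class name `blob-hns-example` is hypothetical, and the `protocol` and `skuName` parameters are my guess at a minimal parameter set for this setup. It prints JSON, which `kubectl apply -f` accepts in the same way as YAML.

```python
import json

# Mount options copied from the list in this report.
MOUNT_OPTIONS = [
    "--use-adls=true",
    "-o allow_other",
    "-o attr_timeout=120",
    "--file-cache-timeout-in-seconds=120",
    "--use-attr-cache=true",
    "--cancel-list-on-mount-seconds=10",
    "-o entry_timeout=120",
    "-o negative_timeout=120",
    "--virtual-directory=true",
]


def build_storage_class(name, mount_options):
    """Return a StorageClass manifest as a plain dict."""
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": name},
        "provisioner": "blob.csi.azure.com",
        # Assumed parameters; adjust them to match the actual storage account.
        "parameters": {"protocol": "fuse2", "skuName": "Premium_ZRS"},
        "mountOptions": list(mount_options),
    }


if __name__ == "__main__":
    # "blob-hns-example" is a hypothetical name used only for illustration.
    print(json.dumps(build_storage_class("blob-hns-example", MOUNT_OPTIONS), indent=2))
```
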
Anything else we need to know?:

The original issue was caused by the fact that, for an HNS-enabled mount, blobfuse2 does not connect to the standard (non-HNS) endpoint FQDN. It connects to *.dfs.core.windows.net instead of *.blob.core.windows.net, so a private endpoint naively set up with blob as the target subresource instead of dfs will not be reachable: the private DNS zone for the dfs endpoint will not exist when the private endpoint was initially set up to accommodate standard non-HNS mounts. It is not clear, and it is very difficult to debug, that although you can mount either HNS-enabled or HNS-disabled containers on the same storage account, the former requires a private endpoint set up specifically for that purpose, which resolves to a different FQDN than the latter. On top of that, it does not seem to be possible to configure the endpoint to point to another FQDN in that case. Even setting the AZURE_STORAGE_BLOB_ENDPOINT environment variable directly does not influence the FQDN that blobfuse2 uses to connect.
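
One way to spot this mismatch without any blobfuse2 logs is to compare how the two endpoint names resolve from inside the cluster's network. The sketch below is an illustration, not part of the driver or blobfuse2: the function names are made up for this example, and it assumes that a working private endpoint resolves to a private (RFC 1918) IPv4 address while a missing one resolves to a public address or not at all.

```python
import ipaddress
import socket


def endpoint_fqdns(account):
    """Return the blob and dfs endpoint FQDNs for a storage account name."""
    return {
        "blob": f"{account}.blob.core.windows.net",
        "dfs": f"{account}.dfs.core.windows.net",
    }


def resolve(fqdn):
    """Resolve an FQDN to a sorted list of IPv4 addresses; empty list on failure."""
    try:
        infos = socket.getaddrinfo(fqdn, 443, family=socket.AF_INET, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        return []
    return sorted({info[4][0] for info in infos})


def is_private(ip):
    """True if the address lies in a private range (heuristic for a private endpoint)."""
    return ipaddress.ip_address(ip).is_private


def report(account):
    """Print, per subresource, the resolved addresses and whether all of them are private."""
    for subresource, fqdn in endpoint_fqdns(account).items():
        ips = resolve(fqdn)
        if not ips:
            status = "does not resolve"
        elif all(is_private(ip) for ip in ips):
            status = "private"
        else:
            status = "public"
        print(f"{subresource}: {fqdn} -> {ips} ({status})")
```

Calling `report("<storage-account-name>")` from a pod or node in the cluster would, under the assumption above, show blob as private and dfs as public or unresolved in the situation described here.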

So, without any logs, it becomes nearly impossible to determine the root cause. What is documented in https://github.com/kubernetes-sigs/blob-csi-driver/blob/master/docs/csi-debug.md will only output what is sent directly to stdout. For example:

Error: failed to initialize new pipeline [failed to authenticate credentials for azstorage]

This error does not help determine the problem, which can have several different underlying root causes (I alone have experienced three different ones for the same error message). I believe better error output is an issue for blobfuse2 to handle, but it nonetheless doesn't help that no log is persisted within the node agent. Moreover, the systemctl (or journalctl) command does not work on the node agent pod, as blobfuse2 is not managed by systemd. So the documentation is either inaccurate or not applicable to the managed installation of the CSI Blob driver on the AKS setup I have.

Environment:

  • CSI Driver version: 1.22.5
  • Kubernetes version (use kubectl version): Server Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.9", GitCommit:"d33c44091f0e760b0053f06023e87a1c99dfd302", GitTreeState:"clean", BuildDate:"2024-01-31T01:58:06Z", GoVersion:"go1.20.12", Compiler:"gc", Platform:"linux/amd64"
  • OS (e.g. from /etc/os-release): Debian GNU/Linux 12 (bookworm)
  • Kernel (e.g. uname -a): 5.15.0-1058-azure #66-Ubuntu SMP Fri Feb 16 00:40:24 UTC 2024 x86_64 GNU/Linux
  • Install tools: azure-cli
  • Others: blobfuse2 2.1.2

fabio-s-franco, Apr 08 '24 09:04