adlfs

Example missing

Open robinaly opened this issue 2 years ago • 6 comments

Hello,

I usually authenticate to my Azure Blob Storage account using the following code:

from azure.identity import DefaultAzureCredential
from azure.storage.blob import BlobServiceClient

default_credential = DefaultAzureCredential()
account_url = 'https://test.blob.core.windows.net'

blob_service_client = BlobServiceClient(account_url, credential=default_credential)
container_client = blob_service_client.get_container_client("test")

I have searched in many places, but I can't find a concrete example of how to, e.g., list the files in the container test using the adlfs library.

The last thing I tried was this code (using the variables above):

fs = adlfs.AzureBlobFileSystem(account_url=account_url, credentials=default_credential, anon=False)

Any help would be appreciated.

Kind regards, Robin

robinaly, Mar 31 '22 18:03

Note that I have since discovered this code snippet in the code comments, but the corresponding call doesn't work for me:

    Authentication with DefaultAzureCredential
    >>> abfs = AzureBlobFileSystem(account_name="XXXX", anon=False)
    >>> abfs.ls('')

robinaly, Apr 01 '22 07:04
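
For readers looking for the concrete example requested above, here is a minimal sketch of listing a container with adlfs. It rests on assumptions that are not confirmed anywhere in this thread: that `AzureBlobFileSystem` takes the account name rather than the account URL, and that it accepts an async credential from `azure.identity.aio` through the `credential` keyword (singular). Check this against the adlfs version you have installed. The helper names `account_name_from_url` and `list_container` are hypothetical.

```python
from urllib.parse import urlparse


def account_name_from_url(account_url: str) -> str:
    """Return the storage account name from a URL such as
    https://test.blob.core.windows.net (the first label of the host name)."""
    host = urlparse(account_url).hostname
    if not host:
        raise ValueError(f"not a valid account URL: {account_url!r}")
    return host.split(".")[0]


def list_container(account_url: str, container: str):
    """List the entries of `container` (hypothetical helper)."""
    # Imported lazily so the URL helper above works without the Azure packages.
    import adlfs
    from azure.identity.aio import DefaultAzureCredential

    # Assumption: adlfs expects an async credential under `credential`.
    fs = adlfs.AzureBlobFileSystem(
        account_name=account_name_from_url(account_url),
        credential=DefaultAzureCredential(),
    )
    return fs.ls(container)


# Usage (requires network access and a valid Azure login):
# print(list_container("https://test.blob.core.windows.net", "test"))
```

Passing the credential explicitly, instead of relying on `anon=False` alone, makes it easier to see which identity is actually being used when authentication fails.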

Can you share the traceback?

hayesgb, Apr 01 '22 10:04

Here it is:

from azure.identity import DefaultAzureCredential
import adlfs
default_credential=DefaultAzureCredential()
abfs = adlfs.AzureBlobFileSystem(account_name="XXX", anon=False)
abfs.ls('')

Output:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/XXX/Library/Caches/pypoetry/virtualenvs/fast-heat-detection-kty-UEtt-py3.9/lib/python3.9/site-packages/adlfs/spec.py", line 757, in ls
    files = sync(
  File "/Users/XXX/Library/Caches/pypoetry/virtualenvs/fast-heat-detection-kty-UEtt-py3.9/lib/python3.9/site-packages/fsspec/asyn.py", line 65, in sync
    raise return_result
  File "/Users/XXX/Library/Caches/pypoetry/virtualenvs/fast-heat-detection-kty-UEtt-py3.9/lib/python3.9/site-packages/fsspec/asyn.py", line 25, in _runner
    result[0] = await coro
  File "/Users/XXX/Library/Caches/pypoetry/virtualenvs/fast-heat-detection-kty-UEtt-py3.9/lib/python3.9/site-packages/adlfs/spec.py", line 815, in _ls
    containers = [c async for c in contents]
  File "/Users/robin.aly/Library/Caches/pypoetry/virtualenvs/fast-heat-detection-kty-UEtt-py3.9/lib/python3.9/site-packages/adlfs/spec.py", line 815, in <listcomp>
    containers = [c async for c in contents]
  File "/Users/XXX/Library/Caches/pypoetry/virtualenvs/fast-heat-detection-kty-UEtt-py3.9/lib/python3.9/site-packages/azure/core/async_paging.py", line 154, in __anext__
    return await self.__anext__()
  File "/Users/XXX/Library/Caches/pypoetry/virtualenvs/fast-heat-detection-kty-UEtt-py3.9/lib/python3.9/site-packages/azure/core/async_paging.py", line 157, in __anext__
    self._page = await self._page_iterator.__anext__()
  File "/Users/XXX/Library/Caches/pypoetry/virtualenvs/fast-heat-detection-kty-UEtt-py3.9/lib/python3.9/site-packages/azure/core/async_paging.py", line 99, in __anext__
    self._response = await self._get_next(self.continuation_token)
  File "/Users/XXX/Library/Caches/pypoetry/virtualenvs/fast-heat-detection-kty-UEtt-py3.9/lib/python3.9/site-packages/azure/storage/blob/aio/_models.py", line 60, in _get_next_cb
    process_storage_error(error)
  File "/Users/XXX/Library/Caches/pypoetry/virtualenvs/fast-heat-detection-kty-UEtt-py3.9/lib/python3.9/site-packages/azure/storage/blob/_shared/response_handlers.py", line 181, in process_storage_error
    exec("raise error from None")   # pylint: disable=exec-used # nosec
  File "<string>", line 1, in <module>
azure.core.exceptions.ClientAuthenticationError: Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature.
RequestId:aa5e1056-501e-0056-63d0-457e49000000
Time:2022-04-01T13:58:21.2085578Z
ErrorCode:AuthenticationFailed
authenticationerrordetail:Signature not valid in the specified time frame: Start [Fri, 10 Apr 2020 08:55:16 GMT] - Expiry [Sun, 11 Apr 2021 08:55:00 GMT] - Current [Fri, 01 Apr 2022 13:58:21 GMT]
Content: <?xml version="1.0" encoding="utf-8"?><Error><Code>AuthenticationFailed</Code><Message>Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature.
RequestId:aa5e1056-501e-0056-63d0-457e49000000
Time:2022-04-01T13:58:21.2085578Z</Message><AuthenticationErrorDetail>Signature not valid in the specified time frame: Start [Fri, 10 Apr 2020 08:55:16 GMT] - Expiry [Sun, 11 Apr 2021 08:55:00 GMT] - Current [Fri, 01 Apr 2022 13:58:21 GMT]</AuthenticationErrorDetail></Error>

robinaly, Apr 01 '22 13:04
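
The error detail above reports a signature validity window (Start in April 2020, Expiry in April 2021) that ended before the request was made. One possible reading, offered as an assumption rather than a diagnosis, is that a stale SAS token is being sent instead of the `DefaultAzureCredential` token, for example one picked up from an environment variable such as `AZURE_STORAGE_SAS_TOKEN`. The sketch below uses only the standard library to check which `AZURE_STORAGE_`-prefixed variables are set and whether a SAS token's `se` (expiry) field lies in the past. The function names are hypothetical.

```python
import os
from datetime import datetime, timezone
from urllib.parse import parse_qs


def azure_storage_env_vars(environ=os.environ):
    """Return the names of set variables that start with AZURE_STORAGE_."""
    return sorted(name for name in environ if name.startswith("AZURE_STORAGE_"))


def sas_validity(sas_token: str):
    """Return (start, expiry) parsed from the st/se fields of a SAS token.
    A missing field is returned as None."""
    fields = parse_qs(sas_token.lstrip("?"))

    def parse(key):
        values = fields.get(key)
        if not values:
            return None
        # SAS timestamps are ISO 8601 in UTC, e.g. 2021-04-11T08:55:00Z.
        stamp = datetime.fromisoformat(values[0].replace("Z", "+00:00"))
        if stamp.tzinfo is None:
            stamp = stamp.replace(tzinfo=timezone.utc)
        return stamp

    return parse("st"), parse("se")


def is_expired(sas_token: str, now=None) -> bool:
    """True if the token has an expiry field and it is earlier than `now`."""
    now = now or datetime.now(timezone.utc)
    _, expiry = sas_validity(sas_token)
    return expiry is not None and now > expiry
```

If `azure_storage_env_vars()` returns anything unexpected, unsetting those variables before creating the filesystem is a cheap experiment to rule this cause in or out.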

Just want to chime in that I am also missing an example of how to use this in the readme. The only example there is about how to use it with Dask, but I, like Robin (and, I assume, others in the future as well), want to instantiate a file system instead. If this is already in the works, great :-) Otherwise, I also don't mind writing a bit of documentation myself as soon as I have figured out how to use it ;-)

kasuteru, Apr 08 '22 15:04

To add to my last comment: my current manual way of interacting with the Azure file system looks like this (I am using the azure-identity package).

from azure.identity import InteractiveBrowserCredential
from azure.storage.filedatalake import FileSystemClient

credential = InteractiveBrowserCredential()
credential.authenticate()

storage_name = "mystoragename"
container = "mycontainer"
url = f"https://{storage_name}.dfs.core.windows.net/"
fs_client = FileSystemClient(account_url=url,
                             credential=credential,
                             file_system_name=container)
fs_client.exists()  # Should return True if successful

kasuteru, Apr 13 '22 09:04

My two cents: I was able to use InteractiveBrowserCredential like this:

import pandas as pd
from azure.identity import InteractiveBrowserCredential

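# Placeholders: substitute your own tenant, storage account, and container.
tenant_id = "<tenant_id>"
datalake_account_name = "<datalake_account_name>"
container_name = "<CONTAINER_NAME>"

credentials = InteractiveBrowserCredential(tenant_id=tenant_id)
credentials.authenticate()

# Note that `credentials` is not passed to adlfs through storage_options.
storage_options = {'account_name': datalake_account_name, 'anon': False}

df = pd.read_csv(f'az://{container_name}/test/*.csv', storage_options=storage_options)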
df.head()

The user needs to have the Storage Blob Data Contributor or Storage Blob Data Reader role on CONTAINER_NAME.

UPDATE: this is not working as expected. It turns out that the credentials used came from the az CLI. When I make sure there is no cached credential, I get the same error reported in #312.

centraal-g, Aug 30 '22 15:08
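
Since the pandas snippet above never hands its credential object to adlfs, one thing that may be worth trying is passing a credential explicitly. The following is a sketch under stated assumptions, not a confirmed fix: it assumes adlfs registers the `az` protocol with fsspec, that it accepts an async credential (for example `azure.identity.aio.DefaultAzureCredential`) under the `credential` key, and that every file matched by the pattern is a CSV with the same columns. `build_storage_options` and `read_container_csvs` are hypothetical helpers.

```python
def build_storage_options(account_name: str, credential=None) -> dict:
    """Build the options dict; include the credential only if one is given."""
    options = {"account_name": account_name, "anon": False}
    if credential is not None:
        # Assumption: adlfs reads the credential from the `credential` key.
        options["credential"] = credential
    return options


def read_container_csvs(container: str, pattern: str, storage_options: dict):
    """Read every CSV matching `pattern` in `container` into one DataFrame."""
    # Imported lazily so build_storage_options works without these packages.
    import fsspec
    import pandas as pd

    fs = fsspec.filesystem("az", **storage_options)
    # Expand the glob explicitly instead of relying on pandas to do so.
    paths = fs.glob(f"{container}/{pattern}")
    frames = []
    for path in paths:
        with fs.open(path, "rb") as handle:
            frames.append(pd.read_csv(handle))
    # pd.concat raises ValueError when no file matched the pattern.
    return pd.concat(frames, ignore_index=True)


# Usage (requires network access and a valid Azure login):
# from azure.identity.aio import DefaultAzureCredential
# options = build_storage_options("<datalake_account_name>", DefaultAzureCredential())
# df = read_container_csvs("<CONTAINER_NAME>", "test/*.csv", options)
```

Building the options in one place keeps the credential choice visible, which helps when a cached az CLI login might otherwise be picked up silently.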