azure-sdk-for-python
azure-storage-file-datalake raises error - cannot import name 'case_insensitive_dict'
- Package Name: azure.storage.filedatalake
- Package Version: 12.8.0
- Operating System: Azure Synapse / PySpark (Linux/4.15.0-1146-azure ubuntu/18.04.6 glibc/2.27)
- Python Version: 3.8
Describe the bug
The package installs into a PySpark session via %pip install azure-storage-file-datalake
However, importing it then fails with an ImportError. The cause may be a problem with the Python package dependency setup.
To Reproduce
- Open Azure Synapse.
- Create a new PySpark notebook.
- Add the following 3 cells and run them:
%pip install azure-storage-file-datalake
Requirement already satisfied: azure-storage-file-datalake in /home/trusted-service-user/cluster-env/env/lib/python3.8/site-packages (12.8.0)
Requirement already satisfied: azure-core<2.0.0,>=1.23.1 in /home/trusted-service-user/cluster-env/env/lib/python3.8/site-packages (from azure-storage-file-datalake) (1.25.0)
Requirement already satisfied: msrest>=0.6.21 in /home/trusted-service-user/cluster-env/env/lib/python3.8/site-packages (from azure-storage-file-datalake) (0.6.21)
Requirement already satisfied: azure-storage-blob<13.0.0,>=12.13.0 in /home/trusted-service-user/cluster-env/env/lib/python3.8/site-packages (from azure-storage-file-datalake) (12.13.1)
Requirement already satisfied: requests>=2.18.4 in /home/trusted-service-user/cluster-env/env/lib/python3.8/site-packages (from azure-core<2.0.0,>=1.23.1->azure-storage-file-datalake) (2.25.1)
Requirement already satisfied: six>=1.11.0 in /home/trusted-service-user/cluster-env/env/lib/python3.8/site-packages (from azure-core<2.0.0,>=1.23.1->azure-storage-file-datalake) (1.16.0)
Requirement already satisfied: typing-extensions>=4.0.1 in /home/trusted-service-user/cluster-env/env/lib/python3.8/site-packages (from azure-core<2.0.0,>=1.23.1->azure-storage-file-datalake) (4.3.0)
Requirement already satisfied: cryptography>=2.1.4 in /home/trusted-service-user/cluster-env/env/lib/python3.8/site-packages (from azure-storage-blob<13.0.0,>=12.13.0->azure-storage-file-datalake) (3.4.7)
Requirement already satisfied: cffi>=1.12 in /home/trusted-service-user/cluster-env/env/lib/python3.8/site-packages (from cryptography>=2.1.4->azure-storage-blob<13.0.0,>=12.13.0->azure-storage-file-datalake) (1.14.5)
Requirement already satisfied: pycparser in /home/trusted-service-user/cluster-env/env/lib/python3.8/site-packages (from cffi>=1.12->cryptography>=2.1.4->azure-storage-blob<13.0.0,>=12.13.0->azure-storage-file-datalake) (2.20)
Requirement already satisfied: requests-oauthlib>=0.5.0 in /home/trusted-service-user/cluster-env/env/lib/python3.8/site-packages (from msrest>=0.6.21->azure-storage-file-datalake) (1.3.0)
Requirement already satisfied: isodate>=0.6.0 in /home/trusted-service-user/cluster-env/env/lib/python3.8/site-packages (from msrest>=0.6.21->azure-storage-file-datalake) (0.6.0)
Requirement already satisfied: certifi>=2017.4.17 in /home/trusted-service-user/cluster-env/env/lib/python3.8/site-packages (from msrest>=0.6.21->azure-storage-file-datalake) (2021.5.30)
Requirement already satisfied: idna<3,>=2.5 in /home/trusted-service-user/cluster-env/env/lib/python3.8/site-packages (from requests>=2.18.4->azure-core<2.0.0,>=1.23.1->azure-storage-file-datalake) (2.10)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in /home/trusted-service-user/cluster-env/env/lib/python3.8/site-packages (from requests>=2.18.4->azure-core<2.0.0,>=1.23.1->azure-storage-file-datalake) (1.26.4)
Requirement already satisfied: chardet<5,>=3.0.2 in /home/trusted-service-user/cluster-env/env/lib/python3.8/site-packages (from requests>=2.18.4->azure-core<2.0.0,>=1.23.1->azure-storage-file-datalake) (4.0.0)
Requirement already satisfied: oauthlib>=3.0.0 in /home/trusted-service-user/cluster-env/env/lib/python3.8/site-packages (from requests-oauthlib>=0.5.0->msrest>=0.6.21->azure-storage-file-datalake) (3.1.1)
Note: you may need to restart the kernel to use updated packages.
STORAGE_ACCOUNT = "..."
KEYVAULT_LINKED_SERVICE_NAME = "..."
KEYVAULT_NAME = "..."
KEYVAULT_SECRET_NAME = "..."
from azure.storage.filedatalake import DataLakeServiceClient
# Error is raised on this next line:
fs = DataLakeServiceClient(
    account_url=f"https://{STORAGE_ACCOUNT}.dfs.core.windows.net/",
    credential={
        "account_name": STORAGE_ACCOUNT,
        "account_key": mssparkutils.credentials.getSecret(KEYVAULT_NAME, KEYVAULT_SECRET_NAME, KEYVAULT_LINKED_SERVICE_NAME)
    }
)
# list files
print([*fs.list_file_systems()])
Error Message
---------------------------------------------------------------------------
ImportError Traceback (most recent call last)
~/cluster-env/env/lib/python3.8/site-packages/azure/core/rest/__init__.py in <module>
26 try:
---> 27 from ._rest_py3 import (
28 HttpRequest,
~/cluster-env/env/lib/python3.8/site-packages/azure/core/rest/_rest_py3.py in <module>
37
---> 38 from ..utils._utils import case_insensitive_dict
39
ImportError: cannot import name 'case_insensitive_dict' from 'azure.core.utils._utils' (/home/trusted-service-user/cluster-env/env/lib/python3.8/site-packages/azure/core/utils/_utils.py)
During handling of the above exception, another exception occurred:
ImportError Traceback (most recent call last)
<ipython-input-56-2fbb0380> in <module>
----> 1 from azure.storage.filedatalake import DataLakeServiceClient
2 fs = DataLakeServiceClient(
3 account_url=f"https://{STORAGE_ACCOUNT}.dfs.core.windows.net/",
4 credential={
5 "account_name": STORAGE_ACCOUNT,
~/cluster-env/env/lib/python3.8/site-packages/azure/storage/filedatalake/__init__.py in <module>
6
7 from ._download import StorageStreamDownloader
----> 8 from ._data_lake_file_client import DataLakeFileClient
9 from ._data_lake_directory_client import DataLakeDirectoryClient
10 from ._file_system_client import FileSystemClient
~/cluster-env/env/lib/python3.8/site-packages/azure/storage/filedatalake/_data_lake_file_client.py in <module>
24 from ._upload_helper import upload_datalake_file
25 from ._download import StorageStreamDownloader
---> 26 from ._path_client import PathClient
27 from ._serialize import get_mod_conditions, get_path_http_headers, get_access_conditions, add_metadata_headers, \
28 convert_datetime_to_rfc1123, get_cpk_info
~/cluster-env/env/lib/python3.8/site-packages/azure/storage/filedatalake/_path_client.py in <module>
21 from ._data_lake_lease import DataLakeLeaseClient
22 from ._deserialize import process_storage_error
---> 23 from ._generated import AzureDataLakeStorageRESTAPI
24 from ._models import LocationMode, DirectoryProperties, AccessControlChangeResult, AccessControlChanges, \
25 AccessControlChangeCounters, AccessControlChangeFailure
~/cluster-env/env/lib/python3.8/site-packages/azure/storage/filedatalake/_generated/__init__.py in <module>
7 # --------------------------------------------------------------------------
8
----> 9 from ._azure_data_lake_storage_restapi import AzureDataLakeStorageRESTAPI
10
11 try:
~/cluster-env/env/lib/python3.8/site-packages/azure/storage/filedatalake/_generated/_azure_data_lake_storage_restapi.py in <module>
16 from . import models
17 from ._configuration import AzureDataLakeStorageRESTAPIConfiguration
---> 18 from .operations import FileSystemOperations, PathOperations, ServiceOperations
19
20 if TYPE_CHECKING:
~/cluster-env/env/lib/python3.8/site-packages/azure/storage/filedatalake/_generated/operations/__init__.py in <module>
7 # --------------------------------------------------------------------------
8
----> 9 from ._service_operations import ServiceOperations
10 from ._file_system_operations import FileSystemOperations
11 from ._path_operations import PathOperations
~/cluster-env/env/lib/python3.8/site-packages/azure/storage/filedatalake/_generated/operations/_service_operations.py in <module>
15 from azure.core.pipeline import PipelineResponse
16 from azure.core.pipeline.transport import HttpResponse
---> 17 from azure.core.rest import HttpRequest
18 from azure.core.tracing.decorator import distributed_trace
19 from azure.core.utils import case_insensitive_dict
~/cluster-env/env/lib/python3.8/site-packages/azure/core/rest/__init__.py in <module>
30 )
31 except (SyntaxError, ImportError):
---> 32 from ._rest import ( # type: ignore
33 HttpRequest,
34 HttpResponse,
~/cluster-env/env/lib/python3.8/site-packages/azure/core/rest/_rest.py in <module>
29 from typing import TYPE_CHECKING, MutableMapping
30
---> 31 from ..utils._utils import case_insensitive_dict
32 from ._helpers import (
33 set_content_body,
ImportError: cannot import name 'case_insensitive_dict' from 'azure.core.utils._utils' (/home/trusted-service-user/cluster-env/env/lib/python3.8/site-packages/azure/core/utils/_utils.py)
Expected behavior
The DataLakeServiceClient object is created without error, and a list of file systems is printed.
Additional context
Let me know if I need to raise this issue elsewhere (i.e., does Synapse / Spark have a more appropriate issue tracker?)
Thanks for reaching out.
Synapse / Spark may pin some package versions.
Is there a way for you to get the list of what exact versions are in use? (not sure if pip list works or not)
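If the full pip list output is hard to scan in a notebook, one possible way to answer this is to query only the relevant distributions from Python. The following is a minimal sketch using the standard-library importlib.metadata module (available from Python 3.8). The helper name installed_versions is hypothetical, and the package names are simply the ones mentioned in this issue.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional


def installed_versions(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    versions: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            # Reads the package metadata found on sys.path (on disk).
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Package names taken from this issue; adjust as needed.
    report = installed_versions(
        ["azure-core", "azure-storage-blob", "azure-storage-file-datalake"]
    )
    for name, version in report.items():
        print(f"{name}: {version or 'not installed'}")
```

Note that this reports what is installed on disk, which is not necessarily the version that the running session has already imported.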
Hi @xiangyan99
I am not sure how to determine whether a package has been pinned... I'll try to figure that out.
I re-did the pip install and did a pip list. Output below:
Many thanks
Collecting azure-storage-file-datalake
Downloading azure_storage_file_datalake-12.8.0-py3-none-any.whl (225 kB)
|████████████████████████████████| 225 kB 16.5 MB/s eta 0:00:01
Requirement already satisfied: msrest>=0.6.21 in /home/trusted-service-user/cluster-env/env/lib/python3.8/site-packages (from azure-storage-file-datalake) (0.6.21)
Collecting azure-storage-blob<13.0.0,>=12.13.0
Downloading azure_storage_blob-12.13.1-py3-none-any.whl (377 kB)
|████████████████████████████████| 377 kB 68.8 MB/s eta 0:00:01
Collecting azure-core<2.0.0,>=1.23.1
Downloading azure_core-1.25.0-py3-none-any.whl (178 kB)
|████████████████████████████████| 178 kB 63.6 MB/s eta 0:00:01
Collecting typing-extensions>=4.0.1
Downloading typing_extensions-4.3.0-py3-none-any.whl (25 kB)
Requirement already satisfied: requests>=2.18.4 in /home/trusted-service-user/cluster-env/env/lib/python3.8/site-packages (from azure-core<2.0.0,>=1.23.1->azure-storage-file-datalake) (2.25.1)
Requirement already satisfied: six>=1.11.0 in /home/trusted-service-user/cluster-env/env/lib/python3.8/site-packages (from azure-core<2.0.0,>=1.23.1->azure-storage-file-datalake) (1.16.0)
Requirement already satisfied: cryptography>=2.1.4 in /home/trusted-service-user/cluster-env/env/lib/python3.8/site-packages (from azure-storage-blob<13.0.0,>=12.13.0->azure-storage-file-datalake) (3.4.7)
Requirement already satisfied: cffi>=1.12 in /home/trusted-service-user/cluster-env/env/lib/python3.8/site-packages (from cryptography>=2.1.4->azure-storage-blob<13.0.0,>=12.13.0->azure-storage-file-datalake) (1.14.5)
Requirement already satisfied: pycparser in /home/trusted-service-user/cluster-env/env/lib/python3.8/site-packages (from cffi>=1.12->cryptography>=2.1.4->azure-storage-blob<13.0.0,>=12.13.0->azure-storage-file-datalake) (2.20)
Requirement already satisfied: certifi>=2017.4.17 in /home/trusted-service-user/cluster-env/env/lib/python3.8/site-packages (from msrest>=0.6.21->azure-storage-file-datalake) (2021.5.30)
Requirement already satisfied: requests-oauthlib>=0.5.0 in /home/trusted-service-user/cluster-env/env/lib/python3.8/site-packages (from msrest>=0.6.21->azure-storage-file-datalake) (1.3.0)
Requirement already satisfied: isodate>=0.6.0 in /home/trusted-service-user/cluster-env/env/lib/python3.8/site-packages (from msrest>=0.6.21->azure-storage-file-datalake) (0.6.0)
Requirement already satisfied: idna<3,>=2.5 in /home/trusted-service-user/cluster-env/env/lib/python3.8/site-packages (from requests>=2.18.4->azure-core<2.0.0,>=1.23.1->azure-storage-file-datalake) (2.10)
Requirement already satisfied: chardet<5,>=3.0.2 in /home/trusted-service-user/cluster-env/env/lib/python3.8/site-packages (from requests>=2.18.4->azure-core<2.0.0,>=1.23.1->azure-storage-file-datalake) (4.0.0)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in /home/trusted-service-user/cluster-env/env/lib/python3.8/site-packages (from requests>=2.18.4->azure-core<2.0.0,>=1.23.1->azure-storage-file-datalake) (1.26.4)
Requirement already satisfied: oauthlib>=3.0.0 in /home/trusted-service-user/cluster-env/env/lib/python3.8/site-packages (from requests-oauthlib>=0.5.0->msrest>=0.6.21->azure-storage-file-datalake) (3.1.1)
Installing collected packages: typing-extensions, azure-core, azure-storage-blob, azure-storage-file-datalake
Attempting uninstall: typing-extensions
Found existing installation: typing-extensions 3.10.0.0
Uninstalling typing-extensions-3.10.0.0:
Successfully uninstalled typing-extensions-3.10.0.0
Attempting uninstall: azure-core
Found existing installation: azure-core 1.22.1
Uninstalling azure-core-1.22.1:
Successfully uninstalled azure-core-1.22.1
Attempting uninstall: azure-storage-blob
Found existing installation: azure-storage-blob 12.8.1
Uninstalling azure-storage-blob-12.8.1:
Successfully uninstalled azure-storage-blob-12.8.1
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
tensorflow 2.4.1 requires six~=1.15.0, but you have six 1.16.0 which is incompatible.
tensorflow 2.4.1 requires typing-extensions~=3.7.4, but you have typing-extensions 4.3.0 which is incompatible.
Successfully installed azure-core-1.25.0 azure-storage-blob-12.13.1 azure-storage-file-datalake-12.8.0 typing-extensions-4.3.0
Note: you may need to restart the kernel to use updated packages.
Package Version
----------------------------- -------------------
absl-py 0.13.0
adal 1.2.7
adlfs 0.7.7
aiohttp 3.7.4.post0
appdirs 1.4.4
applicationinsights 0.11.10
argon2-cffi 21.3.0
argon2-cffi-bindings 21.2.0
astor 0.8.1
astunparse 1.6.3
async-timeout 3.0.1
attrs 21.2.0
azure-common 1.1.27
azure-core 1.25.0
azure-datalake-store 0.0.51
azure-graphrbac 0.61.1
azure-identity 1.5.0
azure-mgmt-authorization 0.61.0
azure-mgmt-containerregistry 8.0.0
azure-mgmt-core 1.3.0
azure-mgmt-keyvault 2.2.0
azure-mgmt-resource 13.0.0
azure-mgmt-storage 11.2.0
azure-storage-blob 12.13.1
azure-storage-file-datalake 12.8.0
azure-synapse-ml-predict 1.0.0
azureml-core 1.34.0
azureml-dataprep 2.22.2
azureml-dataprep-native 38.0.0
azureml-dataprep-rslex 1.20.2
azureml-dataset-runtime 1.34.0
azureml-mlflow 1.34.0
azureml-opendatasets 1.34.0
azureml-synapse 0.0.1
azureml-telemetry 1.34.0
backcall 0.2.0
backports.functools-lru-cache 1.6.4
backports.tempfile 1.0
backports.weakref 1.0.post1
beautifulsoup4 4.9.3
bleach 5.0.1
blinker 1.4
bokeh 2.3.2
Brotli 1.0.9
brotlipy 0.7.0
cachetools 4.2.2
certifi 2021.5.30
cffi 1.14.5
chardet 4.0.0
click 8.0.1
cloudpickle 1.6.0
conda-package-handling 1.7.3
configparser 5.0.2
contextlib2 0.6.0.post1
cryptography 3.4.7
cycler 0.10.0
Cython 0.29.23
cytoolz 0.11.0
dash 1.20.0
dash-core-components 1.16.0
dash-cytoscape 0.2.0
dash-html-components 1.1.3
dash-renderer 1.9.1
dash-table 4.11.3
dask 2021.6.2
databricks-cli 0.12.1
debugpy 1.3.0
decorator 4.4.2
defusedxml 0.7.1
dill 0.3.4
distlib 0.3.5
distro 1.7.0
docker 4.4.4
dotnetcore2 2.1.23
entrypoints 0.3
et-xmlfile 1.1.0
fastjsonschema 2.16.1
filelock 3.7.1
fire 0.4.0
Flask 2.0.1
Flask-Compress 0.0.0
flatbuffers 1.12
fsspec 2021.6.1
fsspec-wrapper 0.1.5
fusepy 3.0.1
future 0.18.2
gast 0.3.3
gensim 3.8.3
geographiclib 1.52
geopy 2.1.0
gevent 21.1.2
gitdb 4.0.7
GitPython 3.1.18
google-auth 1.32.1
google-auth-oauthlib 0.4.1
google-pasta 0.2.0
greenlet 1.1.0
grpcio 1.37.1
h5py 2.10.0
html5lib 1.1
hummingbird-ml 0.4.0
idna 2.10
imagecodecs 2021.3.31
imageio 2.9.0
importlib-metadata 4.6.1
importlib-resources 5.9.0
ipykernel 6.0.1
ipython 7.23.1
ipython-genutils 0.2.0
ipywidgets 7.6.3
isodate 0.6.0
itsdangerous 2.0.1
jdcal 1.4.1
jedi 0.18.0
jeepney 0.6.0
Jinja2 3.0.1
jmespath 0.10.0
joblib 1.0.1
jsonpickle 2.0.0
jsonschema 4.9.1
jupyter-client 6.1.12
jupyter-core 4.7.1
jupyterlab-pygments 0.2.2
jupyterlab-widgets 1.1.1
Keras-Applications 1.0.8
Keras-Preprocessing 1.1.2
keras2onnx 1.6.5
kiwisolver 1.3.1
koalas 1.8.0
KqlmagicCustom 0.1.114.post8
liac-arff 2.5.0
library-metadata-cooker 0.0.7
lightgbm 3.2.1
lime 0.2.0.1
llvmlite 0.36.0
locket 0.2.1
lxml 4.6.5
Markdown 3.3.4
MarkupSafe 2.0.1
matplotlib 3.4.2
matplotlib-inline 0.1.2
mistune 0.8.4
mleap 0.17.0
mlflow-skinny 1.18.0
msal 1.12.0
msal-extensions 0.3.0
msrest 0.6.21
msrestazure 0.6.4
multidict 5.1.0
mypy 0.780
mypy-extensions 0.4.3
nbclient 0.6.6
nbconvert 6.5.0
nbformat 5.4.0
ndg-httpsclient 0.5.1
nest-asyncio 1.5.5
networkx 2.5.1
nltk 3.6.2
notebook 6.4.12
notebookutils 3.2.0-20220727.3
numba 0.53.1
numpy 1.19.4
oauthlib 3.1.1
olefile 0.46
onnx 1.9.0
onnxconverter-common 1.7.0
onnxmltools 1.7.0
onnxruntime 1.7.2
openpyxl 3.0.7
opt-einsum 3.3.0
packaging 21.0
pandas 1.2.3
pandasql 0.7.3
pandocfilters 1.5.0
parso 0.8.2
partd 1.2.0
pathspec 0.8.1
patsy 0.5.1
pexpect 4.8.0
pickleshare 0.7.5
Pillow 8.2.0
pip 21.1.1
pkgutil-resolve-name 1.3.10
platformdirs 2.5.2
plotly 4.14.3
pmdarima 1.8.2
pooch 1.4.0
portalocker 1.7.1
prettytable 2.4.0
prometheus-client 0.14.1
prompt-toolkit 3.0.19
protobuf 3.15.8
psutil 5.8.0
ptyprocess 0.7.0
py4j 0.10.9.3
pyarrow 3.0.0
pyasn1 0.4.8
pyasn1-modules 0.2.8
pycairo 1.20.1
pycosat 0.6.3
pycparser 2.20
Pygments 2.9.0
PyGObject 3.40.1
PyJWT 2.1.0
pyodbc 4.0.30
pyOpenSSL 20.0.1
pyparsing 2.4.7
pyperclip 1.8.2
PyQt5 5.12.3
PyQt5-sip 4.19.18
PyQtChart 5.12
PyQtWebEngine 5.12.1
pyrsistent 0.18.1
PySocks 1.7.1
pyspark 3.2.1
python-dateutil 2.8.1
pytz 2021.1
pyu2f 0.1.5
PyWavelets 1.1.1
PyYAML 5.4.1
pyzmq 22.1.0
regex 2021.7.6
requests 2.25.1
requests-oauthlib 1.3.0
retrying 1.3.3
rsa 4.7.2
ruamel-yaml-conda 0.15.100
ruamel.yaml 0.17.4
ruamel.yaml.clib 0.2.6
SALib 1.3.11
scikit-image 0.18.1
scikit-learn 0.23.2
scipy 1.5.3
seaborn 0.11.1
SecretStorage 3.3.1
Send2Trash 1.8.0
setuptools 49.6.0.post20210108
shap 0.39.0
six 1.16.0
skl2onnx 1.8.0
sklearn-pandas 2.2.0
slicer 0.0.7
smart-open 5.1.0
smmap 3.0.5
soupsieve 2.2.1
SQLAlchemy 1.4.20
sqlanalyticsconnectorpy 1.0.0
statsmodels 0.12.2
synapseml-cognitive 0.10.0
synapseml-core 0.10.0
synapseml-deep-learning 0.10.0
synapseml-lightgbm 0.10.0
synapseml-opencv 0.10.0
synapseml-vw 0.10.0
tabulate 0.8.9
tenacity 7.0.0
tensorboard 2.4.1
tensorboard-plugin-wit 1.8.0
tensorflow 2.4.1
tensorflow-estimator 2.4.0
termcolor 1.1.0
terminado 0.15.0
textblob 0.15.3
threadpoolctl 2.1.0
tifffile 2021.4.8
tinycss2 1.1.1
toolz 0.11.1
torch 1.8.1
torchvision 0.9.1
tornado 6.1
tqdm 4.61.2
traitlets 5.3.0
typed-ast 1.4.3
typing-extensions 4.3.0
urllib3 1.26.4
virtualenv 20.14.0
wcwidth 0.2.5
webencodings 0.5.1
websocket-client 1.1.0
Werkzeug 2.0.1
wheel 0.36.2
widgetsnbextension 3.5.2
wrapt 1.12.1
xgboost 1.4.0
yarl 1.6.3
zipp 3.5.0
zope.event 4.5.0
zope.interface 5.4.0
Note: you may need to restart the kernel to use updated packages.
Maybe we can run
import azure.core
print(azure.core.VERSION)
to see what exact core version we are using.
Well this is embarrassing :(
It seems the problem was that, in my testing, I was running %pip install azure-storage-file-datalake after importing other azure.* packages, or without properly starting a new PySpark session. The %pip install azure-storage-file-datalake must be the first thing that runs in the session. Otherwise, any library that depends on azure.core has already loaded the old version of azure.core into sys.modules, and that stale copy keeps being used even though a newer version is now installed on disk.
%pip install azure-storage-file-datalake
import azure.core
print(azure.core.VERSION)
1.25.0
I tested my original code again and it works now.
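A check like the following might catch this situation earlier. It is only a sketch under stated assumptions: loaded_vs_installed is a hypothetical helper, not part of the Azure SDK, and it assumes the module exposes its version as a VERSION attribute, as azure.core does above. It compares the copy already held in sys.modules with the version recorded in the installed package metadata.

```python
import sys
from importlib import metadata
from typing import Optional, Tuple


def loaded_vs_installed(
    module_name: str, dist_name: str, version_attr: str = "VERSION"
) -> Tuple[Optional[str], Optional[str]]:
    """Return (version loaded in this process, version installed on disk)."""
    # Look in sys.modules only, so this check never triggers an import itself.
    module = sys.modules.get(module_name)
    loaded = getattr(module, version_attr, None) if module is not None else None
    try:
        installed = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        installed = None
    return (None if loaded is None else str(loaded)), installed


# Example for the packages in this issue: warn when the two versions differ.
loaded, installed = loaded_vs_installed("azure.core", "azure-core")
if loaded is not None and installed is not None and loaded != installed:
    print(
        f"azure.core {loaded} is already loaded, but {installed} is installed; "
        "restart the session before importing."
    )
```

If the first element is None, the module has not been imported yet in this session, so a fresh import should pick up the installed version.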
Really sorry about that, and thank you for helping me understand the problem.
Kind regards
Thank you for the updates.