adlfs
Read fails right after write of pandas parquet file to Azure
Running the code below in the conda env listed at the bottom (sorry, I cannot attach the YAML as a file!) raises the exception shown below. The first read (right after the write) fails after ~1 min with "HttpResponseError: Server encountered an internal error. Please try again after some time.", but subsequent reads succeed. I don't know the cause. Sleeping for 2 minutes before reading the file does not help, so I suspect unclosed file pointers that are only closed once the exception is raised.
What happened: I get the exception:
Traceback (most recent call last):
File "/home/fogh/miniconda3/envs/py38_ADLS_POC/lib/python3.8/site-packages/azure/storage/blob/aio/_list_blobs_helper.py", line 70, in _get_next_cb
return await self._command(
File "/home/fogh/miniconda3/envs/py38_ADLS_POC/lib/python3.8/site-packages/azure/storage/blob/_generated/aio/operations_async/_container_operations_async.py", line 1329, in list_blob_hierarchy_segment
raise models.StorageErrorException(response, self._deserialize)
azure.storage.blob._generated.models._models_py3.StorageErrorException: Operation returned an invalid status 'Server encountered an internal error. Please try again after some time.'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "test.py", line 26, in <module>
df = pd.read_parquet(
File "/home/fogh/miniconda3/envs/py38_ADLS_POC/lib/python3.8/site-packages/pandas/io/parquet.py", line 459, in read_parquet
return impl.read(
File "/home/fogh/miniconda3/envs/py38_ADLS_POC/lib/python3.8/site-packages/pandas/io/parquet.py", line 312, in read
parquet_file = self.api.ParquetFile(path, **parquet_kwargs)
File "/home/fogh/miniconda3/envs/py38_ADLS_POC/lib/python3.8/site-packages/fastparquet/api.py", line 110, in __init__
with open_with(fn2, 'rb') as f:
File "/home/fogh/miniconda3/envs/py38_ADLS_POC/lib/python3.8/site-packages/pandas/io/parquet.py", line 303, in <lambda>
parquet_kwargs["open_with"] = lambda path, _: fsspec.open(
File "/home/fogh/miniconda3/envs/py38_ADLS_POC/lib/python3.8/site-packages/fsspec/core.py", line 134, in open
out = self.__enter__()
File "/home/fogh/miniconda3/envs/py38_ADLS_POC/lib/python3.8/site-packages/fsspec/core.py", line 102, in __enter__
f = self.fs.open(self.path, mode=mode)
File "/home/fogh/miniconda3/envs/py38_ADLS_POC/lib/python3.8/site-packages/fsspec/spec.py", line 930, in open
f = self._open(
File "/home/fogh/miniconda3/envs/py38_ADLS_POC/lib/python3.8/site-packages/adlfs/spec.py", line 1424, in _open
return AzureBlobFile(
File "/home/fogh/miniconda3/envs/py38_ADLS_POC/lib/python3.8/site-packages/adlfs/spec.py", line 1528, in __init__
self.details = self.fs.info(self.path)
File "/home/fogh/miniconda3/envs/py38_ADLS_POC/lib/python3.8/site-packages/adlfs/spec.py", line 524, in info
return maybe_sync(self._info, self, path)
File "/home/fogh/miniconda3/envs/py38_ADLS_POC/lib/python3.8/site-packages/fsspec/asyn.py", line 100, in maybe_sync
return sync(loop, func, *args, **kwargs)
File "/home/fogh/miniconda3/envs/py38_ADLS_POC/lib/python3.8/site-packages/fsspec/asyn.py", line 71, in sync
raise exc.with_traceback(tb)
File "/home/fogh/miniconda3/envs/py38_ADLS_POC/lib/python3.8/site-packages/fsspec/asyn.py", line 55, in f
result[0] = await future
File "/home/fogh/miniconda3/envs/py38_ADLS_POC/lib/python3.8/site-packages/adlfs/spec.py", line 545, in _info
out = await self._ls(path, **kwargs)
File "/home/fogh/miniconda3/envs/py38_ADLS_POC/lib/python3.8/site-packages/adlfs/spec.py", line 721, in _ls
async for next_blob in blobs:
File "/home/fogh/miniconda3/envs/py38_ADLS_POC/lib/python3.8/site-packages/azure/core/async_paging.py", line 154, in __anext__
return await self.__anext__()
File "/home/fogh/miniconda3/envs/py38_ADLS_POC/lib/python3.8/site-packages/azure/core/async_paging.py", line 157, in __anext__
self._page = await self._page_iterator.__anext__()
File "/home/fogh/miniconda3/envs/py38_ADLS_POC/lib/python3.8/site-packages/azure/core/async_paging.py", line 99, in __anext__
self._response = await self._get_next(self.continuation_token)
File "/home/fogh/miniconda3/envs/py38_ADLS_POC/lib/python3.8/site-packages/azure/storage/blob/aio/_list_blobs_helper.py", line 77, in _get_next_cb
process_storage_error(error)
File "/home/fogh/miniconda3/envs/py38_ADLS_POC/lib/python3.8/site-packages/azure/storage/blob/_shared/response_handlers.py", line 147, in process_storage_error
raise error
azure.core.exceptions.HttpResponseError: Server encountered an internal error. Please try again after some time.
RequestId:5b49de07-d01e-0129-05b8-4b8c02000000
Time:2021-05-18T07:36:15.2263076Z
ErrorCode:InternalError
Error:None
Unclosed client session
client_session: <aiohttp.client.ClientSession object at 0x7f5825aee820>
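The traceback shows the error surfacing from `self.fs.info(self.path)`, which lists blobs via `_ls`, before any parquet bytes are read. To check whether that call alone reproduces the failure, a small probe like the one below could be run between the write and the read. This is only a sketch: `probe_info` is a hypothetical helper, not part of adlfs or fsspec, and it assumes nothing about the filesystem object beyond it having an `info(path)` method.

```python
import time


def probe_info(fs, path):
    """Time a single fs.info(path) call and report whether it raised.

    Returns a dict with keys 'ok', 'seconds' and 'error' (None on success).
    """
    start = time.monotonic()
    try:
        fs.info(path)
        error = None
    except Exception as exc:  # report any failure instead of propagating it
        error = exc
    return {
        "ok": error is None,
        "seconds": time.monotonic() - start,
        "error": error,
    }
```

In the example script this would be called as `print(probe_info(abfs_slz, str(parquet_file)))` right after `to_parquet`. If it fails in the same way, the problem is in the listing call and not in pandas or fastparquet.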
What you expected to happen: The read executes without exception.
Minimal Complete Verifiable Example:
import os
from pathlib import Path

import adlfs
from dotenv import load_dotenv
import pandas as pd

# Service principal credentials are read from a local .env file.
load_dotenv()
storage_options = dict(
    tenant_id=os.getenv('TENANT_ID'),
    client_id=os.getenv('CLIENT_ID'),
    client_secret=os.getenv('CLIENT_SECRET'))
abfs_slz = adlfs.AzureBlobFileSystem(
    **storage_options, account_name='<account_name>')

adl_folder = Path('<ADL path>')
parquet_file = adl_folder/'test.parquet.brotli'

# Write a small frame to Azure ...
df = pd.DataFrame(data={'col1': [1, 2], 'col2': ['A', 'b']})
df.to_parquet(
    f'az://{parquet_file}', storage_options=abfs_slz.storage_options,
    compression='BROTLI')

# ... and read it straight back; this first read is the one that fails.
df = pd.read_parquet(
    f'az://{parquet_file}',
    storage_options=abfs_slz.storage_options)
print(df)
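Since the reads after the first one succeed, retrying the read might serve as a stopgap until the cause is found. The sketch below is a generic retry helper; `retry_call` and its parameters are hypothetical names chosen for illustration, and it does nothing about the underlying problem. As the first failure takes about a minute to surface, each retry would also cost that long.

```python
import time


def retry_call(func, exceptions, attempts=3, delay=1.0):
    """Call func() until it succeeds or the attempts are exhausted.

    exceptions: exception class (or tuple of classes) that triggers a retry.
    The last exception is re-raised if every attempt fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except exceptions:
            if attempt == attempts:
                raise
            time.sleep(delay)


# Intended use with the example above (not executed here, needs Azure access):
#
#   from azure.core.exceptions import HttpResponseError
#   df = retry_call(
#       lambda: pd.read_parquet(
#           f'az://{parquet_file}',
#           storage_options=abfs_slz.storage_options),
#       exceptions=HttpResponseError)
```

Catching only `HttpResponseError` (the exception type in the traceback) keeps unrelated errors, such as a wrong path or bad credentials, from being retried silently.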
Anything else we need to know?: Nope
Environment:
- Dask version: 2021.1.1=pyhd3eb1b0_0
- Python version: 3.8.5=h7579374_1
- pandas version: 1.2.1=py38ha9443f7_0
- Operating System: Ubuntu 20.04.2 LTS
- Install method (conda, pip, source): Conda
name: py38_ADLS_POC
channels:
- defaults
- conda-forge
- r
dependencies:
- _libgcc_mutex=0.1=conda_forge
- _openmp_mutex=4.5=1_gnu
- adal=1.2.6=pyhd3eb1b0_0
- adlfs=0.6.0=pyhd8ed1ab_0
- affine=2.3.0=py_0
- aiohttp=3.6.3=py38h7b6447c_0
- argon2-cffi=20.1.0=py38h7b6447c_1
- asciitree=0.3.3=py_2
- async-timeout=3.0.1=py38h06a4308_0
- async_generator=1.10=pyhd3eb1b0_0
- attrs=20.3.0=pyhd3eb1b0_0
- azure-core=1.10.0=pyhd8ed1ab_0
- azure-datalake-store=0.0.51=pyh9f0ad1d_0
- azure-identity=1.5.0=pyhd8ed1ab_0
- azure-storage-blob=12.6.0=pyhd3deb0d_0
- backcall=0.2.0=pyhd3eb1b0_0
- blas=1.0=mkl
- bleach=3.3.0=pyhd3eb1b0_0
- blinker=1.4=py38h06a4308_0
- bokeh=2.2.3=py38_0
- boost-cpp=1.74.0=h9359b55_0
- brotlipy=0.7.0=py38h27cfd23_1003
- bzip2=1.0.8=h7b6447c_0
- c-ares=1.17.1=h27cfd23_0
- ca-certificates=2021.1.19=h06a4308_0
- cairo=1.16.0=h9f066cc_1006
- certifi=2020.12.5=py38h06a4308_0
- cffi=1.14.4=py38h261ae71_0
- cfitsio=3.470=hf0d0db6_6
- chardet=3.0.4=py38h06a4308_1003
- click=7.1.2=pyhd3eb1b0_0
- click-plugins=1.1.1=py_0
- cligj=0.7.1=py38h06a4308_0
- cloudpickle=1.6.0=py_0
- cryptography=3.3.1=py38h3c74f83_0
- curl=7.71.1=he644dc0_8
- cycler=0.10.0=py38_0
- cytoolz=0.11.0=py38h7b6447c_0
- dask=2021.1.1=pyhd3eb1b0_0
- dask-core=2021.1.1=pyhd3eb1b0_0
- dbus=1.13.18=hb2f20db_0
- decorator=4.4.2=pyhd3eb1b0_0
- defusedxml=0.6.0=py_0
- distributed=2021.1.1=py38h06a4308_1
- entrypoints=0.3=py38_0
- expat=2.2.10=he6710b0_2
- fasteners=0.16=pyhd3eb1b0_0
- fastparquet=0.5.0=py38h6323ea4_1
- fontconfig=2.13.1=hba837de_1004
- freetype=2.10.4=h5ab3b9f_0
- freexl=1.0.6=h27cfd23_0
- fsspec=0.8.5=pyhd8ed1ab_0
- geos=3.8.1=he6710b0_0
- geotiff=1.6.0=h5d11630_3
- gettext=0.19.8.1=h9b4dc7a_1
- giflib=5.2.1=h7b6447c_0
- glib=2.66.4=hc4f0c31_2
- glib-tools=2.66.4=hc4f0c31_2
- gst-plugins-base=1.14.5=h0935bb2_2
- gstreamer=1.18.3=h3560a44_0
- hdf4=4.2.13=h3ca952b_2
- hdf5=1.10.6=nompi_h3c11f04_101
- heapdict=1.0.1=py_0
- icu=67.1=he1b5a44_0
- idna=2.10=pyhd3eb1b0_0
- imageio=2.9.0=py_0
- importlib-metadata=2.0.0=py_1
- importlib_metadata=2.0.0=1
- intel-openmp=2020.2=254
- ipykernel=5.3.4=py38h5ca1d4c_0
- ipython=7.20.0=py38hb070fc8_1
- ipython_genutils=0.2.0=pyhd3eb1b0_1
- isodate=0.6.0=py_1
- jedi=0.17.2=py38h06a4308_1
- jinja2=2.11.3=pyhd3eb1b0_0
- jpeg=9d=h36c2ea0_0
- json-c=0.13.1=h1bed415_0
- jsonschema=3.2.0=py_2
- jupyter_client=6.1.7=py_0
- jupyter_core=4.7.1=py38h06a4308_0
- jupyterlab_pygments=0.1.2=py_0
- kealib=1.4.14=h0042707_0
- kiwisolver=1.3.1=py38h2531618_0
- krb5=1.17.1=h173b8e3_0
- lcms2=2.11=h396b838_0
- ld_impl_linux-64=2.33.1=h53a641e_7
- libclang=11.0.1=default_ha53f305_1
- libcurl=7.71.1=hcdd3856_8
- libdap4=3.20.6=h1d1bd15_1
- libedit=3.1.20191231=h14c3975_1
- libev=4.33=h7b6447c_0
- libevent=2.1.10=hcdb4288_3
- libffi=3.3=he6710b0_2
- libgcc-ng=9.3.0=h2828fa1_18
- libgdal=3.1.4=h670eac6_0
- libgfortran-ng=7.3.0=hdf63c60_0
- libglib=2.66.4=h748fe8e_2
- libgomp=9.3.0=h2828fa1_18
- libiconv=1.16=h516909a_0
- libkml=1.3.0=h74f7ee3_1012
- libllvm10=10.0.1=hbcb73fb_5
- libllvm11=11.0.1=hf817b99_0
- libnetcdf=4.7.4=nompi_h56d31a8_107
- libnghttp2=1.41.0=hf8bcb03_2
- libpng=1.6.37=hbc83047_0
- libpq=12.3=h255efa7_3
- libsodium=1.0.18=h7b6447c_0
- libspatialite=5.0.0=heaf302f_0
- libssh2=1.9.0=h1ba5d50_1
- libstdcxx-ng=9.3.0=h6de172a_18
- libtiff=4.1.0=h2733197_1
- libuuid=2.32.1=h7f98852_1000
- libwebp-base=1.2.0=h27cfd23_0
- libxcb=1.14=h7b6447c_0
- libxkbcommon=1.0.3=he3ba5ed_0
- libxml2=2.9.10=h68273f3_2
- llvmlite=0.34.0=py38h269e1b5_4
- locket=0.2.1=py38h06a4308_1
- lz4-c=1.9.2=heb0550a_3
- markupsafe=1.1.1=py38h7b6447c_0
- matplotlib=3.3.2=h06a4308_0
- matplotlib-base=3.3.2=py38h817c723_0
- mistune=0.8.4=py38h7b6447c_1000
- mkl=2020.2=256
- mkl-service=2.3.0=py38he904b0f_0
- mkl_fft=1.2.0=py38h23d657b_0
- mkl_random=1.1.1=py38h0573a6f_0
- monotonic=1.5=py_0
- msal=1.8.0=pyhd3deb0d_0
- msal_extensions=0.3.0=pyh9f0ad1d_0
- msgpack-python=1.0.2=py38hff7bd54_1
- msrest=0.6.21=pyh44b312d_0
- msrestazure=0.6.4=pyhd8ed1ab_0
- multidict=4.7.6=py38h7b6447c_1
- mysql-common=8.0.22=ha770c72_1
- mysql-libs=8.0.22=h1fd7589_1
- nb_conda_kernels=2.3.1=py38h06a4308_0
- nbclient=0.5.1=py_0
- nbconvert=6.0.7=py38_0
- nbformat=5.1.2=pyhd3eb1b0_1
- ncurses=6.2=he6710b0_1
- nest-asyncio=1.4.3=pyhd3eb1b0_0
- networkx=2.5=py_0
- notebook=6.2.0=py38h06a4308_0
- nspr=4.29=h9c3ff4c_1
- nss=3.61=hb5efdd6_0
- numba=0.51.2=py38h0573a6f_1
- numcodecs=0.7.3=py38h2531618_0
- numpy=1.19.2=py38h54aff64_0
- numpy-base=1.19.2=py38hfa32c7d_0
- oauthlib=3.1.0=py_0
- olefile=0.46=py_0
- openjpeg=2.3.1=hf7af979_3
- openssl=1.1.1i=h27cfd23_0
- packaging=20.9=pyhd3eb1b0_0
- pandas=1.2.1=py38ha9443f7_0
- pandoc=2.11=hb0f4dca_0
- pandocfilters=1.4.3=py38h06a4308_1
- parso=0.7.0=py_0
- partd=1.1.0=py_0
- pcre=8.44=he6710b0_0
- pexpect=4.8.0=pyhd3eb1b0_3
- pickleshare=0.7.5=pyhd3eb1b0_1003
- pillow=8.1.0=py38he98fc37_0
- pip=20.3.3=py38h06a4308_0
- pixman=0.40.0=h7b6447c_0
- poppler=0.89.0=h669c267_1
- poppler-data=0.4.10=h06a4308_0
- portalocker=2.2.0=py38h06a4308_0
- postgresql=12.3=hc2f5b80_3
- proj=7.1.1=h966b41f_3
- prometheus_client=0.9.0=pyhd3eb1b0_0
- prompt-toolkit=3.0.8=py_0
- psutil=5.8.0=py38h27cfd23_1
- ptyprocess=0.7.0=pyhd3eb1b0_2
- pycparser=2.20=py_2
- pygments=2.7.4=pyhd3eb1b0_0
- pyjwt=1.7.1=py38_0
- pyopenssl=20.0.1=pyhd3eb1b0_1
- pyparsing=2.4.7=pyhd3eb1b0_0
- pyproj=2.6.1.post1=py38h56787f0_3
- pyqt=5.12.3=py38h578d9bd_7
- pyqt-impl=5.12.3=py38h7400c14_7
- pyqt5-sip=4.19.18=py38h709712a_7
- pyqtchart=5.12=py38h7400c14_7
- pyqtwebengine=5.12.1=py38h7400c14_7
- pyrsistent=0.17.3=py38h7b6447c_0
- pysocks=1.7.1=py38h06a4308_0
- python=3.8.5=h7579374_1
- python-dateutil=2.8.1=pyhd3eb1b0_0
- python-dotenv=0.15.0=pyhd8ed1ab_0
- python_abi=3.8=1_cp38
- pytz=2021.1=pyhd3eb1b0_0
- pywavelets=1.1.1=py38h7b6447c_2
- pyyaml=5.4.1=py38h27cfd23_1
- pyzmq=20.0.0=py38h2531618_1
- qt=5.12.9=h763d07f_1
- rasterio=1.2.0=py38h033aa8a_0
- readline=8.1=h27cfd23_0
- requests=2.25.1=pyhd3eb1b0_0
- requests-oauthlib=1.3.0=py_0
- rioxarray=0.2.0=pyhd8ed1ab_0
- scikit-image=0.17.2=py38hdf5156a_0
- scipy=1.5.2=py38h0b6359f_0
- send2trash=1.5.0=pyhd3eb1b0_1
- setuptools=52.0.0=py38h06a4308_0
- shapely=1.7.1=py38ha11d057_1
- six=1.15.0=py38h06a4308_0
- snuggs=1.4.7=py_0
- sortedcontainers=2.3.0=pyhd3eb1b0_0
- sqlite=3.34.0=h74cdb3f_0
- tbb=2020.3=hfd86e86_0
- tblib=1.7.0=py_0
- terminado=0.9.2=py38h06a4308_0
- testpath=0.4.4=pyhd3eb1b0_0
- thrift=0.11.0=py38he6710b0_0
- tifffile=2020.10.1=py38hdd07704_2
- tiledb=2.1.5=h17508cd_0
- tk=8.6.10=hbc83047_0
- toolz=0.11.1=pyhd3eb1b0_0
- tornado=6.1=py38h27cfd23_0
- traitlets=5.0.5=pyhd3eb1b0_0
- typing_extensions=3.7.4.3=pyh06a4308_0
- tzcode=2021a=h7f98852_0
- urllib3=1.26.3=pyhd3eb1b0_0
- wcwidth=0.2.5=py_0
- webencodings=0.5.1=py38_1
- wheel=0.36.2=pyhd3eb1b0_0
- xarray=0.16.2=pyhd3eb1b0_0
- xerces-c=3.2.3=hfe33f54_1
- xorg-kbproto=1.0.7=h7f98852_1002
- xorg-libice=1.0.10=h516909a_0
- xorg-libsm=1.2.3=h84519dc_1000
- xorg-libx11=1.6.12=h516909a_0
- xorg-libxext=1.3.4=h516909a_0
- xorg-libxrender=0.9.10=h516909a_1002
- xorg-renderproto=0.11.1=h14c3975_1002
- xorg-xextproto=7.3.0=h7f98852_1002
- xorg-xproto=7.0.31=h27cfd23_1007
- xz=5.2.5=h7b6447c_0
- yaml=0.2.5=h7b6447c_0
- yarl=1.6.3=py38h27cfd23_0
- zarr=2.6.1=pyhd3eb1b0_0
- zeromq=4.3.3=he6710b0_3
- zict=2.0.0=py_0
- zipp=3.4.0=pyhd3eb1b0_0
- zlib=1.2.11=h7b6447c_3
- zstd=1.4.5=h9ceee32_0
- pip:
  - osgeo==0.0.0
  - pygdal==3.1.4.6
prefix: /home/donj/miniconda3/envs/py38_ADLS_POC
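Only a handful of the packages above are likely to matter for this report. A short script like the following could print just those versions from the active environment. The package list is my own guess at what is relevant, and `package_versions` is a hypothetical helper built on the standard-library `importlib.metadata` (available from Python 3.8).

```python
from importlib import metadata


def package_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Assumed-relevant packages for this issue; adjust as needed.
    relevant = ["adlfs", "fsspec", "azure-storage-blob", "azure-core",
                "aiohttp", "pandas", "fastparquet"]
    for name, version in package_versions(relevant).items():
        print(f"{name}: {version}")
```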