filesystem_spec

'pyarrow._hdfs.HadoopFileSystem' object has no attribute 'host'

Open marberi opened this issue 7 months ago • 3 comments

I tried connecting to HDFS storage using the default configuration (core-site.xml). Connecting, as well as writing and reading a dataframe, worked fine (not shown). However, when attempting to run the following code:

""" import dask.array as da

N = 10_000 rng = da.random.default_rng() x = rng.random((N, N), chunks=(2000, 2000)) x.to_zarr("hdfs:///user/eriksen/test2.zarr") """

It fails with the following error, which seems to be fsspec-related:

File /data/aai/scratch_ssd/eriksen/miniforge3/envs/dask/lib/python3.13/functools.py:1026, in cached_property.__get__(self, instance, owner)
   1024 val = cache.get(self.attrname, _NOT_FOUND)
   1025 if val is _NOT_FOUND:
-> 1026     val = self.func(instance)
   1027     try:
   1028         cache[self.attrname] = val

File /data/aai/scratch_ssd/eriksen/miniforge3/envs/dask/lib/python3.13/site-packages/fsspec/implementations/arrow.py:63, in ArrowFSWrapper.fsid(self)
     61 @cached_property
     62 def fsid(self):
---> 63     return "hdfs_" + tokenize(self.fs.host, self.fs.port)

AttributeError: 'pyarrow._hdfs.HadoopFileSystem' object has no attribute 'host'
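
To illustrate what I think is happening, here is a minimal sketch that reproduces the same failure pattern without an HDFS cluster. The classes `FakeHadoopFileSystem` and `FakeWrapper` are hypothetical stand-ins, not pyarrow or fsspec code, and `str(...)` replaces `tokenize` only to keep the example dependency-free. The `getattr` fallback in `safe_fsid` is an assumption about one possible direction, not the library's actual fix.

```python
from functools import cached_property


class FakeHadoopFileSystem:
    """Hypothetical stand-in: an object exposing no `host` or `port` attribute."""


class FakeWrapper:
    """Hypothetical stand-in for a wrapper whose `fsid` reads `fs.host` and `fs.port`."""

    def __init__(self, fs):
        self.fs = fs

    @cached_property
    def fsid(self):
        # Same attribute access pattern as the line shown in the traceback.
        return "hdfs_" + str((self.fs.host, self.fs.port))

    @cached_property
    def safe_fsid(self):
        # Assumption: fall back to None when the attributes are missing.
        host = getattr(self.fs, "host", None)
        port = getattr(self.fs, "port", None)
        return "hdfs_" + str((host, port))


wrapper = FakeWrapper(FakeHadoopFileSystem())

try:
    wrapper.fsid
    failed = False
    message = ""
except AttributeError as exc:
    # The error raised inside the cached property propagates to the caller.
    failed = True
    message = str(exc)

fallback = wrapper.safe_fsid
print(failed, message, fallback)
```

The direct attribute access fails in the same way as in the traceback, while the `getattr` variant returns a value. Whether a fallback like this yields a suitable filesystem identifier is a question for the maintainers.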

I installed the environment today with Python 3.13 and the following packages:

cloudpickle==3.1.1
dask==2025.5.1
distributed==2025.5.1
fsspec==2025.5.1
pyarrow==20.0.0
toolz==1.0.0
zarr==3.0.8
zict==3.0.0

Please let me know if you need any other information or if I should report this issue elsewhere.

marberi, Jun 19 '25 23:06