rioxarray
TypeError: cannot pickle '_io.BufferedReader' object when trying to modify an xarray.DataArray opened with fsspec's filecache
👋🏽 Hoping someone on the team can help us figure out how to use fsspec's filecache with NetCDF data when we need to modify the xarray.DataArray object with rioxarray. Right now this is impossible: we get TypeError: cannot pickle '_io.BufferedReader' object, and the traceback led us to believe it has to do with the deep copy operations taking place at https://github.com/corteva/rioxarray/blob/master/rioxarray/rioxarray.py#L1102 and https://github.com/corteva/rioxarray/blob/c15b86061feff8c2c7b0964f19922a3154a85f1a/rioxarray/rioxarray.py#L335
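To illustrate why a deep copy can produce this error, here is a minimal sketch using only the standard library. It does not involve rioxarray, xarray, or fsspec; it only assumes that the object being deep-copied ends up holding a plain local file handle, which is what the error message suggests. The helper name `try_deepcopy` and the file name are made up for the example.

```python
import copy
import os
import tempfile


def try_deepcopy(obj):
    """Return None if copy.deepcopy succeeds, otherwise the exception raised."""
    try:
        copy.deepcopy(obj)
    except Exception as exc:
        return exc
    return None


with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical stand-in for a file that a cache has stored locally.
    path = os.path.join(tmp_dir, "placeholder.nc4")
    with open(path, "wb") as writer:
        writer.write(b"placeholder bytes")

    # Opening in 'rb' mode returns an _io.BufferedReader.
    with open(path, "rb") as handle:
        error = try_deepcopy(handle)

# deepcopy falls back to the pickle protocol, which open file handles reject.
print(type(error).__name__, error)
```

If the same kind of handle is reachable from the DataArray, any operation that deep-copies it would presumably fail in the same way.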
Code Sample
import shutil

import fsspec
import numpy as np
import pandas as pd
import xarray as xr
from morecantile import Tile
from rio_tiler.constants import WEB_MERCATOR_TMS

tms = WEB_MERCATOR_TMS
tile_bounds = tms.xy_bounds(Tile(x=0, y=0, z=0))
dst_crs = tms.rasterio_crs

protocol = 's3'
file_url = 's3://chunk-tests/3B42_Daily.19980101.7.nc4'
cache_storage_dir = 'fsspec-cache'
cache_options = ['filecache', 'blockcache']
inplace_options = [True, False]
# We can add `True` to this list, but `True` always returns AttributeError: __enter__
lock_options = [False]
xr_args = {
    'engine': 'h5netcdf'
}


def rio_clip_box(da):
    try:
        crs = da.rio.crs or "epsg:4326"
        da.rio.write_crs(crs, inplace=True)
        # also with no data
        da = da.rio.clip_box(*tile_bounds, crs=dst_crs)
    except Exception as e:
        return f"❌ {type(e).__name__}: {e}".replace('\n', ' ')
    return '✅'


def rio_write_nodata(da, inplace: bool = True):
    try:
        da.rio.write_nodata(np.nan, inplace=inplace)
    except Exception as e:
        return f"❌ {type(e).__name__}: {e}".replace('\n', ' ')
    return '✅'


columns = ('cache_option', 'inplace_option', 'lock_option', 'clip_box', 'write_nodata')
results = []
for cache_option in cache_options:
    for inplace_option in inplace_options:
        for lock_option in lock_options:
            params = (cache_option, inplace_option, lock_option)
            filecache_fs = fsspec.filesystem(cache_option, target_protocol=protocol, cache_storage=cache_storage_dir)
            file_opener = filecache_fs.open(file_url, mode='rb')
            xr_args['lock'] = lock_option
            try:
                ds = xr.open_dataset(file_opener, **xr_args)
            except Exception as e:
                results.append(params + (f"❌ {type(e).__name__}: {e}".replace('\n', ' '), f"❌ {type(e).__name__}: {e}".replace('\n', ' ')))
                continue
            da = ds['precipitation']
            da = da.rename({'lon': 'x', 'lat': 'y'})
            da = da.transpose("time", "y", "x", missing_dims="ignore")
            rio_write_nodata_result = rio_write_nodata(da, inplace=inplace_option)
            clip_box_result = rio_clip_box(da)
            results.append(params + (clip_box_result, rio_write_nodata_result))
            shutil.rmtree(cache_storage_dir)

df = pd.DataFrame(data=results, columns=columns)
df.to_markdown("results.md", index=False, tablefmt="github")
| cache_option | inplace_option | lock_option | clip_box | write_nodata |
|---|---|---|---|---|
| filecache | True | False | ❌ TypeError: cannot pickle '_io.BufferedReader' object | ✅ |
| filecache | False | False | ❌ TypeError: cannot pickle '_io.BufferedReader' object | ❌ TypeError: cannot pickle '_io.BufferedReader' object |
| blockcache | True | False | ✅ | ✅ |
| blockcache | False | False | ✅ | ✅ |
Problem description
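rioxarray operations fail on an xarray.DataArray opened through fsspec's filecache: clip_box fails for both values of inplace passed to write_nodata, and write_nodata itself fails when inplace=False. Both raise TypeError: cannot pickle '_io.BufferedReader' object.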
Expected Output
Modified xarray.DataArray
Environment Information
python -c "import rioxarray; rioxarray.show_versions()"
returns
rioxarray (0.15.0) deps:
rasterio: 1.3.8
xarray: 2023.10.0
GDAL: 3.6.4
GEOS: 0.0.0
PROJ: 9.0.1
PROJ DATA: /Users/aimeebarciauskas/mambaforge/share/proj
GDAL DATA: /Users/aimeebarciauskas/mambaforge/share/gdal
Other python deps:
scipy: None
pyproj: 3.6.0
System:
python: 3.10.12 | packaged by conda-forge | (main, Jun 23 2023, 22:39:40) [Clang 15.0.7 ]
executable: /Users/aimeebarciauskas/mambaforge/bin/python
machine: macOS-10.15.7-x86_64-i386-64bit
python -c "import fsspec; print(fsspec.__version__)"
returns
2023.9.0
Installation method
pip
It might be worth noting that if you don't remove the cache after each run of the 2 functions, you get ❌ TypeError: cannot pickle '_io.BufferedReader' object in all instances for clip_box, and for write_nodata when inplace=False. So rioxarray is not able to work with fsspec's blockcache for files either.
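One possible direction for a workaround, sketched below with only the standard library: read the cached file fully into an in-memory io.BytesIO, which (unlike an open file handle) can be deep-copied. The helper name `read_into_memory` is hypothetical. Whether passing such a buffer to xr.open_dataset actually avoids the rioxarray error is an untested assumption, and it gives up lazy loading because the whole file is held in memory.

```python
import copy
import io


def read_into_memory(file_obj):
    """Read a binary stream to the end and return its contents as io.BytesIO."""
    return io.BytesIO(file_obj.read())


# Stand-in for the handle returned by the cache (assumption: a BufferedReader).
source = io.BufferedReader(io.BytesIO(b"placeholder bytes"))

buffered = read_into_memory(source)

# Unlike the file handle, the in-memory buffer supports copy.deepcopy.
clone = copy.deepcopy(buffered)
print(clone.getvalue() == buffered.getvalue())
```

In the code sample above, this would mean wrapping file_opener before handing it to xr.open_dataset; that step is not shown because it has not been verified here.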
Related #614. Possible duplicate.