Attributes encoding compatibility between backends
What happened:
Let's create a Zarr dataset with a "less common" dtype and fill value, open it with Xarray, and save it as NetCDF:
import xarray as xr
import zarr
g = zarr.group()
g.create('arr', shape=3, fill_value='z', dtype='<U1')
g['arr'].attrs['_ARRAY_DIMENSIONS'] = ['dim_1']
# -- without masking fill values
ds = xr.open_zarr(g.store, mask_and_scale=False)
ds.arr.attrs # returns {'_FillValue': 'z'}
# error: netCDF4 does not yet support setting a fill value for variable-length strings
ds.to_netcdf('test.nc')
# -- with masking fill values
ds2 = xr.open_zarr(g.store, mask_and_scale=True)
# returns a dict that includes the item '_FillValue': 'z'
ds2.arr.encoding
# same error as above
ds2.to_netcdf('out2.nc')
What you expected to happen:
Seamless conversion (read/write) from one backend to another. Is there anything we could do to improve the case shown above, and maybe other cases like the one described in #5223?
Environment:
Output of xr.show_versions()
INSTALLED VERSIONS
commit: None
libhdf5: None
libnetcdf: None
xarray: 0.17.0
pandas: 1.0.3
numpy: 1.18.1
scipy: 1.3.1
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.8.1
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2.11.0
distributed: 2.14.0
matplotlib: 3.1.1
cartopy: None
seaborn: None
numbagg: None
pint: None
setuptools: 46.1.3.post20200325
pip: 19.2.3
conda: None
pytest: 5.4.1
IPython: 7.13.0
sphinx: None
The issue above is actually a duplicate of #1647, but I wonder if there could be a more generic fix that also covers dtypes other than variable-length strings?
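In the meantime, a possible stopgap on the user side might be to strip `_FillValue` from string-like variables before writing. The sketch below is only that: `drop_string_fill_values` and `STRING_KINDS` are hypothetical names, not part of Xarray, and the choice of dtype kinds is my assumption about which arrays end up as variable-length strings in the netCDF4 backend. It only relies on the object exposing `.variables`, with each variable having `.dtype.kind`, `.attrs` and `.encoding`, as an `xarray.Dataset` does.

```python
# Assumption: unicode ("U") and object ("O") arrays are the ones written as
# variable-length strings, which is where the fill value is rejected.
STRING_KINDS = ("U", "O")


def drop_string_fill_values(ds):
    """Remove `_FillValue` from string-like variables, in place.

    `ds` is expected to behave like an xarray.Dataset: `ds.variables` maps
    names to objects with `.dtype.kind`, `.attrs` and `.encoding`.
    Returns the list of variable names that were modified.
    """
    changed = []
    for name, var in ds.variables.items():
        if var.dtype.kind not in STRING_KINDS:
            continue
        found = False
        # With mask_and_scale=False the fill value sits in attrs,
        # with mask_and_scale=True it sits in encoding; check both.
        for mapping in (var.attrs, var.encoding):
            if "_FillValue" in mapping:
                del mapping["_FillValue"]
                found = True
        if found:
            changed.append(name)
    return changed
```

The intended use would be to call `drop_string_fill_values(ds)` right before `ds.to_netcdf('test.nc')`. Note that it mutates the dataset's variables and discards the fill value information, so it is a workaround for this one error and not the generic fix asked about above.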