filesystem_spec
Unable to use "simplecache" to write NETCDF to Open Storage Network S3 storage
Hello, I'm running into a bug where I am unable to write NETCDF files to permissioned storage on Open Storage Network (OSN) without explicitly creating a local copy and then using put. The code is being run in parallel using Dask and Xarray on a Kubernetes cluster.
I've used the code snippet below with simplecache to write to permissioned S3 storage before.
outfile = fsspec.open('simplecache::s3://file/path/to/OSN/ds.nc',
                      mode='wb', s3=dict(profile='profile'))
with outfile as f:
    ds.load().to_netcdf(f, compute=True)
The code, modified to include the OSN endpoint URL, would then look like this:
fs_write = fsspec.filesystem('s3',
    profile='profile',
    skip_instance_cache=True,
    client_kwargs={'endpoint_url': 'https://renc.osn.xsede.org'}
)
outfile = fs_write.open('simplecache::s3://file/path/to/OSN/ds.nc',
                        mode='wb', s3=dict(profile='profile'))
with outfile as f:
    ds.load().to_netcdf(f, compute=True)
When I run the above code, I get a regex error saying that the simplecache path does not match the expected form of an S3 file path.
So I have had to fall back to creating a local file and then using put to copy it to OSN:
# xarray dataset with dask to NETCDF
ds.to_netcdf("ds.nc", compute=True, mode='w', engine='h5netcdf')
# use put to transfer file from local to OSN
_ = fs_write.put("ds.nc", "file/path/to/OSN/ds.nc")
A fix could be to add the simplecache functionality to the open method of fsspec.filesystem.
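One way to get this behaviour today, as far as I can tell, is to wrap the existing filesystem instance in a simplecache layer rather than passing a chained URL to its open method. Chained URLs such as 'simplecache::s3://...' are parsed by fsspec.open, not by the open method of a concrete filesystem, which would explain the path-matching error. The snippet below is only a sketch: the in-memory filesystem stands in for fs_write so that it runs without credentials, and the path is made up for illustration.

```python
import fsspec
from fsspec.implementations.cached import SimpleCacheFileSystem

# Stand-in for fs_write (assumption: any fsspec filesystem instance,
# e.g. the S3 one configured with the OSN endpoint, can be wrapped this way).
remote = fsspec.filesystem("memory")

# Wrap the existing instance; writes go to a local temporary file first.
cached = SimpleCacheFileSystem(fs=remote)

payload = b"placeholder bytes instead of a real NetCDF file"
with cached.open("/demo-bucket/wrapped.bin", mode="wb") as f:
    f.write(payload)
# On close, the temporary file is uploaded to the wrapped filesystem.

uploaded = remote.cat("/demo-bucket/wrapped.bin")
```

With the real fs_write in place of remote, the file object returned by cached.open could be handed to to_netcdf in the same way as in the original snippet.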
Just a note that this writing-netcdf-with-simplecache problem is not related to Open Storage Network. It also occurs when trying to write to AWS S3.
It's strange because the workaround is just what we thought simplecache was supposed to do: write locally then transfer!
Can we make a simpler example without xarray in the loop?
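A possible starting point for such an example, with plain bytes instead of an xarray dataset and the in-memory filesystem instead of S3, so that it runs anywhere. The path and payload are made up for illustration. For S3, the per-protocol options (profile, client_kwargs and so on) would go in the s3=dict(...) keyword, as in the original snippet.

```python
import fsspec

payload = b"placeholder bytes instead of a real NetCDF file"

# Chained URL: simplecache writes to a local temporary file and
# uploads it to the target filesystem when the file is closed.
target = fsspec.open("simplecache::memory://demo-bucket/ds.bin", mode="wb")
with target as f:
    f.write(payload)

# Read the result back from the target filesystem to check the upload.
stored = fsspec.filesystem("memory").cat("/demo-bucket/ds.bin")
```

If this works but the S3 version does not, the difference is likely in how the S3 filesystem is constructed or called rather than in simplecache itself.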
> # xarray dataset with dask to NETCDF
Is dask involved here? Is it writing to a single file from multiple threads?