
Unable to use "simplecache" to write NETCDF to Open Storage Network S3 storage

Open alaws-USGS opened this issue 2 years ago • 2 comments

Hello, I'm running into a bug where I am unable to write NETCDF files to permissioned storage on Open Storage Network without first creating a local copy explicitly and then using put. The code is being run in parallel using Dask and Xarray on a Kubernetes cluster.

I've used the code snippet below with simplecache to write to permissioned S3 storage before.

import fsspec

outfile = fsspec.open('simplecache::s3://file/path/to/OSN/ds.nc',
                      mode='wb', s3=dict(profile='profile'))
with outfile as f:
    ds.load().to_netcdf(f, compute=True)

The code modified to include the OSN endpoint URL would then look like this:

fs_write = fsspec.filesystem('s3', 
        profile='profile', 
        skip_instance_cache=True, 
        client_kwargs={'endpoint_url': 'https://renc.osn.xsede.org'}
        )

outfile = fs_write.open('simplecache::s3://file/path/to/OSN/ds.nc',
                        mode='wb', s3=dict(profile='profile'))
with outfile as f:
    ds.load().to_netcdf(f, compute=True)

When I run the above code, I get a regex error saying that the simplecache URL does not match the pattern expected for S3 file paths.

So I have had to revert to creating a local file and then using put to add it to OSN:

# xarray dataset with dask to NETCDF
ds.to_netcdf("ds.nc", compute=True, mode='w', engine='h5netcdf')

# use put to transfer file from local to OSN
_ = fs_write.put("ds.nc", "file/path/to/OSN/ds.nc")

A fix could be to add the simplecache functionality to the open method of fsspec.filesystem.
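
One possible way around this, sketched here under assumptions, is to keep using the top-level fsspec.open, which does parse chained simplecache:: URLs, and to pass the endpoint through the per-protocol keyword arguments. The helper name open_via_simplecache is hypothetical, the in-memory filesystem stands in for OSN so the snippet runs without credentials, and the commented S3 call has not been tested against OSN.

```python
import fsspec


def open_via_simplecache(protocol, path, mode="wb", **storage_options):
    """Hypothetical helper: open `path` on `protocol` through simplecache."""
    url = f"simplecache::{protocol}://{path}"
    # In a chained URL, options for each layer are keyed by its protocol name.
    return fsspec.open(url, mode=mode, **{protocol: storage_options})


# For OSN the call would presumably look like this (assumption, not run here):
# outfile = open_via_simplecache(
#     "s3", "file/path/to/OSN/ds.nc",
#     profile="profile",
#     client_kwargs={"endpoint_url": "https://renc.osn.xsede.org"},
# )

# Stand-in demonstration against fsspec's in-memory filesystem.
with open_via_simplecache("memory", "demo/ds.nc") as f:
    f.write(b"placeholder bytes, not a real NETCDF file")

stored = fsspec.filesystem("memory").cat("/demo/ds.nc")
```

The point of the sketch is that the endpoint travels with the s3 options inside fsspec.open, so no separate filesystem instance is needed to interpret the chained URL.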

alaws-USGS, Dec 05 '22 21:12

Just a note that this writing-netcdf-with-simplecache problem is not related to Open Storage Network: the problem also occurs when trying to write to AWS S3.

It's strange because the workaround is just what we thought simplecache was supposed to do: write locally then transfer!
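
A minimal sketch of that expected behaviour, without xarray or S3 in the loop, might look like the following. It assumes fsspec's in-memory filesystem as a stand-in for the remote store, and the path simplecache-demo/data.bin is made up for illustration. The write goes to a local temporary file, and the target only receives the data once the file is closed.

```python
import fsspec

mem = fsspec.filesystem("memory")
target = "simplecache-demo/data.bin"  # made-up path for illustration

of = fsspec.open(f"simplecache::memory://{target}", mode="wb")
with of as f:
    f.write(b"payload")
    # Still inside the context: the bytes live in a local temporary file.
    exists_during_write = mem.exists(target)

# After close, simplecache has transferred the temporary file to the target.
exists_after_close = mem.exists(target)
contents = mem.cat(target)
```

If the same pattern fails once xarray writes into the file object, that would suggest the problem lies in how the file is opened or written rather than in the stage-then-upload step itself.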

rsignell-usgs, Dec 06 '22 11:12

Can we make a simpler example without xarray in the loop?

> # xarray dataset with dask to NETCDF

Is dask involved here? Is it writing to a single file from multiple threads?

martindurant, Dec 06 '22 17:12