Ryan Abernathey

So it sounds like we have found an s3fs bug? pinging @martindurant, our fsspec superhero 🦸

Just to help get to the bottom of it, could you try the following?

```python
ds1 = xr.open_dataset(opendap_url, decode_coords=False).load()
ds2 = xr.open_dataset(s3_file, decode_coords=False).load()
ds1.identical(ds2)
```

This should help surface whatever...

The problem is with the `coordinates` attribute:

```python
ds2c = ds2.copy()
del ds2c.cLeaf.attrs['coordinates']
xr.decode_cf(ds2c)
```

```python
>>> type(ds2.cLeaf.attrs["coordinates"])
h5py._hl.base.Empty
```

This looks like an h5py issue, actually.
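The failure mode can be mimicked in plain Python: CF decoding expects `coordinates` to be a whitespace-separated string of variable names, so a non-string value (like h5py's `Empty`) breaks the parse. A rough sketch, not xarray's actual code — the class and function here are stand-ins:

```python
class Empty:
    """Stand-in for h5py._hl.base.Empty: an attribute with no value."""
    pass

def decode_coordinates_attr(attrs):
    # Roughly what CF decoding does: parse the space-separated
    # variable names out of the 'coordinates' attribute.
    coords = attrs.get("coordinates")
    if coords is None:
        return []
    return coords.split()  # raises AttributeError if coords is not a string

# A well-formed attribute decodes fine:
decode_coordinates_attr({"coordinates": "lat lon time"})

# The malformed (Empty) one blows up:
try:
    decode_coordinates_attr({"coordinates": Empty()})
except AttributeError as e:
    print("decode failed:", e)
```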

I got to the bottom of this in the xarray issue above. It actually has nothing to do with fsspec. It's a difference between the netCDF4 engine and the h5netcdf...

Currently only h5netcdf can open files efficiently over http / s3 via fsspec. Your example has uncovered a bug in h5netcdf (https://github.com/pydata/xarray/issues/5172). Therefore, you will not be able to do...

Another workaround would be to use a preprocessing function to drop the weird `coordinates` attribute. That is what is causing `decode_coords` to fail.
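A minimal sketch of such a preprocessing function; the function name is hypothetical, and in practice you would pass it through the `preprocess` argument of `xr.open_mfdataset`:

```python
def drop_bad_coordinates(ds):
    """Strip the malformed 'coordinates' attribute from every variable.

    Works on any object exposing a .variables mapping of objects
    with an .attrs dict (e.g. an xarray.Dataset).
    """
    for var in ds.variables.values():
        var.attrs.pop("coordinates", None)
    return ds

# Hypothetical usage with xarray (not run here):
# ds = xr.open_mfdataset(urls, engine="h5netcdf",
#                        preprocess=drop_bad_coordinates)
```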

So both Google Cloud and AWS provide "file transfer services" to / from cloud storage. AWS has [several options](https://aws.amazon.com/cloud-data-migration/); not sure which is best:
- https://aws.amazon.com/datasync
- https://aws.amazon.com/aws-transfer-family/
- ...

I did some reading about the parsl library and its support for file transfer / staging: https://parsl.readthedocs.io/en/stable/userguide/data.html#staging-data-files. It seems to have a pretty flexible system for file staging which includes...

Good point Alex! This option will go away with the beam refactor.