xarray
Dropping variables when accessing remote datasets via pydap
Is your feature request related to a problem?
I ran into the following issue when trying to access a remote dataset. Here is the concrete example that reproduces the error.
from pydap.client import open_url
from pydap.cas.urs import setup_session
import xarray as xr
import numpy as np
username = "_UsernameHere_"
password = "_PasswordHere_"
filename = 'Daymet_Daily_V4R1.daymet_v4_daily_na_tmax_2010.nc'
hyrax_url = 'https://opendap.earthdata.nasa.gov/collections/C2532426483-ORNL_CLOUD/granules/'
url1 = hyrax_url + filename
session = setup_session(username, password, check_url=hyrax_url)
ds = xr.open_dataset(url1, engine="pydap", session=session)
The last line returns an error:
ValueError: dimensions ('time',) must have the same length as the number of data dimensions, ndim=2
The issue involves the variable time_bnds. I know that because this works:
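# tmax_ds is not defined above; it is assumed to be the pydap dataset,
# e.g. tmax_ds = open_url(url1, session=session)
DS = []
for var in [var for var in tmax_ds.keys() if var not in ['time_bnds']]:
    # open each remaining variable separately via a constraint expression
    DS.append(xr.open_dataset(url1+'?'+var, engine='pydap', session=session))
ds = xr.merge(DS)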
I also tried passing decode_times=False, but the error persists. The for loop above works, but I think it is unnecessarily slow (~30 s).
I tried all of this with recent versions of xarray: xarray.__version__ = [2024.2, 2024.3].
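One possible way to avoid the per-variable loop is to request all wanted variables in a single URL. DAP2 constraint expressions accept a comma-separated projection list after the ?, so the sketch below builds such a URL. The helper name projection_url and the example.invalid host are hypothetical, and whether pydap and the Hyrax server accept this form for this particular dataset is an assumption that is not tested here.

```python
from typing import Iterable


def projection_url(base_url: str, variables: Iterable[str], drop: Iterable[str] = ()) -> str:
    """Append a DAP2-style projection (comma-separated names) to base_url.

    Hypothetical helper for illustration; not part of xarray or pydap.
    """
    dropped = set(drop)
    keep = [name for name in variables if name not in dropped]
    if not keep:
        raise ValueError("no variables left after dropping")
    return base_url + "?" + ",".join(keep)


# Variable names as listed by pydap for this file, plus the problematic one.
names = ["y", "lon", "lat", "time", "x", "tmax",
         "lambert_conformal_conic", "yearday", "time_bnds"]

# Placeholder host; substitute the real granule URL (url1 above).
single_request_url = projection_url(
    "https://example.invalid/daymet_v4_daily_na_tmax_2010.nc",
    names,
    drop=["time_bnds"],
)
print(single_request_url)
```

If this works against the server, the resulting URL could be passed to xr.open_dataset in place of url1, replacing the loop and the merge with one call.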
Describe the solution you'd like
I think it would be nice to be able to drop the variable I know I don't want. So something like this:
ds = xr.open_dataset(url1, drop_variables='time_bnds', engine="pydap", session=session)
and only create an xarray.Dataset with the variables I want. However, when I do that I still get the same error as before, which suggests that drop_variables is applied only after the xarray.Dataset has been created.
Describe alternatives you've considered
This is potentially a backend issue with pydap, which does not take a drop_variables option. But since dropping a variable is a one-liner in pydap and takes less than 1 millisecond, it would be a desirable feature.
For example, I can easily open the dataset and drop the variable with pydap, as shown here:
$ dataset = open_url(url1, session=session) # this works
$ dataset[tuple([var for var in dataset.keys() if var not in ['time_bnds']])] # this takes < 1ms
>>> <DatasetType with children 'y', 'lon', 'lat', 'time', 'x', 'tmax', 'lambert_conformal_conic', 'yearday'>
It looks like it would be an easy implementation on the backend, but at the same time I took a look at pydap_.py
https://github.com/pydata/xarray/blob/b80260781ee19bddee01ef09ac0da31ec12c5152/xarray/backends/pydap_.py#L129-L130
and I feel like it could also be implemented at the xarray level by allowing drop_variables, which is already an argument in xarray.open_dataset, to be passed to the PydapDataStore (I guess in both scenarios drop_variables would be passed).
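To illustrate why the order matters, here is a toy sketch in plain Python. It is not xarray or pydap code: the dictionary layout, the function names and the array sizes are all hypothetical, and the declared dimensions of time_bnds are only a guess based on the error message above. It contrasts validating every variable and dropping afterwards with dropping first and validating only what is left.

```python
# Toy model only: variable name -> (declared dimension names, data shape).
# All names and sizes here are hypothetical.
REMOTE_VARS = {
    "time": (("time",), (365,)),
    "tmax": (("time", "y", "x"), (365, 4, 5)),
    "time_bnds": (("time",), (365, 2)),  # one declared dim, two data dims
}


def validate(name, dims, shape):
    """Raise if the number of dimension names differs from the data rank."""
    if len(dims) != len(shape):
        raise ValueError(
            f"{name}: dimensions {dims} must have the same length "
            f"as the number of data dimensions, ndim={len(shape)}"
        )


def open_then_drop(variables, drop_variables=()):
    """Validate every variable first, then drop: fails on a bad variable."""
    for name, (dims, shape) in variables.items():
        validate(name, dims, shape)
    return {n: v for n, v in variables.items() if n not in drop_variables}


def drop_then_open(variables, drop_variables=()):
    """Drop first, then validate only the variables that remain."""
    kept = {n: v for n, v in variables.items() if n not in drop_variables}
    for name, (dims, shape) in kept.items():
        validate(name, dims, shape)
    return kept


try:
    open_then_drop(REMOTE_VARS, drop_variables=("time_bnds",))
    late_drop_failed = False
except ValueError as err:
    late_drop_failed = True
    print("late drop fails:", err)

ds = drop_then_open(REMOTE_VARS, drop_variables=("time_bnds",))
print("early drop keeps:", sorted(ds))
```

In this toy model only the second ordering succeeds, which is the behaviour the request asks for from the pydap backend.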
Any thoughts or suggestions? I can certainly lead this effort, as I will already be working on enabling the dap4 implementation within pydap.
Passing drop_variables down to the backend seems like a good idea but will take some effort to implement across all backends.
Do you know why it's only reporting one dimension name for a 2D variable?
> Passing drop_variables down to the backend seems like a good idea but will take some effort to implement across all backends.
Yeah, that sounds totally fair.
> Do you know why it's only reporting one dimension name for a 2D variable?
I think the problem is within time_bnds[time, nv], and particularly the dimension nv=[0,1]. nv is listed as a global dimension in the attributes, but it is not actually defined in the array. That is why dropping time_bnds also gets rid of the problem. Frustratingly, some of these older files do that (this one is from 2010) and, from what I understand, because they are used for validation tests it is hard to change them.
FYI: one can always inspect the metadata of a file by appending .dmr or .html to the filename (for NASA files you may have to log in first via Earthdata): https://opendap.earthdata.nasa.gov/collections/C2532426483-ORNL_CLOUD/granules/Daymet_Daily_V4R1.daymet_v4_daily_na_tmax_2010.nc.dmr
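As a small convenience, the suffix trick can be wrapped in a helper. This is only a sketch: the function name metadata_url is hypothetical, and it handles just the two suffixes mentioned above. It also strips any ?constraint part first, on the assumption that the metadata view is wanted for the whole file.

```python
def metadata_url(data_url: str, suffix: str = ".dmr") -> str:
    """Return the metadata view URL for an OPeNDAP data URL.

    Hypothetical helper; only the suffixes mentioned in this thread are allowed.
    """
    if suffix not in (".dmr", ".html"):
        raise ValueError(f"unsupported suffix: {suffix!r}")
    # Drop any constraint expression (everything after '?') before appending.
    base = data_url.split("?", 1)[0]
    return base + suffix


hyrax_url = "https://opendap.earthdata.nasa.gov/collections/C2532426483-ORNL_CLOUD/granules/"
filename = "Daymet_Daily_V4R1.daymet_v4_daily_na_tmax_2010.nc"

dmr_link = metadata_url(hyrax_url + filename)
html_link = metadata_url(hyrax_url + filename + "?tmax", suffix=".html")
print(dmr_link)
print(html_link)
```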