esgf-pyclient
Accessing opendap datasets
I am working on what I think is a fairly common workflow:

- log on to ESGF using the `LogonManager` class
- search for some datasets using the `SearchConnection` class
- access some opendap datasets using `netcdf4-python` or `pydap`

Here's an example workflow:
```python
In [1]: openid = 'https://esgf-node.llnl.gov/esgf-idp/openid/SECRET'
   ...: password = 'SECRET'

In [2]: from pyesgf.logon import LogonManager
   ...: from pyesgf.search import SearchConnection
   ...: import xarray as xr

In [3]: # initialize the logon manager
   ...: lm = LogonManager(verify=True)
   ...: if not lm.is_logged_on():
   ...:     lm.logon_with_openid(openid, password, 'pcmdi9.llnl.gov')
   ...: lm.is_logged_on()
Out[3]: True

In [4]: def print_context_info(ctx):
   ...:     print('Hits:', ctx.hit_count)
   ...:     print('Experiments:', ctx.facet_counts['experiment'])
   ...:     print('Realms:', ctx.facet_counts['realm'])
   ...:     print('Ensembles:', ctx.facet_counts['ensemble'])

In [5]: # search for some data
   ...: conn = SearchConnection('http://pcmdi9.llnl.gov/esg-search', distrib=True)
   ...: ctx = conn.new_context(project='CMIP5', model='CCSM4',
   ...:                        experiment='rcp85', time_frequency='day')
   ...: ctx = ctx.constrain(realm='atmos', ensemble='r1i1p1')
   ...:
   ...: # print a summary of what we found
   ...: print_context_info(ctx)
Hits: 4
Experiments: {'rcp85': 4}
Realms: {'atmos': 4}
Ensembles: {'r1i1p1': 4}

In [6]: # aggregate results
   ...: result = ctx.search()[0]
   ...: agg_ctx = result.aggregation_context()
   ...:
   ...: # get a list of opendap urls
   ...: x = list(a.opendap_url for a in agg_ctx.search() if a.opendap_url)
   ...: x
Out[6]:
['http://aims3.llnl.gov/thredds/dodsC/cmip5.output1.NCAR.CCSM4.rcp85.day.atmos.day.r1i1p1.tasmin.20120705.aggregation.1',
 'http://aims3.llnl.gov/thredds/dodsC/cmip5.output1.NCAR.CCSM4.rcp85.day.atmos.day.r1i1p1.tasmax.20120705.aggregation.1',
 'http://aims3.llnl.gov/thredds/dodsC/cmip5.output1.NCAR.CCSM4.rcp85.day.atmos.day.r1i1p1.prc.20120705.aggregation.1',
 'http://aims3.llnl.gov/thredds/dodsC/cmip5.output1.NCAR.CCSM4.rcp85.day.atmos.day.r1i1p1.psl.20120705.aggregation.1',
 'http://aims3.llnl.gov/thredds/dodsC/cmip5.output1.NCAR.CCSM4.rcp85.day.atmos.day.r1i1p1.tas.20120705.aggregation.1',
 'http://aims3.llnl.gov/thredds/dodsC/cmip5.output1.NCAR.CCSM4.rcp85.day.atmos.day.r1i1p1.pr.20120705.aggregation.1']
```
```python
In [7]: # try opening one of the opendap datasets
   ...: xr.open_dataset(x[0], engine='pydap')
---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
<ipython-input-7-90d39efb83f7> in <module>()
      1 # try opening one of the opendap datasets
----> 2 xr.open_dataset(x[0], engine='pydap')

~/anaconda/envs/aist/lib/python3.6/site-packages/xarray/backends/api.py in open_dataset(filename_or_obj, group, decode_cf, mask_and_scale, decode_times, autoclose, concat_characters, decode_coords, engine, chunks, lock, cache, drop_variables)
    302                                          autoclose=autoclose)
    303     elif engine == 'pydap':
--> 304         store = backends.PydapDataStore.open(filename_or_obj)
    305     elif engine == 'h5netcdf':
    306         store = backends.H5NetCDFStore(filename_or_obj, group=group,

~/anaconda/envs/aist/lib/python3.6/site-packages/xarray/backends/pydap_.py in open(cls, url, session)
     75     def open(cls, url, session=None):
     76         import pydap.client
---> 77         ds = pydap.client.open_url(url, session=session)
     78         return cls(ds)
     79

~/anaconda/envs/aist/lib/python3.6/site-packages/pydap/client.py in open_url(url, application, session, output_grid)
     62     never retrieve coordinate axes.
     63     """
---> 64     dataset = DAPHandler(url, application, session, output_grid).dataset
     65
     66     # attach server-side functions

~/anaconda/envs/aist/lib/python3.6/site-packages/pydap/handlers/dap.py in __init__(self, url, application, session, output_grid)
     62
     63         # build the dataset from the DDS and add attributes from the DAS
---> 64         self.dataset = build_dataset(dds)
     65         add_attributes(self.dataset, parse_das(das))
     66

~/anaconda/envs/aist/lib/python3.6/site-packages/pydap/parsers/dds.py in build_dataset(dds)
    159 def build_dataset(dds):
    160     """Return a dataset object from a DDS representation."""
--> 161     return DDSParser(dds).parse()
    162
    163

~/anaconda/envs/aist/lib/python3.6/site-packages/pydap/parsers/dds.py in parse(self)
     47         dataset = DatasetType('nameless')
     48
---> 49         self.consume('dataset')
     50         self.consume('{')
     51         while not self.peek('}'):

~/anaconda/envs/aist/lib/python3.6/site-packages/pydap/parsers/dds.py in consume(self, regexp)
     39     def consume(self, regexp):
     40         """Consume and return a token."""
---> 41         token = super(DDSParser, self).consume(regexp)
     42         self.buffer = self.buffer.lstrip()
     43         return token

~/anaconda/envs/aist/lib/python3.6/site-packages/pydap/parsers/__init__.py in consume(self, regexp)
    180             self.buffer = self.buffer[len(token):]
    181         else:
--> 182             raise Exception("Unable to parse token: %s" % self.buffer[:10])
    183         return token

Exception: Unable to parse token:
```
Questions:
- Is this actually a workflow that should work?
- Does this opendap URL actually exist? What is the best way to test that an opendap URL from ESGF is a valid one?
- Is additional authentication required?
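The "Unable to parse token" error usually means the server returned something other than a DDS document (often an HTML login or error page). One hedged way to sanity-check an opendap URL before handing it to pydap is to request its `.dds` form and look at the first token; the helper names below are hypothetical, and an ESGF node may additionally require your certificate/cookies, in which case a plain unauthenticated request will fail even for a valid URL:

```python
from urllib.request import urlopen

def looks_like_dds(text):
    """A valid DDS response begins with the 'Dataset' keyword;
    an HTML login or error page does not."""
    return text.lstrip().lower().startswith("dataset")

def check_opendap_url(url, timeout=30):
    """Hypothetical helper: fetch <url>.dds and report whether the
    body starts like a DDS header. Authentication-protected datasets
    may still fail here without the proper session/certificate."""
    try:
        with urlopen(url + ".dds", timeout=timeout) as resp:
            body = resp.read(1024).decode("utf-8", errors="replace")
    except OSError:
        return False
    return looks_like_dds(body)

# Offline illustration of the token check itself:
print(looks_like_dds("Dataset {\n  Float64 tas[time = 100];\n} tas;"))  # True
print(looks_like_dds("<html><body>Login required</body></html>"))       # False
```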
Hi @jhamman, right now I don't have time to look into your issue but please see if this example sheds any light on your questions: https://github.com/cehbrecht/demo-notebooks/blob/master/esgf-opendap.ipynb
@agstephens - Indeed, I had seen this notebook. As far as I can tell, the problem seems to lie in the use of aggregation context urls to opendap datasets.
@jhamman late answer ... there might be several issues here, but they are not related to `esgf-pyclient`. The aggregation might not work, and it also looks like `pydap` needs to be updated to work with ESGF. I tried it with a CORDEX aggregation and I can't get `pydap` working:
https://github.com/cehbrecht/jupyterlab-notebooks/blob/master/esgf-examples/esgf-pydap.ipynb
See also: https://pydap.readthedocs.io/en/latest/client.html?#earth-system-grid-federation-esgf
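For reference, the pydap docs linked above authenticate against ESGF with `pydap.cas.esgf.setup_session` and then pass the resulting session through xarray's pydap store. A minimal sketch of that pattern (the OpenID, password, and URL are placeholders; this assumes pydap is installed and the node accepts OpenID login):

```python
def open_esgf_opendap(dap_url, openid, password):
    """Sketch following the pydap ESGF client docs: log in via OpenID,
    then open the opendap dataset with xarray using the authenticated
    session. All arguments are placeholders to be filled in by the user."""
    from pydap.cas.esgf import setup_session
    import xarray as xr

    # setup_session performs the ESGF OpenID login and returns a
    # requests session carrying the authentication cookies.
    session = setup_session(openid, password, check_url=dap_url)
    store = xr.backends.PydapDataStore.open(dap_url, session=session)
    return xr.open_dataset(store)
```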
Hi, I hope you all are doing well,
Can anyone help me to overcome this issue?
My OpenID is working and is connected to my ESGF account.
Please let me know if you need more information.
Thank you,
Saeed
```python
from pyesgf.search import SearchConnection

conn = SearchConnection('https://esgf-index1.ceda.ac.uk/esg-search/', distrib=True)
ctx = conn.new_context(project='CORDEX', institute='KNMI', time_frequency='day',
                       experiment='historical', variable='tas')
ctx.hit_count

result = ctx.search()[14]
result.dataset_id

ds = ctx.search()[14]
files = ds.file_context().search()
len(files)

for f in files:
    print(f.download_url)

from pyesgf.logon import LogonManager
lm = LogonManager()
lm.logoff()
lm.is_logged_on()

OPENID = 'https://ceda.ac.uk/openid/xxx'
lm.logon_with_openid(openid=OPENID, password=None, bootstrap=True)
lm.is_logged_on()

password = 'xxx'
username = 'xxx'
myproxy_host = 'slcs1.ceda.ac.uk'
lm.logon(username, password, hostname=myproxy_host, interactive=True, bootstrap=True)
lm.is_logged_on()

import xarray as xr
ds = xr.open_dataset(f.download_url)
print(ds)
```
```python
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
D:\Anaconda\envs\gdal\lib\site-packages\xarray\backends\file_manager.py in _acquire_with_cache_info(self, needs_lock)
    197         try:
--> 198             file = self._cache[self._key]
    199         except KeyError:

D:\Anaconda\envs\gdal\lib\site-packages\xarray\backends\lru_cache.py in __getitem__(self, key)
     52         with self._lock:
---> 53             value = self._cache[key]
     54             self._cache.move_to_end(key)

KeyError: [<class 'netCDF4._netCDF4.Dataset'>, ('http://esgf1.dkrz.de/thredds/fileServer/cordex/cordex/output/EUR-11/KNMI/ICHEC-EC-EARTH/historical/r3i1p1/KNMI-RACMO22E/v1/day/tas/v20190108/tas_EUR-11_ICHEC-EC-EARTH_historical_r3i1p1_KNMI-RACMO22E_v1_day_20010101-20051231.nc',), 'r', (('clobber', True), ('diskless', False), ('format', 'NETCDF4'), ('persist', False))]

During handling of the above exception, another exception occurred:

OSError                                   Traceback (most recent call last)
D:\Anaconda\envs\gdal\lib\site-packages\xarray\backends\api.py in open_dataset(filename_or_obj, group, decode_cf, mask_and_scale, decode_times, autoclose, concat_characters, decode_coords, engine, chunks, lock, cache, drop_variables, backend_kwargs, use_cftime, decode_timedelta)
    506             engine = _get_default_engine(filename_or_obj, allow_remote=True)
    507         if engine == "netcdf4":
--> 508             store = backends.NetCDF4DataStore.open(
    509                 filename_or_obj, group=group, lock=lock, **backend_kwargs
    510             )

D:\Anaconda\envs\gdal\lib\site-packages\xarray\backends\netCDF4_.py in open(cls, filename, mode, format, group, clobber, diskless, persist, lock, lock_maker, autoclose)
    356             netCDF4.Dataset, filename, mode=mode, kwargs=kwargs
    357         )
--> 358         return cls(manager, group=group, mode=mode, lock=lock, autoclose=autoclose)
    359
    360     def _acquire(self, needs_lock=True):

D:\Anaconda\envs\gdal\lib\site-packages\xarray\backends\netCDF4_.py in __init__(self, manager, group, mode, lock, autoclose)
    312         self._group = group
    313         self._mode = mode
--> 314         self.format = self.ds.data_model
    315         self._filename = self.ds.filepath()
    316         self.is_remote = is_remote_uri(self._filename)

D:\Anaconda\envs\gdal\lib\site-packages\xarray\backends\netCDF4_.py in ds(self)
    365     @property
    366     def ds(self):
--> 367         return self._acquire()
    368
    369     def open_store_variable(self, name, var):

D:\Anaconda\envs\gdal\lib\site-packages\xarray\backends\netCDF4_.py in _acquire(self, needs_lock)
    359
    360     def _acquire(self, needs_lock=True):
--> 361         with self._manager.acquire_context(needs_lock) as root:
    362             ds = _nc4_require_group(root, self._group, self._mode)
    363         return ds

D:\Anaconda\envs\gdal\lib\contextlib.py in __enter__(self)
    111         del self.args, self.kwds, self.func
    112         try:
--> 113             return next(self.gen)
    114         except StopIteration:
    115             raise RuntimeError("generator didn't yield") from None

D:\Anaconda\envs\gdal\lib\site-packages\xarray\backends\file_manager.py in acquire_context(self, needs_lock)
    184     def acquire_context(self, needs_lock=True):
    185         """Context manager for acquiring a file."""
--> 186         file, cached = self._acquire_with_cache_info(needs_lock)
    187         try:
    188             yield file

D:\Anaconda\envs\gdal\lib\site-packages\xarray\backends\file_manager.py in _acquire_with_cache_info(self, needs_lock)
    202             kwargs = kwargs.copy()
    203             kwargs["mode"] = self._mode
--> 204             file = self._opener(*self._args, **kwargs)
    205             if self._mode == "w":
    206                 # ensure file doesn't get overriden when opened again

netCDF4\_netCDF4.pyx in netCDF4._netCDF4.Dataset.__init__()

netCDF4\_netCDF4.pyx in netCDF4._netCDF4._ensure_nc_success()

OSError: [Errno -78] NetCDF: Authorization failure: b'http://esgf1.dkrz.de/thredds/fileServer/cordex/cordex/output/EUR-11/KNMI/ICHEC-EC-EARTH/historical/r3i1p1/KNMI-RACMO22E/v1/day/tas/v20190108/tas_EUR-11_ICHEC-EC-EARTH_historical_r3i1p1_KNMI-RACMO22E_v1_day_20010101-20051231.nc'
```
@saeedvzf I have similar problems. It's probably because you have `project='CORDEX'`. You need special authorization to access that data via opendap using the CORDEX project_id. I see that you have logged on, so log on to one of the web portals of the ESGF data nodes and check whether you are part of the CORDEX project in your profile. If not, you can simply click something like *join CORDEX project* at the top.
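One more thing worth double-checking in the snippet above: `f.download_url` points at the THREDDS `fileServer` endpoint, which serves the raw file over plain HTTP; for `xr.open_dataset` over opendap you would normally use `f.opendap_url` instead. On THREDDS servers the two endpoints differ only in one path segment, as this illustrative (not authoritative) sketch shows:

```python
def file_server_to_dodsc(url):
    """Illustrative helper only: THREDDS exposes the same dataset under
    /fileServer/ (plain HTTP download) and /dodsC/ (opendap). In practice
    pyesgf exposes the opendap endpoint directly as FileResult.opendap_url,
    so you would use that rather than rewriting URLs by hand."""
    return url.replace("/thredds/fileServer/", "/thredds/dodsC/")

u = ("http://esgf1.dkrz.de/thredds/fileServer/cordex/cordex/output/EUR-11/"
     "KNMI/ICHEC-EC-EARTH/historical/r3i1p1/KNMI-RACMO22E/v1/day/tas/v20190108/"
     "tas_EUR-11_ICHEC-EC-EARTH_historical_r3i1p1_KNMI-RACMO22E_v1_day_20010101-20051231.nc")
print(file_server_to_dodsc(u))
```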