Improve TDSCatalog walking
Some ideas for improving walking through the catalog:
- Implement `walk()` that would allow blowing through the nested hierarchy, e.g. `cat.walk('Channel02/current')`
- Another option is to follow `pathlib` with something like: `cat / 'Channel02' / 'current'`
- Implementing either/both, we need to use the hooks for IPython that allow for tab completion. I'm not sure if they will work for the above options, or if they only cover attribute/dictionary access; in the latter case we should instead go for an API that allows for it, since we really want to ease quick, notebook-based exploration
- A lot of this will also be improved with better string representation of the objects, as mentioned in #260
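
A minimal sketch of how the first three ideas could fit together. `CatalogNode` is a hypothetical stand-in, not siphon's `TDSCatalog`, and the `children` attribute is an assumption made only for illustration. The `_ipython_key_completions_` method is the hook IPython looks for when completing keys in a subscript expression such as `node['<TAB>`, so as far as I can tell it would cover dictionary-style access but not a string passed to `walk()` or the right-hand side of `/`.

```python
class CatalogNode:
    """Hypothetical stand-in for a nested catalog; not siphon's TDSCatalog."""

    def __init__(self, name, children=None):
        self.name = name
        # Maps child name -> CatalogNode, loosely mirroring catalog_refs.
        self.children = {c.name: c for c in (children or [])}

    def __getitem__(self, key):
        # Dictionary-style access to a direct child.
        return self.children[key]

    def _ipython_key_completions_(self):
        # IPython calls this hook to offer completions for node['<TAB>.
        return list(self.children)

    def walk(self, path):
        """Descend through several levels given a '/'-separated path."""
        node = self
        for part in path.strip('/').split('/'):
            if part:
                node = node[part]
        return node

    def __truediv__(self, name):
        # pathlib-style descent: node / 'Channel02' / 'current'
        return self.walk(name)

    def __repr__(self):
        return f"CatalogNode({self.name!r})"


root = CatalogNode('root', [
    CatalogNode('Channel02', [CatalogNode('current'), CatalogNode('archive')]),
    CatalogNode('Channel13', [CatalogNode('current')]),
])

print(root.walk('Channel02/current'))
print(root / 'Channel02' / 'current')
print(root._ipython_key_completions_())
```

Both spellings end up at the same node here, so the choice between them is mostly about which reads better in a notebook. In a real implementation each step would likely trigger a network request to fetch the child catalog, which this sketch does not model.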
@dopplershift Hi! I'm trying out siphon for handling THREDDS catalogs and getting OPeNDAP links out. Trying a few different THREDDS servers, I have needed different code for each to dig down to what I need. I think it depends on whether the THREDDS catalog is nested or not, which brought me to this issue. Is there a way to handle nested catalogs with siphon? If not, do you know of another package that would? I saw intake-thredds, which would be great, but it doesn't look like it has been updated for intake v2.
Can you provide a link, or preferably sample code, that's not opening a catalog in a way that you expect? It would be easier to give you tips to help you on your way if we're looking at the same thing.
@kthyng I had to do this recently and came up with a very rudimentary, and probably wrong, way of doing this:
```python
from siphon.catalog import TDSCatalog
from urllib.parse import urljoin


def _opendap_urls(cat):
    return [value.access_urls.get("opendap") for value in cat.datasets.values()]


def _nested_catalogs(cat):
    # reached end with datasets
    if not cat.catalog_refs and cat.datasets:
        yield cat
    # keep navigating the refs
    if cat.catalog_refs:
        for catalog_ref in cat.catalog_refs:
            ref = urljoin(cat.catalog_url, f"{catalog_ref}/catalog.xml")
            new_cat = TDSCatalog(catalog_url=ref)
            yield from _nested_catalogs(new_cat)


def _get_name(catalog_url):
    return catalog_url.split("catalog")[-2].strip("/")


base_catalog = TDSCatalog(catalog_url="https://www.ncei.noaa.gov/thredds-ocean/catalog/ioos/atn/catalog.xml")
nested_catalogs = _nested_catalogs(base_catalog)
datasets = {
    _get_name(nested_catalog.catalog_url): _opendap_urls(nested_catalog) for nested_catalog in nested_catalogs
}
```
Having the ability to walk the catalog would be great though.
I haven't been back to this work for a while, unfortunately, but it's a helpful thing to be able to do!