Improve TDSCatalog walking

Open dopplershift opened this issue 7 years ago • 4 comments

Some ideas for improving walking through the catalog:

  • Implement walk() that would allow blowing through the nested hierarchy, e.g. cat.walk('Channel02/current')
  • Another option is to follow pathlib with something like: cat / 'Channel02' / 'current'
  • Whichever we implement (either or both), we need to use the IPython hooks that allow for tab completion. I'm not sure whether they will work for the above options or only for attribute/dictionary access; in the latter case we should instead go for an API that allows for it, since we really want to ease quick, notebook-based exploration
  • A lot of this will also be improved with better string representation of the objects, as mentioned in #260
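As an illustration only, the first three ideas can be sketched on a toy stand-in for a catalog. `ToyCatalog` and its `refs` attribute are hypothetical names, not siphon's API. `_ipython_key_completions_` is the IPython hook for completing keys in `obj[...]` access; as far as I know it does not apply to the `/` operator or to string arguments such as `walk('...')`, which is the concern raised above.

```python
class ToyCatalog:
    """Hypothetical stand-in for a nested catalog; not siphon's TDSCatalog."""

    def __init__(self, name, refs=None):
        self.name = name
        self.refs = dict(refs or {})  # child name -> ToyCatalog

    def __getitem__(self, key):
        # Dictionary-style access to one child level
        return self.refs[key]

    def __truediv__(self, key):
        # pathlib-style access: cat / 'Channel02' / 'current'
        return self[key]

    def walk(self, path):
        # Descend several levels at once: cat.walk('Channel02/current')
        node = self
        for part in (p for p in path.split("/") if p):
            node = node[part]
        return node

    def _ipython_key_completions_(self):
        # IPython calls this to offer completions when typing cat['<TAB>
        return list(self.refs)


current = ToyCatalog("current")
cat = ToyCatalog("root", {"Channel02": ToyCatalog("Channel02", {"current": current})})
```

Here `cat.walk('Channel02/current')`, `cat / 'Channel02' / 'current'`, and `cat['Channel02']['current']` all reach the same object, but only the last form would get key completion in a notebook under this assumption.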

dopplershift, Jan 18 '19 00:01

@dopplershift Hi! I'm trying out siphon for handling THREDDS servers and getting OPeNDAP links out. With a few different THREDDS servers, I have needed different code to dig down to what I need, and I think that depends on whether the THREDDS catalog is nested or not, which brought me to this issue. Is there a way to handle nested catalogs with siphon? If not, do you know of another package that would? I saw intake-thredds, which would be great, but it doesn't look updated for intake v2.

kthyng, Dec 18 '24 15:12

Can you provide a link, or preferably sample code, that's not opening a catalog in a way that you expect? It would be easier to give you tips to help you on your way if we're looking at the same thing.

dopplershift, Dec 27 '24 23:12

@kthyng I had to do this recently and came up with a very rudimentary, and probably wrong, way of doing this:

from siphon.catalog import TDSCatalog


def _opendap_urls(cat):
    # OPeNDAP URL of each dataset in this catalog (None if the key is missing)
    return [value.access_urls.get("opendap") for value in cat.datasets.values()]


def _nested_catalogs(cat):
    # reached end with datasets
    if not cat.catalog_refs and cat.datasets:
        yield cat
    # keep navigating the refs; follow() opens the catalog the ref points to,
    # so the URL does not have to be rebuilt from the ref's title
    for catalog_ref in cat.catalog_refs.values():
        yield from _nested_catalogs(catalog_ref.follow())


def _get_name(catalog_url):
    # path between ".../catalog/" and "/catalog.xml"
    return catalog_url.split("catalog")[-2].strip("/")


base_catalog = TDSCatalog(catalog_url="https://www.ncei.noaa.gov/thredds-ocean/catalog/ioos/atn/catalog.xml")

nested_catalogs = _nested_catalogs(base_catalog)

# map each leaf catalog's path to the OPeNDAP URLs of its datasets
datasets = {
    _get_name(nested_catalog.catalog_url): _opendap_urls(nested_catalog) for nested_catalog in nested_catalogs
}

Having the ability to walk the catalog would be great though.
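A rough sketch of what such a walk could look like, offered as one possible shape rather than anything siphon provides: `walk_catalogs` and its `max_depth` parameter are hypothetical names. It assumes each value in `cat.catalog_refs` has a `follow()` method that returns the child catalog, as siphon's `CatalogRef` does. The `FakeRef` and `FakeCatalog` classes are offline stand-ins so the example runs without a THREDDS server.

```python
def walk_catalogs(cat, max_depth=None, _depth=0):
    """Yield cat, then every catalog reachable through its catalog_refs.

    Hypothetical helper: assumes ref.follow() returns the child catalog.
    There is no cycle detection, so max_depth guards against runaway walks.
    """
    yield cat
    if max_depth is not None and _depth >= max_depth:
        return
    for ref in cat.catalog_refs.values():
        yield from walk_catalogs(ref.follow(), max_depth, _depth + 1)


# Offline stand-ins (hypothetical) so the sketch runs without a server
class FakeRef:
    def __init__(self, target):
        self.target = target

    def follow(self):
        return self.target


class FakeCatalog:
    def __init__(self, name, children=()):
        self.name = name
        self.catalog_refs = {child.name: FakeRef(child) for child in children}


root = FakeCatalog("root", [FakeCatalog("a", [FakeCatalog("a1")]), FakeCatalog("b")])
names = [cat.name for cat in walk_catalogs(root)]
shallow = [cat.name for cat in walk_catalogs(root, max_depth=1)]
```

Unlike the leaf-only generator above, this yields every catalog along the way, so datasets in a catalog that also has refs are not skipped. Against a real server each followed ref would be a separate HTTP request, so a depth limit seems worth having.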

ocefpaf, Sep 11 '25 13:09

I haven't been back to this work for a while, unfortunately, but it's a helpful thing to be able to do!

kthyng, Sep 12 '25 22:09