Add a walk function for navigating THREDDS catalogue

Open tlogan2000 opened this issue 2 years ago • 2 comments

As far as I know, this functionality does not already exist, but I believe it would be a welcome addition:

I often need to find all datasets across multiple subfolders of a THREDDS catalogue. To do this I resort to a custom function in my data processing scripts (see the simple example below), but ideally this would be built into siphon itself.

import logging

import requests
from siphon.catalog import TDSCatalog

LOGGER = logging.getLogger(__name__)


# walk function
def walk(cat, depth=1):
    """Return a generator walking a THREDDS data catalog for datasets.

    Parameters
    ----------
    cat : TDSCatalog
      THREDDS catalog.
    depth : int
      Maximum recursive depth. Setting 0 will return only datasets within the top-level catalog. If None,
      depth is set to 1000.
    """
    yield from cat.datasets.items()
    if depth is None:
        depth = 1000

    if depth > 0:
        for name, ref in cat.catalog_refs.items():
            try:
                child = ref.follow()
                yield from walk(child, depth=depth - 1)

            except requests.HTTPError as exc:
                LOGGER.exception(exc)

# create catalogue (urlcat is the URL of the THREDDS catalog to walk)
cat = TDSCatalog(urlcat)
# access all datasets down to 20 subfolder levels
for dd in walk(cat, depth=20):
    print(dd)

tlogan2000, Dec 21 '23 15:12

This seems like it could be a nice addition. Would you be interested in submitting a PR adding it? My only question is whether yielding from items() (so (name, Dataset) pairs) makes the most sense, or whether yielding just the Dataset would be enough, since you could still get the name from ds.name.
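To make the two options concrete, here is a minimal sketch of the variant that yields only the dataset objects. It avoids any network access by using FakeDataset and FakeCatalog, which are hypothetical stand-ins (not part of siphon) that only mimic the `datasets`, `catalog_refs`, `follow()` and `name` attributes used in the example above; `walk_datasets` is likewise an illustrative name, not an existing siphon function.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDataset:
    """Hypothetical stand-in for a siphon Dataset; only `name` is modelled."""
    name: str


@dataclass
class FakeCatalog:
    """Hypothetical stand-in for TDSCatalog with `datasets` and `catalog_refs`."""
    datasets: dict = field(default_factory=dict)
    catalog_refs: dict = field(default_factory=dict)

    def follow(self):
        # A real catalog reference would fetch the child catalog over HTTP;
        # here the stand-in simply acts as its own reference.
        return self


def walk_datasets(cat, depth=1):
    """Yield only the dataset objects, down to `depth` levels of sub-catalogs."""
    yield from cat.datasets.values()
    if depth is None:
        depth = 1000
    if depth > 0:
        for ref in cat.catalog_refs.values():
            yield from walk_datasets(ref.follow(), depth=depth - 1)


child = FakeCatalog(datasets={"b.nc": FakeDataset("b.nc")})
top = FakeCatalog(datasets={"a.nc": FakeDataset("a.nc")}, catalog_refs={"sub": child})

# The name is still recoverable from each yielded object.
names = [ds.name for ds in walk_datasets(top, depth=1)]
print(names)  # ['a.nc', 'b.nc']
```

With this form the caller writes `for ds in walk_datasets(cat)` and reads `ds.name` when needed, instead of unpacking `(name, ds)` pairs.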

dopplershift, Dec 21 '23 20:12

@dopplershift Sorry for the delay. Yes, I can try to put something together in the coming weeks. Would the most logical place for the addition simply be a new method on the catalogue class?

tlogan2000, Jan 08 '24 13:01