siphon Types not the same for indexing vs. iteration of catalog.datasets

Try to include:

Code Sample: here the URL doesn't matter, but is provided for convenience.

from datetime import datetime, timedelta
from siphon.catalog import TDSCatalog

day = datetime(2018, 5, 24)
start = day + timedelta(hours=16)
end = day + timedelta(hours=24)
url = f"https://data.nssl.noaa.gov/thredds/catalog/WRDD/OKLMA/{day:%Y/%m/%d}/catalog.xml"

cat = TDSCatalog(url)

print(type(cat.datasets))
# <class 'siphon.catalog.DatasetCollection'>

print(type(cat.datasets[0]))
# <class 'siphon.catalog.Dataset'>

for ds in cat.datasets:
    print(type(ds))
#<class 'str'>
#<class 'str'>
# etc.

Problem description: When indexing cat.datasets, the return type is a Dataset object, but when iterating the return type is a string.
Expected output: I expected the same return type, preferably a Dataset object. It would seem to be more Pythonic if DatasetCollection behaved like a list or tuple of objects.
Background: Since this catalog does not index the time of these data, I had to write a manual loop to parse the time from the dataset name. I used the indexing method to figure out what was in cat.datasets, and developed a parsing method based on that return type. That information was inaccurate when I converted the logic to a function to loop over datasets.
Which platform: Mac
Versions. Include the output of:
- python --version: Python 3.6.12
- `python -c 'import siphon; print(siphon.version)': 0.9

Apr 08 '21 18:04 deeplycloudy

100% chance this is because .datasets is effectively a OrderedDict where we override __getitem__. Easiest fix is probably to also override __iter__ to iterate over values().

Apr 08 '21 19:04 dopplershift

Hrmmm... now that I think about this with just a bit more thought, this is standard dict behavior. For this code:

d = {'a': 1, 'b':2, 'c':3}

print(type(d['a']))
# <class 'int'>

for i in d:
    print(type(i))
# <class 'str'>
...

So right now DatasetCollection works exactly like a python dictionary. This is because you can get items out of DatasetCollection by position (i.e. [0]) or by name (i.e. my_data_file_name.nc). And to be clear, it's always had the dictionary semantics--we added later on the ability to use [0]. I'm not sure whether it'd be a good idea to fundamentally change those semantics...

Apr 08 '21 19:04 dopplershift

Ah, right you are, this is very Pythonic dict behavior once you recognize the keys are the dataset.name strings. Would you be open to a docstring change to make the iteration and indexing behavior more explicit? Are there other recommended places to document this behavior?

On Apr 8, 2021, at 14:53, Ryan May @.***> wrote:

Hrmmm... now that I think about this with just a bit more thought, this is standard dict behavior. For this code:

d = {'a': 1, 'b':2, 'c':3}

print(type(d['a']))

<class 'int'>

for i in d: print(type(i))

<class 'str'>

... So right now DatasetCollection works exactly like a python dictionary. This is because you can get items out of DatasetCollection by position (i.e. [0]) or by name (i.e. my_data_file_name.nc). And to be clear, it's always had the dictionary semantics--we added later on the ability to use [0]. I'm not sure whether it'd be a good idea to fundamentally change those semantics...

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub, or unsubscribe.

Apr 08 '21 20:04 deeplycloudy

👍 to updating the docstring. I can't think of anything other place unless we add some proper narrative documentation about TDSCatalog or an example explicitly about working with .datasets.

Apr 08 '21 20:04 dopplershift

OK, PR for the doc fix is above. I suggest also updating section 3. of the workshop tutorial on Siphon, which is the code I was copying when I confused myself.

Apr 08 '21 21:04 deeplycloudy

This continues to be subtle and surprising to me. I thought I understood well enough to write this up for the tutorial, and then cat.datasets[1:3] gave me a list of datasets, as though cat.datasets were a list of Datasets. I thought I'd get the first three keys! In retrospect it makes sense… slicing is just fancy integers, so it switches to giving values instead of keys.

My troubleshooting was made further confusing by the fact that cat.datasets[1:3] looks like a list of keys, and so the repr might be a target for a usability fix?

I actually built in this latest confusion as a teachable moment in the siphon tutorial notebook.

The slicing behavior is not explicitly documented in my docstring change.

Apr 09 '21 00:04 deeplycloudy