TDSCatalog not returning all datasets
Hi, I ran the following siphon code yesterday ~~and received a full response. However, the catalog changed throughout the day and added one new dataset epic_1b_20240413222222_03.h5. When I reran this code, it did not return the additional dataset. Is there any local caching going on?~~ and I'm missing the final dataset:
from siphon.catalog import TDSCatalog
remote_cat = TDSCatalog('https://opendap.larc.nasa.gov/opendap/DSCOVR/EPIC/L1B/2024/04/catalog.xml')
print(remote_cat.datasets[-1].name)
# 'epic_1b_20240413203419_03.h5' <-- not the most recent
When I request and parse the catalog without siphon, I can see the latest dataset:
import requests
import xmltodict
response = requests.get("https://opendap.larc.nasa.gov/opendap/DSCOVR/EPIC/L1B/2024/04/catalog.xml")
data = xmltodict.parse(response.text)
print(data["thredds:catalog"]["thredds:dataset"]["thredds:dataset"][-2]["@name"])
# 'epic_1b_20240413203419_03.h5' <-- 2nd most recent, same as above
print(data["thredds:catalog"]["thredds:dataset"]["thredds:dataset"][-1]["@name"])
# 'epic_1b_20240413222222_03.h5' <-- most recent
Using:
macOS 14.4.1
python --version: Python 3.11.3
python -c 'import siphon; print(siphon.__version__)': 0.9
Edit: revised due to a misunderstanding.
It's not a caching issue, but a problem in how we parse the catalog, specifically when individual datasets list their own access methods, as is done on the NASA Hyrax server. Essentially, we never properly set up access methods for the last dataset, so it gets dropped.
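Until a fix is available, one way to check whether a parsed listing is missing entries is to read the dataset names straight from the catalog XML and compare. The following is only a sketch using the standard library, not siphon's own parsing logic: the helper names `leaf_dataset_names` and `missing_names` are hypothetical, and `SAMPLE` is a made-up, trimmed-down catalog that only imitates the nested `thredds:dataset` layout seen in the `xmltodict` output above.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip any '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit('}', 1)[-1]


def leaf_dataset_names(xml_text):
    """Return the names of dataset elements that contain no nested dataset."""
    root = ET.fromstring(xml_text)
    names = []
    for elem in root.iter():
        if local_name(elem.tag) != 'dataset':
            continue
        # A container dataset (e.g. the directory entry) has dataset children.
        if any(local_name(child.tag) == 'dataset' for child in elem):
            continue
        names.append(elem.get('name'))
    return names


def missing_names(xml_text, parsed_names):
    """Names present in the raw XML but absent from a parsed listing."""
    seen = set(parsed_names)
    return [name for name in leaf_dataset_names(xml_text) if name not in seen]


# Made-up sample for illustration only; the real catalog has more entries
# and attributes.
SAMPLE = """<thredds:catalog xmlns:thredds="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0">
  <thredds:dataset name="/DSCOVR/EPIC/L1B/2024/04/">
    <thredds:dataset name="epic_1b_20240413203419_03.h5">
      <thredds:access serviceName="dap" urlPath="sample/path/a.h5"/>
    </thredds:dataset>
    <thredds:dataset name="epic_1b_20240413222222_03.h5">
      <thredds:access serviceName="dap" urlPath="sample/path/b.h5"/>
    </thredds:dataset>
  </thredds:dataset>
</thredds:catalog>"""

print(leaf_dataset_names(SAMPLE))
print(missing_names(SAMPLE, ['epic_1b_20240413203419_03.h5']))
```

Against the live catalog, `xml_text` would be `requests.get(url).text` and `parsed_names` could be built as `[ds.name for ds in remote_cat.datasets.values()]`; any name the second call returns is one that the parsed listing dropped.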
Gotcha, thanks. Looking back at the metadata, I never had the final dataset, so this has nothing to do with re-fetching the data. Glad I could surface the bug, though. Revising the bug report and title for clarity.