How to template path expansion for OPeNDAP sources
Suppose I have an OPeNDAP URL like this:
urlpath: http://thredds..../MET/{{ variable }}/{{ variable }}_{{ '%04d' % year }}.nc
and I want to template the expansion of the year variable from 1979..2020 so that all years are concatenated into a single dataset. How would I do this?
For reference, here is a working intake datasource:
gridmet_opendap:
  description: 'GRIDMET data from OpeNDAP'
  args:
    urlpath: http://thredds.northwestknowledge.net:8080/thredds/dodsC/MET/{{ variable }}/{{ variable }}_{{ '%04d' % year }}.nc
    auth: urs
    chunks:
      lat: 585
      lon: 1386
  driver: opendap
  parameters:
    variable:
      description: climate variable
      type: str
      default: pr
      allowed: ["pr", "tmmn", "tmmx"]
    year:
      description: year
      type: int
      default: 2000
I assume you need to explicitly declare your pattern (path_as_pattern?), but I am not sure of the syntax.
@jsignell ?
I think you just need to get rid of this chunk:
year:
  description: year
  type: int
  default: 2000
I don't think that will work:
UndefinedError: 'year' is undefined
How would intake-xarray know to expand year without some configuration? OPeNDAP doesn't support globbing, so unless we have some way to tell it what the list of URLs should be, I expect we will end up with messages like:
HTTPError: 404 Not Found
Error {
    code = 404;
    message = "MET/pr/pr_*.nc";
};
The whole point of path_as_pattern is to do globs and use the results to fill out the variables. The opendap source doesn't handle multiple target URLs at all. So in this case, you want the number 2000 to appear in the URL and also to become a length-one coordinate, so that you can then do the concat in your own code?
I'm trying to avoid this pattern:
import xarray as xr
# `cat` is the already-opened intake catalog
years = range(1979, 2021)  # 1979 through 2020 inclusive
ds = xr.concat([cat.climate.gridmet_opendap(year=year).to_dask() for year in years], dim='day')
If this was a filesystem, I could have intake concat the individual years together using a simple glob pattern.
Your solution doesn't look too bad, though ;) Given the lack of glob for opendap, you could write this into the driver, or write a new driver specifically for the list-of-urls case. You could only coerce the existing path_as_pattern code if you could find a way to glob the URLs.
Ah I see. Well you could pass a list of urls and a path_as_pattern.
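A minimal sketch of the list-of-URLs idea, assuming the GRIDMET URL layout from the catalog entry above. `build_urls` and `open_all` are hypothetical helpers, not part of intake or intake-xarray. `open_all` assumes `xarray.open_mfdataset` can read these OPeNDAP URLs directly, which has not been checked against this server.

```python
BASE = "http://thredds.northwestknowledge.net:8080/thredds/dodsC/MET"


def build_urls(variable, first_year=1979, last_year=2020):
    """Return one URL per year, with both end years included."""
    return [
        f"{BASE}/{variable}/{variable}_{year:04d}.nc"
        for year in range(first_year, last_year + 1)
    ]


def open_all(urls, concat_dim="day"):
    """Open every URL lazily and concatenate along concat_dim (needs xarray and dask)."""
    import xarray as xr  # imported here so build_urls works without xarray installed

    return xr.open_mfdataset(urls, combine="nested", concat_dim=concat_dim)


urls = build_urls("pr")
print(len(urls), urls[0])
```

Calling `open_all(urls)` would then replace the explicit `xr.concat` loop, at the cost of bypassing the catalog entry.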
As I've thought about this more, I've realized what I'm really after is a new feature in intake that would allow me to specify the range of a parameter:
year:
  description: year
  type: int
  range:
    min: 1979
    max: 2020
When the range key is present, intake would essentially construct a list of urls. Thoughts on how this may work out?
I see what you mean, but I think that would have limited utility and introduce complexity into an already pretty brittle system. For instance, say you have a couple of these things (for month and year): what happens when certain combinations don't exist (no year: 2020, month: 12)?
Side note: I would think range would indicate the allowable min-max not fill in all the values between.
Right -- range doesn't indicate how to fill in; the data may be from every year (a good default!), but e.g. Census data would only be every 10 years, so filling in every integer value is only a default, not a full solution...
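To make the two concerns concrete, here is a rough sketch in plain Python; it is not something intake does today. `expand_range` and `expand_grid` are hypothetical names. `step` covers the every-10-years case, and `missing` lists combinations that are known not to exist.

```python
from itertools import product


def expand_range(minimum, maximum, step=1):
    """Inclusive integer expansion of a min/max/step block."""
    return list(range(minimum, maximum + 1, step))


def expand_grid(ranges, missing=()):
    """Cartesian product of several named ranges, minus known-missing combinations."""
    names = list(ranges)
    skip = set(missing)
    combos = product(*(expand_range(**ranges[name]) for name in names))
    return [dict(zip(names, combo)) for combo in combos if combo not in skip]


# Decennial data: every integer year is only a default, so step must be explicit.
census_years = expand_range(1980, 2020, step=10)

# Two expanded parameters with one combination that does not exist.
grid = expand_grid(
    {
        "year": {"minimum": 2019, "maximum": 2020},
        "month": {"minimum": 11, "maximum": 12},
    },
    missing=[(2020, 12)],
)
```

Someone still has to maintain the `missing` list by hand, which is the kind of extra complexity raised above.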
I would think range would indicate the allowable min-max not fill in all the values between.
That is exactly what it means at the moment. I could imagine more complex kinds of parameter expansion through the Intake parameter system, but it would need to be pretty sophisticated. Right now, it replaces one value in a string or gives a complete replacement value (where the type is not string). A new block for producing the list of strings might look like:
args:
  auth: urs
  chunks:
    lat: 585
    lon: 1386
driver: opendap
parameters:
  variable:
    description: climate variable
    type: str
    default: pr
    allowed: ["pr", "tmmn", "tmmx"]
  urlpath:
    template: "http://thredds.northwestknowledge.net:8080/thredds/dodsC/MET/{{ variable }}/{{ variable }}_{{ '%04d' % year }}.nc"
    description: year
    type: expand_range
    range:
      min: 1979
      max: 2020
      step: 1
(where the templating would need to be recursive, because variable must be substituted into the string before the string->list expansion is done)
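The ordering constraint can be sketched in plain Python. This only illustrates the two-pass idea: `expand_urlpath` and the `spec` dict (including its `name` key) are hypothetical, and `str.format`-style placeholders stand in for the Jinja template so the example has no dependencies.

```python
def expand_urlpath(spec, **scalars):
    """Pass 1: substitute scalar parameters. Pass 2: expand the range into a list."""
    partial = spec["template"]
    for name, value in scalars.items():
        # Replace only the named placeholder so the range placeholder survives pass 1.
        partial = partial.replace("{" + name + "}", str(value))

    rng = spec["range"]
    values = range(rng["min"], rng["max"] + 1, rng.get("step", 1))
    return [partial.format(**{spec["name"]: value}) for value in values]


# Hypothetical in-memory equivalent of the proposed expand_range block.
spec = {
    "name": "year",
    "template": (
        "http://thredds.northwestknowledge.net:8080/thredds/dodsC/MET/"
        "{variable}/{variable}_{year:04d}.nc"
    ),
    "range": {"min": 1979, "max": 2020, "step": 1},
}

urls = expand_urlpath(spec, variable="tmmx")
```

Running the passes in the opposite order would fail, because the range expansion would hit the still-unfilled `{variable}` placeholder.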
What we really want is for list comprehension to be allowed in catalogs, right? Perhaps this is just a case where YAML isn't the right format.
If you get a catalog.xml for that THREDDS server, you can now use intake-thredds: https://intake-thredds.readthedocs.io/en/latest/tutorial.html#loading-a-catalog