input function generalization
Thoughts on the read_gdal/read_ncdf distinction:
We need an abstraction over the ncsub and RasterIO idioms. I don't see that either of them is a good user interface, but ncsub is at least the general form if you take it to cbind(start, count, outlength).
Use that as the general form and provide a set of helpers to convert between the geotransform, extent-dim-res, coord-ranges, RasterIO, and start-count-stride idioms? I've been collecting these little components slowly, without being sure how to express them all in a single place.
Users will want a bounding box in coordinate space and an output size or resolution - and often these will be created by drawing or dragging sliders rather than entered numerically, or by a bit of back-and-forth guesswork reading the extent/proxy. Does it make sense to use the NetCDF start/count model as the base, add a nearest-neighbour resampling option for outlength (after reading from the source), and then have the GDAL version convert those to offset/window/outsize and allow additional resampling options?
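The bounding-box-plus-output idea above could be sketched like this - hypothetical helper name, not stars API - mapping a coordinate-space bbox onto both idioms at once:

```r
## Hypothetical helper: coordinate-space bbox -> NetCDF start/count (1-based)
## and GDAL RasterIO offset/window (0-based), for a regular north-up grid
## with top-left corner (x0, y0) and pixel sizes dx > 0, dy < 0.
bbox_to_index <- function(bbox, x0, y0, dx, dy, nx, ny) {
  ## bbox = c(xmin, xmax, ymin, ymax)
  stopifnot(dx > 0, dy < 0)
  col0 <- as.integer(max(0, floor((bbox[1] - x0) / dx)))
  col1 <- as.integer(min(nx - 1, ceiling((bbox[2] - x0) / dx) - 1))
  row0 <- as.integer(max(0, floor((bbox[4] - y0) / dy)))  # ymax is the top
  row1 <- as.integer(min(ny - 1, ceiling((bbox[3] - y0) / dy) - 1))
  list(start  = c(col0 + 1L, row0 + 1L),                  # ncvar_get-style
       count  = c(col1 - col0 + 1L, row1 - row0 + 1L),
       offset = c(col0, row0),                            # RasterIO-style
       window = c(col1 - col0 + 1L, row1 - row0 + 1L))
}
```

outlength/outsize would then be a separate resampling step applied on top of count/window, which keeps the read and the resample concerns apart.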
Moved from https://github.com/r-spatial/stars/issues/54, which included this reply from @edzer
> Transforming a curvilinear grid should be a simple matter of transforming the two coordinate matrices. Transforming any non-curvilinear grid should, at least optionally, return a curvilinear grid without doing any resampling.
Specific tasks that are prerequisites for more complete handling:
- [ ] allow using NetCDF with groups (e.g. Sentinel 5P), requires updates to ncmeta
- [ ] heuristics to determine that 2D vars are curvilinear coords of 3D+ vars
- [ ] conversion kit between geotransform/extent-dim-res/coord-ranges/RasterIO/start-count-stride idioms
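A minimal piece of that conversion kit, assuming a non-rotated GDAL-style geotransform c(x0, dx, 0, y0, 0, dy); the helper names are hypothetical:

```r
## Hypothetical conversion-kit helpers between a GDAL geotransform
## c(x0, dx, 0, y0, 0, dy) and the extent + dimension idiom.
## Only the unrotated (no shear) case is handled here.
gt_to_extent <- function(gt, nx, ny) {
  stopifnot(gt[3] == 0, gt[5] == 0)  # shear/rotation terms must be zero
  c(xmin = gt[1], xmax = gt[1] + nx * gt[2],
    ymin = gt[4] + ny * gt[6], ymax = gt[4])
}
extent_to_gt <- function(ext, nx, ny) {
  c(ext[["xmin"]], (ext[["xmax"]] - ext[["xmin"]]) / nx, 0,
    ext[["ymax"]], 0, (ext[["ymin"]] - ext[["ymax"]]) / ny)
}
```

The resolution form falls out of the same arithmetic (res = (xmax - xmin) / nx), so extent-dim-res and geotransform are interconvertible once shear is excluded.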
> heuristics to determine that 2D vars are curvilinear coords of 3D+ vars
If a netCDF file follows the CF-Conventions, this should be indicated by the coordinates attribute. If the file is properly made... which is not always the case. In that case, stars should have a way to fix things (which is more or less already in place right now).
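A heuristic along those lines could start from the CF coordinates attribute where it exists; a sketch using ncdf4 (the helper name is hypothetical, ncdf4::ncatt_get is the real accessor and returns list(hasatt, value)):

```r
## Hypothetical heuristic starting point: collect, per variable, the names
## listed in its CF "coordinates" attribute.
find_coord_vars <- function(file) {
  nc <- ncdf4::nc_open(file)
  on.exit(ncdf4::nc_close(nc))
  out <- list()
  for (v in names(nc$var)) {
    a <- ncdf4::ncatt_get(nc, v, "coordinates")
    if (isTRUE(a$hasatt))
      out[[v]] <- strsplit(a$value, "[[:space:]]+")[[1]]  # e.g. c("lon", "lat")
  }
  out
}
```

2D variables that appear in those lists and share their dims with 3D+ variables are strong curvilinear-coordinate candidates; variables not listed anywhere would fall back to the guessing this thread discusses.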
Ah ok, thanks! I noticed something like that today, we were discussing means of matching lon,lat (in [x,y]) to var[x,y] - and I think it's reasonable to set up a heuristic to find it. If you can point out some examples (ncdump -h will suffice, but source files also if possible) that would help - I'll compare with the examples I have.
Are there ever 3D coords? (I've not seen them; ROMS uses lon,lat in [x,y] and applies a stretching in z for every pixel relative to the bottom.)
I also think a heuristic will be fairly safe - it's only a bit iffy when there are dozens of variables and dimensions - and it will save using that curvilinear = ll step. (I didn't really catch on when that was being discussed elsewhere.) I'm working on making this easier to manage from ncmeta, along with groups (needed for s5p).
#12 and #54 contain links to properly CF-compliant files with 2-, 3- and 4-D variables, with spatial coordinates in the LCC projection - the EURO-CORDEX grid we usually use. Do you want me to fetch some examples of badly formatted models with unclear / missing grid information?
Yeah, I see now - that kind of robustness is not what I'm used to from ~18 or so years with NetCDF in the wild. Any examples that are motivating or useful to you will always be of interest!
I have to rethink some things, I need to make sure the way I go about things puts the good new formats first.
Two examples of models I had to deal with recently: https://drive.google.com/open?id=1nVyq6Nw7c0WaG1nE8p_0pwX0hoRNBnN5 https://drive.google.com/open?id=1p6-IR6NdH_tAlMYtgNcFXFjXWo3iaNAR
One does not have any projection information but relies only on lat-lon arrays; the other has some indication of the projection, but it is not formatted according to common practice.
Here instead is another example of a dataset I use often, which is projected on LAEA and has a proper grid description, so it should work OOTB: https://drive.google.com/open?id=1GiW_POsYnItSW59NXzlq2NDbLN8wvpU-
Another example can be found in #52, which presents a challenge since ob_tran seems to not be supported very well ATM.
wasn't ob_tran short for obscure transformation?
great! I'm trying to be systematic, keeping a record of your examples to learn from:
https://github.com/mdsumner/weird.nc/releases
(the bundle weird.tar.gz shows up in the Releases artefacts; it seems to float around, so I think there's only the latest copy)
Obscene Bollocks maybe? Unfortunately it is an extremely common transformation, many models and observations run with rotated pole.
I think the problem lies in unclear (to me!) PROJ support; to get an idea just grep the rgdal code and docs on ob_tran. In case the transformation and its inverse are simple, when we have long/lat coordinates we may want to do it outside PROJ, for clarity.
See also this ticket.
Time for a new issue ... but seriously those LCC grids threw me a bit, I have to rethink some things. Once I've got ncmeta back in order I'll have a closer look at the Coordinate attribute. :)
I wonder if we should leverage GDAL for the grid interpretation, but then allow applying that to a NetCDF I/O workflow? There's a lot of smart work gone into GDAL to sort that out. The typical criticisms of GDAL are that 1) it flattens arrays to bands and 2) it over-interprets rather than using what's there - but for ob_tran, these projected grids, geolocation arrays and ground control points generally, it's a very powerful and smart framework; with NetCDF you're basically inferring all that stuff and upgrading to it. We might take the GDAL 2D interpretation and apply it to stars' n-d arrays read directly, rather than wrapping bands back up.
I'm not ready for this but I will try it out at some point.
> Time for a new issue ... but seriously those LCC grids threw me a bit, I have to rethink some things. Once I've got ncmeta back in order I'll have a closer look at the Coordinate attribute. :)
For your convenience: http://cfconventions.org/Data/cf-conventions/cf-conventions-1.7/cf-conventions.html#coordinate-system
In short, 'good' netCDFs should have a coordinates and a grid_mapping attribute for each projected variable. The latter should indicate which variable contains the projection information. There are alternative methods (e.g. a crs global attribute is also accepted).
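Following that chain with ncdf4 might look like this sketch (the helper name is hypothetical): resolve the variable's grid_mapping attribute to the grid-mapping variable and return all of that variable's attributes. With attname omitted, ncdf4::ncatt_get returns all attributes as a named list.

```r
## Hypothetical sketch: look up the CF grid-mapping variable for `var` and
## return its attributes (grid_mapping_name, standard_parallel, ...).
get_grid_mapping <- function(file, var) {
  nc <- ncdf4::nc_open(file)
  on.exit(ncdf4::nc_close(nc))
  gm <- ncdf4::ncatt_get(nc, var, "grid_mapping")
  if (!isTRUE(gm$hasatt)) return(NULL)  # no CF grid mapping declared
  ncdf4::ncatt_get(nc, gm$value)        # all attrs of the named variable
}
```

Mapping grid_mapping_name plus its parameters onto a CRS is then a translation-table problem (CF names to PROJ parameters), which is exactly the part GDAL already does.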
> I think the problem lies in unclear (to me!) PROJ support; to get an idea just grep the rgdal code and docs on ob_tran. In case the transformation and its inverse are simple, when we have long/lat coordinates we may want to do it outside PROJ, for clarity.
As far as I understand it, ob_tran is basically just a coordinate translation; it should be relatively easy.
https://proj4.org/operations/projections/ob_tran.html
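If we did want the rotated-pole case (the common ob_tran use) outside PROJ, the unrotation is a plain spherical rotation. A pure-R sketch - my own derivation from the CF grid_north_pole_longitude/latitude convention, so verify against PROJ before relying on it:

```r
## Rotated-pole unrotation sketch: pole_lon/pole_lat are the geographic
## coordinates of the rotated north pole (CF grid_north_pole_* attributes).
## Derivation assumes rotated (0, 0) maps to longitude pole_lon + 180.
unrotate <- function(rlon, rlat, pole_lon, pole_lat) {
  d2r <- pi / 180
  ## rotated lon/lat to unit-sphere Cartesian
  x <- cos(rlat * d2r) * cos(rlon * d2r)
  y <- cos(rlat * d2r) * sin(rlon * d2r)
  z <- sin(rlat * d2r)
  th <- (90 - pole_lat) * d2r  # tilt pole down by the co-latitude
  ph <- pole_lon * d2r         # then spin to the pole's longitude
  Ry <- matrix(c(cos(th), 0, -sin(th), 0, 1, 0, sin(th), 0, cos(th)), 3, 3)
  Rz <- matrix(c(cos(ph), sin(ph), 0, -sin(ph), cos(ph), 0, 0, 0, 1), 3, 3)
  p <- Rz %*% Ry %*% diag(c(-1, -1, 1)) %*% rbind(x, y, z)
  list(lon = atan2(p[2, ], p[1, ]) / d2r, lat = asin(p[3, ]) / d2r)
}
```

With the EURO-CORDEX pole (-162, 39.25), rotated (0, 0) comes out at (18E, 50.75N), which is the sanity check I'd run against PROJ's ob_tran first.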
Continuing from https://github.com/r-spatial/stars/issues/14#issuecomment-463576200 - the distinction between read_stars and read_ncdf.
I've been thinking about this. My intention is to (i) enable proxy read for read_ncdf, (ii) leverage the existing GDAL (and @edzer) logic for the auxiliary data, and (iii) consider ways to generalize the front end(s). I do think we need both the NetCDF start/count and the GDAL source-window/out-window idioms, plus wrappers that allow pure-index operation (as now) as well as coordinate-space operations. stars doesn't provide that well yet, but in raster it's very powerful - extent, index-to-coord, index ops - and I'll probably experiment with a raster front end to the stars read to clarify my thoughts.
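The index-versus-coordinate wrappers could be as small as this, working against a stars-style dimension description (offset is the edge of cell 1, delta the signed cell size); the helper names are hypothetical:

```r
## Hypothetical index <-> coordinate wrappers for a stars-style dimension.
## Cell i spans [offset + (i - 1) * delta, offset + i * delta].
cell_centre <- function(i, offset, delta) {
  offset + (i - 0.5) * delta
}
index_from_coord <- function(x, offset, delta) {
  as.integer(floor((x - offset) / delta) + 1)
}
```

For example, with offset -40.5 and delta 0.25, cell 1's centre is -40.375 and index_from_coord inverts it; the raster-style cellFromXY/xyFromCell pairing is the same arithmetic per dimension.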
I might add that currently, for some files, read_ncdf and read_stars do not agree on dimension direction and naming, which is not ideal: (test file link)
```
filename = 'test_ncdf_stars.nc'
> read_stars(filename) %>% adrop %>% st_dimensions()
          from  to offset delta refsys point values
x            1 464  -40.5  0.25     NA    NA   NULL [x]
y            1 201   75.5 -0.25     NA    NA   NULL [y]
> read_ncdf(filename) %>% adrop %>% st_dimensions()
          from  to offset delta refsys point values
longitude    1 464  -40.5  0.25     NA    NA   NULL [x]
latitude     1 201  25.25  0.25     NA    NA   NULL [y]
```
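For what it's worth, those two results differ by a y-axis flip (latitude offset 25.25 with delta +0.25 versus 75.5 with -0.25, and 25.25 + 201 * 0.25 = 75.5). A generic base-R flip for the second dimension could look like this hypothetical helper:

```r
## Hypothetical helper: reverse the second (y) dimension of an array so a
## south-to-north read matches a north-up one; works for any number of dims.
flip_y <- function(a) {
  idx <- lapply(dim(a), seq_len)   # full index for every dimension
  idx[[2]] <- rev(idx[[2]])        # reverse only the y dimension
  do.call(`[`, c(list(a), idx, list(drop = FALSE)))
}
```

The dimension metadata has to be updated alongside the data, of course (offset 25.25, delta 0.25 becoming offset 75.5, delta -0.25).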
Does stars prefer one of the two dimension orderings? IIRC the CF conventions specify one, but I can't seem to find the specifics.
Good example thanks, definitely an open problem afaic
Good point. Currently, read_stars can't, as GDAL doesn't give you dimension names, and also doesn't tell you the order in which they come in the source file.
Gee, that's scary and surprising. I have pretty solid opinions about what should happen here (order before interpretation, etc.) - just FYI, I don't feel comfortable about our ability to prosecute this online; it definitely needs personal interaction, arm-waving and eyebrows as key discussion tools. :)
OK, I'm in - use a google hangout? together with @adrfantini @dblodgett-usgs you and me?
oh yeah, my afternoon next week I guess - Tuesday probably ok
sounds good; what time in CEST?
oh god, I see 1700 for me and 0700 for you but forever confused about TZ ...
We are on CET, UTC+1; you are on UTC+11, right? 0700 CET is too early for me - too much noise in the house from people getting ready for school etc., and too early to be at work (commute time kills me...).
I'm on CST; 0700 CET is 1200 CST. I much prefer morning in Europe / evening in AU.
OK, @mdsumner please suggest a time and a medium.
Sorry, things got very busy this week - I'll try to suggest something next week.
Here's a suggestion: pkg stars will
- provide three functions, `read_ncdf`, `read_gdal` and `read_stars` - the first two do what they say,
- `read_stars`, when given a NetCDF file, will try `read_ncdf` first, and otherwise, or when it fails, try `read_gdal`.
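That dispatch might look something like this sketch (not the actual stars implementation):

```r
## Sketch of the suggested dispatch: try read_ncdf for NetCDF sources,
## fall back to read_gdal when that fails or for everything else.
read_stars <- function(.x, ...) {
  if (grepl("\\.nc$", .x, ignore.case = TRUE)) {
    out <- tryCatch(read_ncdf(.x, ...), error = function(e) NULL)
    if (!is.null(out)) return(out)
  }
  read_gdal(.x, ...)
}
```

File-extension sniffing is only a stand-in here; a real version would inspect the source itself (magic bytes, or GDAL driver probing) rather than the name.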