stars icon indicating copy to clipboard operation
stars copied to clipboard

input function generalization

Open mdsumner opened this issue 7 years ago • 35 comments

Thoughts on the read_gdal/read_ncdf distinction:

We need an abstraction over the ncsub and RasterIO idioms, I don't see that either of them is a good user-interface, but at least ncsub is the general form if you take it to cbind(start, count, outlength).

Use that as the general form and provide a bunch of helpers, to convert between geotransform/extent-dim-res/coord-ranges/RasterIO/start-count-stride? I've kind of been collecting these little components slowly, not being sure how to express them all in a single place.

Users will want a bounding box in coordinate space, and an output size or resolution - and often they'll be created by drawing or dragging sliders rather than directly numerically, or by a bit of back and forth guesswork reading the extent/proxy. Does it make sense to use the NetCDF start/count model, and add NearestNeighbour resampling option for outlength (after getting from the source), and then the gdal version can convert those to offset/window/outsize and also allow added resampling options?

Moved from https://github.com/r-spatial/stars/issues/54, which included this reply from @edzer

transforming a curvilinear grid should be a simple matter of transforming the two coordinate matrices transforming any non-curvilinear grid should, at least optionally, return a curvilinear grid without doing any resampling.

mdsumner avatar Oct 03 '18 23:10 mdsumner

Specific tasks that are prerequisites for more complete handling:

  • [ ] allow using NetCDF with groups (e.g. Sentinel 5P), requires updates to ncmeta
  • [ ] heuristics to determine that 2D vars are curvilinear coords of 3D+ vars
  • [ ] conversion kit between geotransform/extent-dim-res/coord-ranges/RasterIO/start-count-stride idioms

mdsumner avatar Oct 04 '18 01:10 mdsumner

 heuristics to determine that 2D vars are curvilinear coords of 3D+ vars

If a netCDF file follows the CF-Conventions, this should be indicated by the coordinates attribute. If the file is properly made... which is not always the case. In that case, stars should have a way to fix things (which is more or less already in place right now).

adrfantini avatar Oct 04 '18 08:10 adrfantini

Ah ok, thanks! I noticed something like that today, we were discussing means of matching lon,lat (in [x,y]) to var[x,y] - and I think it's reasonable to set up a heuristic to find it. If you can point out some examples (ncdump -h will suffice, but source files also if possible) that would help - I'll compare with the examples I have.

Are there ever 3D coords? (I've not seen them, ROMs uses lon,lat in [x,y] and applies a stretching in z for every pixel relative to the bottom).

I also think a heuristic will be fairly safe, it's only a bit iffy when there are dozens of variables and dimensions - and will save using that curvilinear = ll step. (I didn't really catch on when that was being discussed elsewhere). I'm working on making this easier to manage from ncmeta, along with groups (needed for s5p).

mdsumner avatar Oct 04 '18 09:10 mdsumner

#12 and #54 contain links to properly CF-Compliant files with 2, 3 and 4D variables, with spatial coordinates in LCC projection, which is the EURO-CORDEX grid we usually use. Do you want me to fetch some examples of badly formatted models with unclear / missing grid information?

adrfantini avatar Oct 04 '18 11:10 adrfantini

Yeah, I see now - that kind of robustness is not what I'm used to from ~18 or so years with NetCDF in the wild. Any examples that are motivating or useful to you will always be of interest!

I have to rethink some things, I need to make sure the way I go about things puts the good new formats first.

mdsumner avatar Oct 04 '18 12:10 mdsumner

Two examples of models I had to deal with recently: https://drive.google.com/open?id=1nVyq6Nw7c0WaG1nE8p_0pwX0hoRNBnN5 https://drive.google.com/open?id=1p6-IR6NdH_tAlMYtgNcFXFjXWo3iaNAR

One does not have any projection information, but relies only on lat-lon arrays; one instead has some indication of the projection, but it is not formatted according to common practice.

Here is instead another example of a dataset I use often, which is projected on LAEA and has a proper grid description so should work OOTB: https://drive.google.com/open?id=1GiW_POsYnItSW59NXzlq2NDbLN8wvpU-

adrfantini avatar Oct 04 '18 12:10 adrfantini

Another example can be found in #52, which presents a challenge since ob_trans seems to not be supported very well ATM.

adrfantini avatar Oct 04 '18 12:10 adrfantini

wasn't ob_tran short for obscure transformation?

edzer avatar Oct 04 '18 12:10 edzer

great! I'm trying to be systematic and make a record of your examples and learn piggyback:

https://github.com/mdsumner/weird.nc/releases

(the bundle weird.tar.gz shows up the Releases artefacts, it seems to float around so I think there's only the latest copy)

mdsumner avatar Oct 04 '18 12:10 mdsumner

Obscene Bollocks maybe? Unfortunately it is an extremely common transformation, many models and observations run with rotated pole.

adrfantini avatar Oct 04 '18 12:10 adrfantini

I think the problem lies in unclear (to me!) PROJ support; to get an idea just grep the rgdal code and docs on ob_tran. In case the transformation and its inverse are simple, when we have long/lat coordinates we may want to do it outside PROJ, for clarity.

edzer avatar Oct 04 '18 12:10 edzer

See also this ticket.

edzer avatar Oct 04 '18 12:10 edzer

Time for a new issue ... but seriously those LCC grids threw me a bit, I have to rethink some things. Once I've got ncmeta back in order I'll have a closer look at the Coordinate attribute. :)

mdsumner avatar Oct 04 '18 12:10 mdsumner

I wonder if we should leverage GDAL for the grid interpretation, but then allow applying that to a NetCDF I/O workflow? There's a lot of smart work gone into GDAL to sort that out. The typical criticism of GDAL is that 1) it flattens arrays to bands 2) it over-interprets rather than using what's there - but with ob_tran, these projected grids, geolocation arrays and ground control points generally it's a very powerful and smart framework - in NetCDF you're basically inferring all that stuff and upgrading to it. We might take the GDAL 2D interpretation and apply it to stars' n-d arrays read directly, rather than by wrapping bands back up.

I'm not ready for this but I will try it out at some point.

mdsumner avatar Oct 04 '18 12:10 mdsumner

Time for a new issue ... but seriously those LCC grids threw me a bit, I have to rethink some things. Once I've got ncmeta back in order I'll have a closer look at the Coordinate attribute. :)

For your convenience: http://cfconventions.org/Data/cf-conventions/cf-conventions-1.7/cf-conventions.html#coordinate-system

In short, 'good' netcdfs should have a coordinates and a grid_mapping for each projected variable. The latter should indicate which variable contains the projection information. There are alternative methods (e.g. a crs global attribute is also accepted).

adrfantini avatar Oct 04 '18 12:10 adrfantini

I think the problem lies in unclear (to me!) PROJ support; to get an idea just grep the rgdal code and docs on ob_tran. In case the transformation and its inverse are simple, when we have long/lat coordinates we may want to do it outside PROJ, for clarity.

As far as I understand it ob_trans is basically just a coordinate traslation, should be relatively easy. https://proj4.org/operations/projections/ob_tran.html

adrfantini avatar Oct 04 '18 13:10 adrfantini

Continuing from https://github.com/r-spatial/stars/issues/14#issuecomment-463576200 - the distinction between read_stars, read_ncdf

I've been thinking about this, my intention is to i) enable proxy read for read_ncdf ii) leverage the existing GDAL (and @edzer) logic for the auxiliary data iii) consider ways to generalize the front end/s. I do think we need both the NetCDF start/count and the GDAL source-window/out-window idioms, and we need some kind of wrappers that allow pure-index operation (as now) as well as coordinate-space operations. I think that's not well provided by stars yet, but in raster it's very powerful - extent, index-to-coord, index-ops, and I'll probably experiment with a raster front-end to the stars read to clarify my thoughts.

mdsumner avatar Feb 14 '19 23:02 mdsumner

I might add that currently for some files read_ncdf and read_stars do not agree on dimension direction and naming, which is not ideal: (test file link)

filename = 'test_ncdf_stars.nc'
> read_stars(filename) %>% adrop %>% st_dimensions()
  from  to offset delta refsys point values    
x    1 464  -40.5  0.25     NA    NA   NULL [x]
y    1 201   75.5 -0.25     NA    NA   NULL [y]

> read_ncdf(filename) %>% adrop %>% st_dimensions()
          from  to offset delta refsys point values    
longitude    1 464  -40.5  0.25     NA    NA   NULL [x]
latitude     1 201  25.25  0.25     NA    NA   NULL [y]

Does stars prefer one of the two dimension ordering? IIRC the CF conventions did specify one, but I can't seem to find the specifics.

adrfantini avatar Feb 15 '19 09:02 adrfantini

Good example thanks, definitely an open problem afaic

mdsumner avatar Feb 15 '19 09:02 mdsumner

Good point. Currently, read_stars can't, as GDAL doesn't give you dimension names, and also doesn't tell you the order in which they come in the source file.

edzer avatar Feb 15 '19 09:02 edzer

Gee, that's scary and surprising. I have pretty solid opinions about what should happen here (order before interpretation etc) - just FYI I don't feel comfortable about our ability to persecute this online, it definitely needs personal interaction, arm-waving and eyebrows as key discussion tools. :)

mdsumner avatar Feb 15 '19 11:02 mdsumner

OK, I'm in - use a google hangout? together with @adrfantini @dblodgett-usgs you and me?

edzer avatar Feb 15 '19 12:02 edzer

oh yeah, my afternoon next week I guess - Tuesday probably ok

mdsumner avatar Feb 15 '19 12:02 mdsumner

sounds good; what time in CEST?

edzer avatar Feb 15 '19 12:02 edzer

oh god, I see 1700 for me and 0700 for you but forever confused about TZ ...

mdsumner avatar Feb 15 '19 12:02 mdsumner

We are on CET, UTC+1, you are on UTC+11 right? 0700 CET is too early for me, too much noise in the house of people getting ready for school etc. and too early to be at work either (commute time kills me...).

adrfantini avatar Feb 16 '19 08:02 adrfantini

I'm on CST 0700 CET is 1200 CST. I much prefer morning in Europe / Evening in AU.

dblodgett-usgs avatar Feb 19 '19 12:02 dblodgett-usgs

OK, @mdsumner please suggest a time and a medium.

edzer avatar Feb 19 '19 12:02 edzer

Sorry, things got very busy this week I'll try to suggest something next week

mdsumner avatar Feb 21 '19 20:02 mdsumner

Here's a suggestion: pkg stars will

  • provide three functions, read_ncdf, read_gdal and read_stars
  • the first two do what they say,
  • read_stars when given a NetCDF file will try read_ncdf first, and otherwise, or when it fails, try read_gdal.

edzer avatar Mar 01 '19 19:03 edzer