GeoInterface.jl icon indicating copy to clipboard operation
GeoInterface.jl copied to clipboard

Add Raster API.

Open evetion opened this issue 3 years ago • 14 comments

evetion avatar Nov 14 '22 18:11 evetion

I'm wondering if we also need functions to retrieve dimension vectors and matrices. Netcdf/gribb don't often have affine maps, and may not have even step sizes.

(Can of course be added later too)

rafaqz avatar Nov 14 '22 18:11 rafaqz

Hmm, yeah, maybe good to look to the CF-conventions, and use that naming. For now, affine will do for "simple" rasters as defined in the GDAL data model though.

For future reference, we have:

  • no mapping (maybe have a default affine map, instead of nothing?)
  • non-linear 1d mapping of coordinates for each axis
  • non-linear 2d mapping of coordinates for each axis

Did you come across others?

evetion avatar Nov 14 '22 19:11 evetion

Also combinations of 1d and 2d, like 1d time and 2d x/y

Bands are an example of having no map on an axis, or a categorical map. I'm not sure it needs a default affine map.

rafaqz avatar Nov 14 '22 21:11 rafaqz

After having thought about this some more. We can also decide on having coords(i,j) -> x, y and the reverse index(x,y) -> i,j. That would also work for all your use cases right? Only question is how to handle out of bounds errors, as index could return invalid coordinates.

evetion avatar Nov 25 '22 18:11 evetion

Yeah that should work. We can assume two arguments are always X/Y and three are always X/Y/Z. Any other dimensions are outside the interface. On the topic of efficiency we could also pass coords and index ranges or vectors so they can do things in bulk and return any AbstractArray of coords or indices (but that can be later).

One issue is getting the index back from coordinates can be a little hard with floating point numbers. To fix this in DimensionalData.jl/Rasters..jl At selector has tolerance keywords to allow for small difference using isapprox. Maybe we can allow that using atol and rtol keywords we pass to isapprox.

On bounds errors, in Raster.jl indexing with out of bounds coordinates gives a BoundsError, so I think that's a reasonable thing to do?

There could maybe be an option to instead return missing like in checkbounds? So index(Missing, obj, x, y) will return missing rather than error, as checkbounds does with checkbounds(Bool, obj, i, j) -> false.

rafaqz avatar Dec 05 '22 18:12 rafaqz

@Alexander-Barth

evetion avatar Jan 11 '23 19:01 evetion

For future reference, we have:

no mapping (maybe have a default affine map, instead of nothing?) non-linear 1d mapping of coordinates for each axis non-linear 2d mapping of coordinates for each axis

For information, some ocean (and atmospheric model, I think), the vertical axis depend on time. I like the NetCDF data model (with using named dimension), because it can handle this situation without needing a special case. Also there is no hard distinction between coordinates and other related variables, so selecting salinity for longitude in [20,30] can be handed the same way as selecting salinity for temperature in [20,30] which is the basic idea behind NCDatasets.@select:

# load all temperature data from January where the salinity is larger than 35.
# (in this example temperature/salinity/time are 1D vectors. If temperature/salinity are 2D, this would not work )
v = NCDatasets.@select(ds["temperature"],Dates.month(time) == 1 && salinity >= 35)
# v is a "lazy" view of the underlying array

https://github.com/Alexander-Barth/NCDatasets.jl/blob/master/src/select.jl#L151

What I do not like about the NetCDF data model, is that it is inefficient for the common case (lon/lat are simply ranges, defined using the Mercator projection for example), but as I mentioned to @evetion maybe we can have an array type whose values are defined using a function (an affine map for example). This type can also implement the inverse operation (going the inverse to coordinates) which can be quite slow in the general case.

Alexander-Barth avatar Jan 12 '23 12:01 Alexander-Barth

As Felix pointed out, there's https://github.com/JuliaGeo/DimensionalArrayTraits.jl

evetion avatar Jan 13 '23 09:01 evetion

So, open questions:

  • [ ] How to deal with the AbstractArray (Matrix?) itself. Subtype, or require a values(raster) that should return an AbstractMatrix?
  • [ ] With the above, the i, j returned from index(x, y) could be used to get/set index into the AbstractMatrix.
  • [ ] Should we replace i, j by CartesianIndex?
  • [ ] I assume Center() for the x, y returned by coords for now, we could extend this (in the future?) with a strategy kw?

I'm less interested in the names of the dimensions, as this should just be the first and second dimension as defined by the crs coordinate order. However, I can imagine that the get/set index with only two unnamed Int can be problematic in existing packages, possibly requiring a "SpatialSlice" view to work with this interface?

evetion avatar Jan 13 '23 11:01 evetion

I'm not sure this should be in GeoInterface.jl. YAXArrays.jl doesn't depend on GeoInterface and has no polygon operations.

DimensionalArrayTraits.jl was an attempt at this when I really didn't know what mattered, and it was too hard to work on with functionality across multiple packages. But I think we have a better idea now and can do something like that properly.

YAXArrays.jl is looking at moving to using DimensionalData.jl dimensions, and shares a few backends with Rasters.jl. A project like this should try to remove that duplication and push most of it into the source packages.

@meggart may also have some comments for this. Chunking and DiskArrays.jl compatibility is another component of a generic raster interface.

Another thing: the names of the dimensions is crucial information in netcdf, and depending on the crs is not always an option.

rafaqz avatar Jan 28 '23 13:01 rafaqz

I'm not sure this should be in GeoInterface.jl. YAXArrays.jl doesn't depend on GeoInterface and has no polygon operations.

Things like crs and extent are geospatial specific, not vector/polygon specific. But agreed that it's a departure from purely doing simple features. I just rather do it here than have a new package.

DimensionalArrayTraits.jl was an attempt at this when I really didn't know what mattered, and it was too hard to work on with functionality across multiple packages. But I think we have a better idea now and can do something like that properly.

How would that API look like? Would be great to also enable conversions like here in GeoInterface between these rasters.

YAXArrays.jl is looking at moving to using DimensionalData.jl dimensions, and shares a few backends with Rasters.jl. A project like this should try to remove that duplication and push most of it into the source packages.

Dimensionaldata could depend on GeoInterface for example and solve it for both Rasters and YAXArrays? I just want to have the smallest possible API for the geospatial part of arrays/dimensions and if that requires some part of the DimensionalData API that's fine.

evetion avatar Feb 11 '23 14:02 evetion

@JuliaGeo/collaborators What do you think about dropping the affine/index/coords from this PR (as that discussion stranded), and just having a crs/extent defined for rasters (or non-vector in general) stuff as well? This would also help me with getting Geomorphology to automatically work on geographic grids.

evetion avatar Feb 03 '24 15:02 evetion

There's a lot of room for confusion between coords and coordinates, so I'm in favour of dropping it...

asinghvi17 avatar Jun 20 '24 13:06 asinghvi17

Ok, this should be good to go? Maybe even in time for JuliaCon!

evetion avatar Jul 04 '24 17:07 evetion