Rasters.jl Geometric lookups and "vector data cubes"

It's super easy to create dimension axes that have vector lookups:

julia> using CairoMakie, DimensionalData
julia> points = Point2f.(1, 1:10)
10-element Vector{Point{2, Float32}}:
 [1.0, 1.0]
 [1.0, 2.0]
 [1.0, 3.0]
 [1.0, 4.0]
 [1.0, 5.0]
 [1.0, 6.0]
 [1.0, 7.0]
 [1.0, 8.0]
 [1.0, 9.0]
 [1.0, 10.0]

julia> rand(X(DimensionalData.Categorical(points)))
╭────────────────────────────────╮
│ 10-element DimArray{Float64,1} │
├────────────────────────────────┴─────────────────────────────────────── dims ┐
  ↓ X Categorical{Point{2, Float32}} [Float32[1.0, 1.0], Float32[1.0, 2.0], …, Float32[1.0, 9.0], Float32[1.0, 10.0]] ForwardOrdered
└──────────────────────────────────────────────────────────────────────────────┘
 Float32[1.0, 1.0]   0.122667
 Float32[1.0, 2.0]   0.670468
 Float32[1.0, 3.0]   0.398075
 Float32[1.0, 4.0]   0.275494
 Float32[1.0, 5.0]   0.437735
 Float32[1.0, 6.0]   0.333305
 Float32[1.0, 7.0]   0.43346
 Float32[1.0, 8.0]   0.740607
 Float32[1.0, 9.0]   0.48268
 Float32[1.0, 10.0]  0.221809

but some questions arise here.

How do we plot them? For scatter I could see just using specapi to forward color as the values of the dimvector. What about 2d arrays?
For e.g. weather station data, you would have a time axis and a position or geometry axis. How would people use data from here? What are common access patterns?
Can we make selectors like Contains, Touches, .., At, etc work, potentially using GeometryOps? Do we want an extension for this? I think this will involve geometric predicates at least.
Do we want to construct spatial indices for such geometric lookups?
Can we set GI.geometrycolumns of a raster to point to geometric lookups? Do we want to stash this in the metadata?

Sep 19 '24 05:09 asinghvi17

Dimensional data.jl already handles point lookups a bit, see mergedims. But polygons etc need more work. Probably we make a GeometryLookups.jl package so everyone can use them?

Sep 19 '24 07:09 rafaqz

scatter default is good for points with color as the values. if there is another dimension like time we plot points in 3d? and the same for lines/polygons etc
This already works for points with MergedLookup, we can use that syntax for everything else too
See merged.jl for the point implementation, for polygons we would need special lookups.
That could be better than what MergedLookup does currently. Its slow.
Yep we can do that. One hitch is that geometry can be either values in the array or in a lookup, but we can detect that. (a confusing part of the "vector data cube" talk is people use and discuss both but theyre totally different things)

Sep 19 '24 07:09 rafaqz

I just realized that geometric lookups are a great way to represent discretized global grids, since each grid cell is just an integer - a DGGSLookup could be a subtype of some AbstractGeometricLookup, and with a nice API we could have super easy plotting / subsetting / rasterization to a Cartesian grid!

(image from the DGGS talk at SDSL 2024)

Sep 20 '24 23:09 asinghvi17

For sure. Except for the nesting, and there should be optimisations for finding polygons on these grids?

Sep 21 '24 06:09 rafaqz

If you are interested in DGGS there is also https://github.com/danlooo/DGGS.jl

Nov 27 '24 08:11 felixcremer

Thanks Felix, that seems interesting! Are those actually DimArrays though, or just convenient wrapper structs? I went over the code briefly but am not able to see how this is working...

Dec 05 '24 22:12 asinghvi17

I just wrote up a more interesting vector data cube example with polygons. See the following:

using DimensionalData
using DimensionalData.Lookups
using DimensionalData.Dimensions

import DimensionalData as DD

# load data
using NaturalEarth
import GeoInterface as GI, GeometryOps as GO

fc = naturalearth("admin_0_countries", 10)
geometries_data = fc.geometry
id_data = fc.ADM0_A3
pop_data = fc.POP_EST .|> Float64

# now for the fun part
# let's define a dimension named Geometry
# TODO: how can we make this be interpreted as (X, Y)?  can we somehow hook into the same things that MergeDims does?

@dim Geometry

# create a categorical lookup from the geometry
geometry_lookup = Categorical(geometries_data)

# create dimvectors which have geometry lookups - simple so far
id_dd = DimVector(id_data, Geometry(geometry_lookup))
pop_dd = DimVector(pop_data, Geometry(geometry_lookup))

# stack them so they share dims
fc_stack = DimStack((; id = id_dd, pop = pop_dd))



# Now for the _really_ fun part: mixing dimensions
# create a 2d dimarray, that hypothetically has height measurements over time for each geometry
height_dim_matrix = rand(Geometry(geometry_lookup), Ti(1:10))

# stack 'em!
full_stack = DimStack((; id = id_dd, pop = pop_dd, height = height_dim_matrix))

# select the USA
full_stack[Geometry(Where(x -> GO.contains(x, (-103, 44))))]
full_stack[Geometry(Where(x -> GO.contains(x, (-103, 44))))].id
full_stack[Geometry(Where(x -> GO.contains(x, (-103, 44))))].pop
full_stack[Geometry(Where(x -> GO.contains(x, (-103, 44))))].height

this is pretty sweet I think...now just have to figure out how to actually write this in a CF compliant way, and how to detect these lookups when plotting 😢

Dec 05 '24 22:12 asinghvi17

Just using Where is funny, like that just works as is

I guess we can accelerate that for GO predicates with an Extent index, if we use a Fix2

Dec 05 '24 22:12 rafaqz

yep! I'm building out a PolygonLookup now that would support that, but we could change that to be a more generic GeometricLookup at some point that works for any geometries

Dec 05 '24 22:12 asinghvi17

btw the big issue here is probably going to be file I/O. Any suggestions on that? NetCDF has some support for this but I have no clue how we would get this out from a nc file:

example.nc.txt (remove the .txt after, that's for github sake). This file has polygons with holes, if we can handle that I figure we're pretty much done haha

Dec 05 '24 22:12 asinghvi17

Have you looked at the python/R vector data cubes ? If we can use something already being standardised...

(The CF standard is likely pretty arcane)

Dec 05 '24 22:12 rafaqz

This is really exciting!

And making this a DataFrame almost works, except height vectors are concatenated and scalars atrtibutes are duplicated :

DataFrame(full_stack)
2580×5 DataFrame
  Row │ Geometry         Ti     id      pop             height      
      │ Abstract…        Int64  String  Float64         Float64     
──────┼─────────────────────────────────────────────────────────────
    1 │ 2D MultiPolygon      1  IDN          2.70626e8  0.379892
    2 │ 2D MultiPolygon      1  MYS          3.19498e7  0.851437
    3 │ 2D MultiPolygon      1  CHL          1.8952e7   0.0670016
    4 │ 2D Polygon           1  BOL          1.15131e7  0.925698
    5 │ 2D MultiPolygon      1  PER          3.25105e7  0.699816

Dec 05 '24 23:12 alex-s-gardner

Yeah...Ideally we would want a sliced dimtable from the stack, with a given index column

Dec 05 '24 23:12 asinghvi17

Have you looked at the python/R vector data cubes ? If we can use something already being standardised...

(The CF standard is likely pretty arcane)

They also use CF :((

Dec 05 '24 23:12 asinghvi17

Ok guess we will have to implement it...

Dec 06 '24 06:12 rafaqz

For the IO part, there might be some inspiration in https://github.com/xarray-contrib/xvec/issues/26

Jan 05 '25 18:01 felixcremer

Lat-lon-Cap grid used in NASA reanalyses and MITgcm : https://juliaclimate.github.io/MeshArrays.jl/dev/tutorials/geography.html

Jan 10 '25 16:01 gaelforget

DD._dims2indices is where we would implement an Unaligned like thing that would let us harvest X and Y from getindex and move it into geometry lookups

Jan 10 '25 22:01 asinghvi17

There should always be some selector for X and Y, i.e X(10) should never work with a geometric lookup unless there is another X dim

Jan 10 '25 22:01 asinghvi17

Have you looked at the python/R vector data cubes ? If we can use something already being standardised...

(The CF standard is likely pretty arcane)

They also use CF :((

We are now working on non-CF encoding that supports geometry as variable, not only dimension coordinates. The CF encoding is indeed a bit inconvenient but at least it is an existing standard. Most of that lives on my branch so far but here's the notebook illustrating the idea https://github.com/martinfleis/xvec/blob/summary/doc/source/io.ipynb.

Feb 13 '25 09:02 martinfleis

Thanks @martinfleis that's a great write-up

@asinghvi17 I just realised the nested discrete global grid can be represented with a DimTree and geometry lookups. I think it will already work if we merged the branches.

Feb 13 '25 10:02 rafaqz

Thanks for the write up, that clarifies a lot!

@rafaqz we can indeed do a dimtree, but there is one issue - we may need to implement s2 indexing 😅 or some other accelerate, at least for either DGGS or known spherical/ellipspodal geometries.

Feb 13 '25 11:02 asinghvi17

The generic geometry lookup we have should work, but slowly?

But yes we need to make it abstract and specialise a few fast cases like hexagons as @meggart mentioned they are doing in DGGS

Feb 13 '25 11:02 rafaqz

@danlooo

Feb 13 '25 12:02 lazarusA

Regarding dimtrees, note that a cell may have multiple parents in some DGGS, e.g. here Related to https://github.com/danlooo/DGGS.jl/issues/133

Feb 13 '25 14:02 danlooo

I think that's fine, the tree isn't spatially nested explicitly like that (although it seems there could be some nice optimisations where that is possible)

It just allows layers with different lookups for the same dimensions, that could at the same time share e.g. the time dimension from lower in the tree.

So selecting tree[X=a .. b] will select across all branches even though X is not the same resolution.

The extent will be the union over all branches, so they don't have to match at the boundaries either. Branches could equally represent separate raster tiles of the same resolution.

Feb 13 '25 15:02 rafaqz

The only potential hiccup here is that DGGS are, at this point, more like matrix (actually 3d array) lookups than like a single vector flat geometry lookup.

Feb 13 '25 15:02 asinghvi17

So it would need a custom ArrayLookup and Nearest neighbour searches or something for At/Near etc?

We could just tweak ArrayLookup ? Or do I misunderstand

Feb 13 '25 15:02 rafaqz

@asinghvi17 Both are possible, even on the same grid, e.g. Uber H3 normal 1D bit index vs 3D IJK index. Hierarchical indices are usually 1D, having the parent cell as a prefix, optimized for aggregation queries. However, when it comes to distant neighbour queries, e.g. bouding boxes for viewing and convolutions, one wants to have multiple spatial DGGS dimensions more like a matrix.

Feb 13 '25 15:02 danlooo