identify points that occupy same array "cell" location
I would like to filter a set of points scattered on a stars array such that no array location is duplicated. I don't think that the raster concept of "cell" isn't supported - at least that is what I gather from the raster-stars functionality map. But something like raster::cellFromXY() seems like what I am looking for. The closest thing I can think of is a function that creates a copy of the stars object, reassigns values to simple "cell" index values and then runs extract. I can then filter the original dataset for duplicate "cell" indices.
In the example below, I try to programmatically identify that points A and B occupy the same "cell". It works, but I feel like there might be a better way to do this. Any guidance on this?
# https://github.com/r-spatial/stars/wiki/How-%60raster%60-functions-map-to-%60stars%60-functions
suppressPackageStartupMessages({
library(dplyr)
library(sf)
library(stars)
})
# Retrieve the 2d "cell" index of a set of points into a stars object
#
# @param x stars object
# @param p object with POINT geometry
# @return vector of 'cell' indices
cell_index <- function(x, p){
x = x[1]
names(x) <- "index"
d = dim(x)
mutate(x, index = seq_len(prod(d[1:2]))) |>
st_extract(at = p) |>
pull(index)
}
nr = 5
nc = 4
n = nc*nr
m = matrix(seq_len(n), nrow = nr, ncol = nc)
dim(m) = c(x = nr, y = nc)
s = st_as_stars(m)
x = tibble(id = LETTERS[1:4],
x = c(1.2, 1.7, 4.3, 0.9),
y = c(2.1, 2.2, 0.9, 3.8)) |>
st_as_sf(coords = c("x", "y"))
plot(s, axes = TRUE, text_values = TRUE, reset = FALSE, col = sf.colors())
plot(x, add = TRUE, cex = 4, col = "black")
text(st_coordinates(x), labels = x$id)

mutate(x, index = cell_index(s,x), .after = 1)
#> Simple feature collection with 4 features and 2 fields
#> Geometry type: POINT
#> Dimension: XY
#> Bounding box: xmin: 0.9 ymin: 0.9 xmax: 4.3 ymax: 3.8
#> CRS: NA
#> # A tibble: 4 × 3
#> id index geometry
#> * <chr> <int> <POINT>
#> 1 A 12 (1.2 2.1)
#> 2 B 12 (1.7 2.2)
#> 3 C 5 (4.3 0.9)
#> 4 D 16 (0.9 3.8)
Created on 2022-08-26 by the reprex package (v2.0.1)
There's an unexported function that could be used
> stars:::colrow_from_xy(st_coordinates(x), s)
[,1] [,2]
1 2 3
2 2 3
3 5 1
4 1 4
and then combined, e.g. in
> cell_from_xy = function(pts, s) {
cr = stars:::colrow_from_xy(st_coordinates(x), s)
nc = dim(s)[1]
(cr[,2] - 1) * nc + cr[,1]
}
> cell_from_xy(pts, s)
1 2 3 4
12 12 5 16
This way, it also works for rectilinear and sheared/rotated grids. Shall I add this & call it st_cell_from_points, or implement a _from_xy that takes a matrix?
Great reprex, btw @btupper !
Oh, sweet! Your question about what to implement gets at an issue that we have been tripping over the last umpteen years. We help many folks (ecologists mostly) who start with a spreadsheet of observations that they want to marry with well known environmental covariates (satellite data, weather/climate models, ocean circulation models, etc). They spend a lot of effort trying to negotiate their locational data into various formats needed to extract data. We (helpers) keep thinking there must be a standard coin-of-the-realm that can be used for these extraction-type transactions. We have been pushing sf - "transform your observations into sf and then we'll help you get your covariates". That's what we tried to push at the recent Ocean Hack Week ala xyzt, ghrsst, obpg, hycom and others. Despite not getting much momentum at OHW we still think it is worthwhile. Well, that's a long winded way of saying I would vote for st_cells(stars, sf) exposed for the general user armed with points/lines/etc and _from_xy(stars, matrix) under the hood available for coders. But wishes and fishes and all that... I'd be grateful for either!
gives us
library(stars)
# Loading required package: abind
# Loading required package: sf
# Linking to GEOS 3.10.2, GDAL 3.4.3, PROJ 8.2.0; sf_use_s2() is TRUE
library(tibble)
nr = 5
nc = 4
n = nc*nr
m = matrix(seq_len(n), nrow = nr, ncol = nc)
dim(m) = c(x = nr, y = nc)
s = st_as_stars(m)
x = tibble(id = LETTERS[1:4],
x = c(1.2, 1.7, 4.3, 0.9),
y = c(2.1, 2.2, 0.9, 3.8)) |>
st_as_sf(coords = c("x", "y"))
st_cells(s, x)
# 1 2 3 4
# 12 12 5 16
Yay! Thank you!