sf
st_jitter enhancements
st_jitter is great for various purposes, for example to reveal geometries that cover each other. It could be improved in two ways.
- The current implementation jitters geometries by drawing from a uniform distribution on a square. I found it useful to also allow drawing from a uniform distribution on a circle, maybe with an additional parameter distribution = c('uniform_square', 'uniform_circle'). (See first example below.)
- The amount parameter currently has to be a single number. It would be nice to allow vectors, so that geometries can easily be jittered by different amounts. In some cases it may be useful to jitter points by different amounts, depending on how much they cluster. (See second example below.)
[Figure: jittered points using the traditional (square) distribution and the proposed alternative circle distribution]
[Figure: two separate clusters in the data frame jittered by different amounts, depending on the size of the cluster]
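The two ideas above (a circular distribution and a per-point amount) can be sketched in base R without touching sf. This is only an illustration of the sampling step: runif_square and runif_circle are hypothetical helper names, not functions in sf, and each returns one row of x/y offsets per point.

```r
# Hypothetical helpers (not part of sf): each returns an n x 2 matrix of offsets.
# `amount` may be a single number or a vector with one entry per point.
runif_square <- function(n, amount) {
  cbind(runif(n, -amount, amount), runif(n, -amount, amount))
}

runif_circle <- function(n, amount) {
  # sqrt() on the radius makes the density uniform over the disc's area;
  # a plain runif() radius would concentrate points near the centre.
  r <- amount * sqrt(runif(n))
  theta <- runif(n, 0, 2 * pi)
  cbind(r * cos(theta), r * sin(theta))
}

set.seed(1)
# Small cluster gets a small jitter, large cluster a larger one
amount <- c(rep(0.3, 20), rep(1, 200))
sq <- runif_square(length(amount), amount)
ci <- runif_circle(length(amount), amount)
```

Adding either matrix to the point coordinates gives the jittered positions. Every circle offset stays within its own radius, while square offsets can reach the corners of the square.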
Happy to set up a pull request if this seems useful. Thinking of extending the signature in the following way:
st_jitter <- function(.data, amount, factor = 0.002, distribution = "uniform_square")
The traditional "uniform_square" distribution would remain the default for backward compatibility.
I agree about amount, which is currently not recycled, but propose to add a function argument, e.g. the one that is now f in st_jitter, that would let you customize even further (and e.g. use normal distributions).
Just to clarify, what signature do you propose? Scrap the factor argument and replace it with f or distribution, which can be of type character or function?
st_jitter <- function(.data, amount, distribution = "uniform_square")
The current functionality of specifying the jitter as a fraction of the bounding box diagonal could be re-established as a built-in choice, distribution = 'bbox_fraction'. (And it should probably be modified to incorporate the horizontal scale factor in the computation of the amount when st_is_longlat is true.)
Then there is the question of the signature of the distribution function. I would suggest a function with one argument n, the number of samples to draw, that returns an n x 2 matrix. That would be consistent with, for example, MASS::mvrnorm. Having the n argument is not strictly necessary; one could also code this with n = 1L, but there are efficiency gains from drawing for all rows in the input geometry at once (which also yields performance improvements over the current implementation).
Here is an example of what this would look like:
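library(dplyr)    # bind_rows(), mutate(), %>%
library(tibble)   # tibble()
library(sf)       # st_as_sf(), st_jitter()
library(ggplot2)  # ggplot(), geom_sf(), facet_wrap()

# Two clusters of coincident points, each carrying its own jitter amount jd
points <- bind_rows(
  tibble(X = rep(1, 20), Y = rep(1, 20)) %>%
    st_as_sf(coords = c("X", "Y")) %>%
    mutate(jd = 0.3),
  tibble(X = rep(-1, 200), Y = rep(-1, 200)) %>%
    st_as_sf(coords = c("X", "Y")) %>%
    mutate(jd = 1)
)

# Note: the distribution argument and amount = jd (a column of per-point amounts)
# are the extensions proposed in this thread, not the current st_jitter API.
bind_rows(
  st_jitter(points, amount = 0.5, distribution = "bbox_fraction") %>% mutate(type = "bbox_fraction"),
  st_jitter(points, amount = 1) %>% mutate(type = "uniform_square"),
  st_jitter(points, amount = jd) %>% mutate(type = "uniform_square varying amount"),
  st_jitter(points, amount = 1, distribution = "uniform_circle") %>% mutate(type = "uniform_circle"),
  st_jitter(points, amount = jd, distribution = "uniform_circle") %>% mutate(type = "uniform_circle varying amount"),
  st_jitter(points, distribution = function(n) MASS::mvrnorm(n, c(1, 1), Sigma = matrix(c(0.5, 0, 0, 0.5), nrow = 2))) %>% mutate(type = "gauss")
) %>%
  ggplot() +
  geom_sf() +
  facet_wrap("type")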
[Figure: output of the code above, one facet per jitter type]
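To make the suggested convention concrete (a distribution function of one argument n that returns an n x 2 matrix), here is a minimal base R sketch of how a jitter routine might consume such a function. apply_jitter and gauss are hypothetical names used only for illustration. The sketch works on a plain coordinate matrix rather than sf geometries, and it uses independent zero-mean normals in place of MASS::mvrnorm to avoid the extra dependency.

```r
# Hypothetical sketch (not sf code): apply a user-supplied distribution
# function to an n x 2 matrix of point coordinates.
apply_jitter <- function(coords, distribution) {
  n <- nrow(coords)
  # One call draws the offsets for all points at once
  offsets <- distribution(n)
  # Enforce the proposed contract: an n x 2 matrix
  stopifnot(is.matrix(offsets), identical(dim(offsets), c(n, 2L)))
  coords + offsets
}

# Example distribution: independent normal offsets in x and y
gauss <- function(n) matrix(rnorm(2 * n, mean = 0, sd = 0.5), ncol = 2)

set.seed(1)
coords <- cbind(X = rep(1, 20), Y = rep(1, 20))
jittered <- apply_jitter(coords, gauss)
```

Checking the shape of the returned matrix up front gives a clear error when a user-supplied function does not follow the convention, for example one that returns a plain vector.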
Never posted before, so hopefully this is the right place. It would be very helpful to be able to constrain the jitter of points to a polygon boundary, for instance to jitter points while ensuring they still fall within the appropriate census boundary.
If you are looking to fill a polygon with points, you probably want to use st_sample() instead of st_jitter. If you are looking for something more complicated, like jittering a point using a Gaussian distribution and constraining it to the inside of a polygon, then you will probably have to hack it together. One option is to first jitter, then st_filter. And possibly iterate, or oversample and throw out some points, if you want the total number of jittered points in the polygon to be fixed.
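The "jitter, check, iterate" idea could look roughly like the sketch below. jitter_within is a hypothetical helper, not part of sf. It uses the existing st_jitter and st_within, retries only the points whose jittered position fell outside the polygon, and leaves a point at its original position if no valid position is found within max_iter attempts (an arbitrary choice made for this sketch).

```r
library(sf)

# Hypothetical helper (not part of sf): jitter points but keep only positions
# that land inside `poly`; points that fail are retried up to max_iter times.
jitter_within <- function(pts, poly, amount, max_iter = 50L) {
  geom <- st_geometry(pts)
  result <- geom            # points that never succeed stay where they were
  todo <- seq_along(geom)   # indices still waiting for a valid jitter
  for (iter in seq_len(max_iter)) {
    if (length(todo) == 0L) break
    cand <- st_jitter(geom[todo], amount = amount)
    inside <- lengths(st_within(cand, poly)) > 0L
    if (any(inside)) result[todo[inside]] <- cand[inside]
    todo <- todo[!inside]
  }
  if (inherits(pts, "sf")) {
    st_geometry(pts) <- result
    pts
  } else {
    result
  }
}

# Unit square with 20 coincident points at its centre
poly <- st_sfc(st_polygon(list(rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 1), c(0, 0)))))
pts <- st_sf(id = 1:20, geometry = st_sfc(rep(list(st_point(c(0.5, 0.5))), 20)))
set.seed(42)
out <- jitter_within(pts, poly, amount = 1)
```

Retrying point by point keeps the number of output points equal to the number of input points, which is the fixed-count case mentioned above. Note that it truncates the jitter distribution at the polygon boundary rather than reshaping it.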
I totally forgot about these st_jitter features, thanks for the reminder. Will come back to this and add a pull request.