
Dealing with NA's on spacetime objects

Open rafalopespx opened this issue 10 months ago • 2 comments

Hi Josiah, thank you for the amazing package! It helped me a lot!

I don't know if this is the right place for this kind of issue, but I think it's worth asking.

So, I'm using a lot of functions here to work on a dataset that has some time periods with many NAs in the value variable, from which I want to produce an emerging hot spot analysis like here.

I've been able to reproduce my error by running this piece of code with random NAs introduced into the value column of the bos spacetime cube:

```r
gi_stars <- bos_nb |> 
  group_by(time_period) |> 
  mutate(gi_star = local_gstar_perm(value, nb, wt)) |> 
  tidyr::unnest(gi_star)

gi_stars |> 
  select(.region_id, time_period, gi_star, p_folded_sim)
```

and the error is as follows:

```
Error in `mutate()`:
ℹ In argument: `gi_star = local_gstar_perm(value, nb, wt)`.
ℹ In group 1: `time_period = 2010-01-01`.
Caused by error in `card()`:
! not a neighbours list
Run `rlang::last_trace()` to see where the error occurred.
```

I think the issue here is that, since I have NAs in the value column, some neighbours may no longer be neighbouring other places, but maybe I'm wrong. I tried alternatives and workarounds, like filtering the bos spacetime object before passing it to the Gi* pipe, and using queen = TRUE and/or allow_zero = TRUE in st_contiguity and st_weights, respectively, but none of these worked; the error returned is the same.

Thanks in advance for any help or insights, Josiah!

Here is my whole pipeline, adapted from the vignette:

```r
### Create objects to store file paths
df_fp <- system.file("extdata", "bos-ecometric.csv", package = "sfdep")
geo_fp <- system.file("extdata", "bos-ecometric.geojson", package = "sfdep")

### Read in data
df <- readr::read_csv(df_fp, col_types = "ccidD")
df <- df |> 
  mutate(value = ifelse(rbinom(n(), 1, 0.1), NA, value))
geo <- sf::st_read(geo_fp)

bos_expanded <- expand.grid(time_period = unique(df$time_period), 
                            .region_id = unique(geo$.region_id))

bos <- bos_expanded |> 
  left_join(df)

### Create spacetime object called `bos`
bos <- spacetime(bos, geo, 
                 .loc_col = ".region_id",
                 .time_col = "time_period")

bos

bos_nb <- bos |> 
  activate("geometry") |> 
  mutate(
    nb = include_self(st_contiguity(geometry, queen = TRUE)),
    wt = st_weights(nb, allow_zero = TRUE, zero.policy = TRUE)
  ) |> 
  set_nbs("nb") |> 
  set_wts("wt")

head(bos_nb)

gi_stars <- bos_nb |> 
  filter(!is.na(value)) |> 
  group_by(time_period) |> 
  mutate(gi_star = local_gstar_perm(value, nb, wt)) |> 
  tidyr::unnest(gi_star)

gi_stars |> 
  select(.region_id, time_period, gi_star, p_folded_sim)

### Conduct EHSA
ehsa <- emerging_hotspot_analysis(
  x = bos, 
  .var = "value", 
  k = 1, 
  nsim = 99
)

### Preview some values
head(ehsa)

### Evaluate the classifications of our hotspots
count(ehsa, classification)
```

— rafalopespx, Apr 21 '24

Thanks, @rafalopespx! In the future, would you mind wrapping the R code in backticks like this? It would make it a lot easier to read!

```r
# R code here
```

First and foremost, you cannot have NA values when calculating the local Gi*. If you want to use emerging_hotspot_analysis(), you will need to impute those values. Alternatively, if you want to do something that is not exactly emerging hot spot analysis but quite close (like the vignette), you can filter those locations out and then calculate the neighbors separately for each time period, like so:

```r
library(sfdep)
library(dplyr)

df_fp <- system.file("extdata", "bos-ecometric.csv", package = "sfdep")
geo_fp <- system.file("extdata", "bos-ecometric.geojson", package = "sfdep")

df <- readr::read_csv(df_fp, col_types = "ccidD")
df <- df |>
  mutate(value = ifelse(rbinom(n(), 1, 0.1), NA, value))
geo <- sf::st_read(geo_fp)

bos_expanded <- expand.grid(time_period = unique(df$time_period),
                            .region_id = unique(geo$.region_id))

bos <- bos_expanded |>
  left_join(df)

tmp <- bos |> 
  filter(!is.na(value)) |> 
  left_join(geo) |> 
  group_by(time_period) |> 
  mutate(
    nb = list(st_contiguity(geometry)),
    wt = st_weights(nb[[1]], allow_zero = TRUE),
    g = local_g_perm(value, nb[[1]], wt)
  ) |> 
  arrange(time_period)

tmp |> 
  tidyr::unnest(g)
```
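If you would rather impute so that the cube stays complete and emerging_hotspot_analysis() can run, a minimal sketch of one option — this is my illustration, not part of the thread's code, and the choice of fill value (zero, a per-region median, interpolation, etc.) is an analytical decision you need to justify for your data:

```r
library(dplyr)

# Hypothetical imputation step applied to `df` before building the spacetime
# cube: fill each region's missing values with that region's median, falling
# back to 0 when a region is entirely NA. Whether this is appropriate depends
# on what the NAs mean in your data.
df_imputed <- df |>
  group_by(.region_id) |>
  mutate(
    value = ifelse(is.na(value), median(value, na.rm = TRUE), value),
    value = ifelse(is.na(value), 0, value)
  ) |>
  ungroup()
```

After this, `df_imputed` can be joined and passed to spacetime() exactly as `df` is above.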

— JosiahParry, Apr 22 '24

Amazing, Josiah! It worked. Do you have any tips to speed up the process? My dataset is 7,665 polygons by 628 time periods.

Edit: One more thing — which way do you think is better: imputing values (I would impute 0, since I need a sense that those values are not significant), or calculating the neighbors separately for each period?

Edit2: I might be realizing this only now: I just can't produce an emerging_hotspot_analysis() once I have NAs in some of the slices of the spacetime cube, right? Is there no workaround for this?

— rafalopespx, Apr 22 '24