Nested grouping

Open grantmcdermott opened this issue 2 years ago • 2 comments

It would be nice if we could support nested grouping. (Or, put differently, allow colours to vary/repeat across units.) This would mostly be useful for line plots, where we want to avoid joining the end of one unit's line to the start of the next. The idea is similar to how ggplot2 lets you specify the colour and group aesthetics separately, e.g. aes(col = var1, group = var2).

Here is an illustration using the following dataset. The setting is a difference-in-differences research design with staggered treatment. So we have treatment cohorts (first_treat) superimposed on individual units (id).

  1. First, points. (Fine.)
plot2(y ~ time | first_treat, dat)

  2. Second, lines. (Not fine, because we have lines rejoining across units in the same cohort.)
plot2(y ~ time | first_treat, dat, type = "l")

Of course, we could group (colour) by the individual IDs. This stops the rejoining, but means that we lose the colouring by treatment group (which is the interesting thing from a causal inference perspective).

plot2(y ~ time | id, dat, type = "l", legend = FALSE)

I don't have a solution right now, but it probably requires a new argument like bycol. On the formula side, we could potentially represent this via a / nesting interaction. So the call would become plot2(y ~ time | first_treat / id, dat, type = "l"), i.e. units are nested within first treatment cohorts.
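For reference, here is a minimal sketch of how base R expands a / nesting term, which hints at how the two roles could be recovered from the formula. It only inspects a one-sided formula with terms() and all.vars(); the names col_var and grp_var are hypothetical, and nothing here reflects how plot2 actually parses the part after |.

```r
# One-sided formula holding only the proposed grouping term.
f = ~ first_treat / id

# Base R expands a / b to a + a:b, so the term labels are
# "first_treat" and "first_treat:id".
labs = attr(terms(f), "term.labels")

# Assumption for illustration: the outer variable drives colour,
# the innermost variable defines the line groups.
vars    = all.vars(f)
col_var = vars[1]
grp_var = vars[length(vars)]
```

Under this reading, first_treat would determine the colours and id would determine where lines are split.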

grantmcdermott • Jun 28 '23 18:06

Ran into this again recently and am now thinking a simpler solution is just to support passing a variable to col. It should be pretty simple to grab the corresponding colour breaks and pass them to our group-split data, by using something like tapply(factor(col_var), by_var, FUN = `[[`, 1) internally.
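To make the tapply() idea concrete, here is a toy sketch with four units in two cohorts. The data frame toy and the names grp_idx and unit_cols are made up for illustration; as.integer() is added so that the result is a plain integer index no matter how tapply() simplifies factor values.

```r
# Toy data: 4 units (id), each belonging to one of 2 cohorts (grp).
toy = data.frame(
  id  = rep(1:4, each = 3),
  grp = rep(c("a", "b", "b", "a"), each = 3)
)

# For each id, take the first integer code of the cohort factor.
grp_idx = with(toy, tapply(as.integer(factor(grp)), id, FUN = `[[`, 1))

# One colour per unit, shared by units in the same cohort.
unit_cols = palette()[grp_idx]
```

Units 1 and 4 (cohort "a") end up with the same colour, as do units 2 and 3 (cohort "b").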

Manual proof of concept:

library(tinyplot)

set.seed(123456L)

# 60 time periods, 30 individuals, and 5 waves of treatment
tmax = 60
imax = 30
nlvls = 5

dat = 
  expand.grid(time = 1:tmax, id = 1:imax) |>
  within({
    
    cohort      = NA
    effect      = NA
    first_treat = NA
    
    for (chrt in 1:imax) {
      cohort = ifelse(id==chrt, sample.int(nlvls, 1), cohort)
    }
    
    for (lvls in 1:nlvls) {
      effect      = ifelse(cohort==lvls, sample(2:10, 1), effect)
      first_treat = ifelse(cohort==lvls, sample(1:(tmax+20), 1), first_treat)
    }
    
    first_treat = ifelse(first_treat>tmax, Inf, first_treat)
    treat       = time>=first_treat
    rel_time    = time - first_treat
    y           = id + time + ifelse(treat, effect*rel_time, 0) + rnorm(imax*tmax)
    
    rm(chrt, lvls, cohort, effect)
  })

cols = with(dat, tapply(factor(first_treat), id, FUN = `[[`, 1))  # grab group colours
cols
#>  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 
#>  1  3  3  5  4  1  3  2  2  4  2  1  3  4  5  3  4  2  1  2  3  4  3  2  3  5 
#> 27 28 29 30 
#>  5  3  1  3

plt(y ~ time | id, dat, type = "l", col = palette()[cols], legend = FALSE)
#> Warning in tinyplot.default(x = x, y = y, by = by, facet = facet, facet.args = facet.args, : 
#> Continuous legends not supported for this plot type. Reverting to discrete legend.

Created on 2024-08-30 with reprex v2.1.1

TBD on how to handle legends, as well as NSE vs formula arguments.
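One possible direction for the legend, sketched with base graphics only: build the legend entries from the levels of the colour variable rather than from the grouping variable. The toy data and the lgd helper are hypothetical, and this is not a proposal for how tinyplot should implement it internally.

```r
# Toy data: 4 units (id) in 2 cohorts (grp), observed over 3 periods.
toy = data.frame(
  id   = rep(1:4, each = 3),
  time = rep(1:3, times = 4),
  grp  = rep(c("a", "b", "b", "a"), each = 3)
)
toy$y = toy$id + toy$time

# Legend entries come from the cohort levels, not from the unit ids.
grp_f = factor(toy$grp)
lgd = data.frame(
  label = levels(grp_f),
  col   = palette()[seq_along(levels(grp_f))]
)

# Draw one line per unit, coloured by its cohort.
plot(y ~ time, toy, type = "n")
for (d in split(toy, toy$id)) {
  lines(d$time, d$y, col = lgd$col[match(d$grp[1], lgd$label)])
}
legend("topleft", legend = lgd$label, col = lgd$col, lty = 1, title = "grp")
```

The legend then has one entry per cohort, while the lines are still split by unit.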

grantmcdermott • Aug 30 '24 19:08