causal-inference-in-R icon indicating copy to clipboard operation
causal-inference-in-R copied to clipboard

16.01: The Parametric G-Formula

Open malcolmbarrett opened this issue 3 years ago • 2 comments

malcolmbarrett avatar Aug 10 '22 16:08 malcolmbarrett

library(tidyverse)

n <- 1000
d <- tibble(
  c = rnorm(n),
  x = as.numeric(c + rnorm(n) > 0),
  y = x + c + rnorm(n)
)

mod <- lm(y ~ x + c + x*c, data = d)
coef(mod)

new_data1 <- tibble(
  c = d$c,
  x = 1
)
new_data0 <- tibble(
  c = d$c,
  x = 0
)

mean(predict(mod, new_data1) - predict(mod, new_data0))

m1 <- lm(y ~ c, data = d[d$x == 1,])
m0 <- lm(y ~ c, data = d[d$x == 0,])

mean(predict(m1, d) - predict(m0, d))

Big ideas:

  • If your outcome model is a linear model, and you have no interaction between the treatment and any confounders, the g computation method will give the exact beta coefficient from the conditional model
  • In the presence of an interaction, the g-computation gives a weighted average based on the distribution of the confounder in the target population
  • The interaction model if you have interactions between all confounders is equivalent to fitting two separate models, one among the exposed and one among the unexposed and taking the difference in their predictions

LucyMcGowan avatar Nov 18 '22 16:11 LucyMcGowan

Another simulation based on the above:

n <- 10000
library(tidyverse)
c <- rbinom(n, 1, 0.4)
x <- rbinom(n, 1, ifelse(c == 1, 0.5, 0.2))
y <- x + c + 0 * x * c + rnorm(n)

data <- tibble(
  x, 
  y, 
  c
)
mod <- lm(y ~ x + c, data)
mod |>
  broom::tidy()

data_1 <- data |>
  mutate(x = 1)
data_0 <- data |>
  mutate(x = 0)



data_1$p <- predict(mod, newdata = data_1)
data_0$p <- predict(mod, newdata = data_0)


mean(data_1$p) - mean(data_0$p)

malcolmbarrett avatar Jul 21 '23 14:07 malcolmbarrett