causal-inference-in-R
causal-inference-in-R copied to clipboard
16.01: The Parametric G-Formula
library(tidyverse)
n <- 1000
d <- tibble(
c = rnorm(n),
x = as.numeric(c + rnorm(n) > 0),
y = x + c + rnorm(n)
)
mod <- lm(y ~ x + c + x*c, data = d)
coef(mod)
new_data1 <- tibble(
c = d$c,
x = 1
)
new_data0 <- tibble(
c = d$c,
x = 0
)
mean(predict(mod, new_data1) - predict(mod, new_data0))
m1 <- lm(y ~ c, data = d[d$x == 1,])
m0 <- lm(y ~ c, data = d[d$x == 0,])
mean(predict(m1, d) - predict(m0, d))
Big ideas:
- If your outcome model is a linear model, and you have no interaction between the treatment and any confounders, the g computation method will give the exact beta coefficient from the conditional model
- In the presence of an interaction, the g-computation gives a weighted average based on the distribution of the confounder in the target population
- The interaction model if you have interactions between all confounders is equivalent to fitting two separate models, one among the exposed and one among the unexposed and taking the difference in their predictions
Another simulation based on the above:
n <- 10000
library(tidyverse)
c <- rbinom(n, 1, 0.4)
x <- rbinom(n, 1, ifelse(c == 1, 0.5, 0.2))
y <- x + c + 0 * x * c + rnorm(n)
data <- tibble(
x,
y,
c
)
mod <- lm(y ~ x + c, data)
mod |>
broom::tidy()
data_1 <- data |>
mutate(x = 1)
data_0 <- data |>
mutate(x = 0)
data_1$p <- predict(mod, newdata = data_1)
data_0$p <- predict(mod, newdata = data_0)
mean(data_1$p) - mean(data_0$p)