causal-inference-in-R 16.01: The Parametric G-Formula

16.01: The Parametric G-Formula

Open malcolmbarrett opened this issue 3 years ago • 2 comments

Aug 10 '22 16:08 malcolmbarrett

library(tidyverse)

n <- 1000
d <- tibble(
  c = rnorm(n),
  x = as.numeric(c + rnorm(n) > 0),
  y = x + c + rnorm(n)
)

mod <- lm(y ~ x + c + x*c, data = d)
coef(mod)

new_data1 <- tibble(
  c = d$c,
  x = 1
)
new_data0 <- tibble(
  c = d$c,
  x = 0
)

mean(predict(mod, new_data1) - predict(mod, new_data0))

m1 <- lm(y ~ c, data = d[d$x == 1,])
m0 <- lm(y ~ c, data = d[d$x == 0,])

mean(predict(m1, d) - predict(m0, d))

Big ideas:

If your outcome model is a linear model, and you have no interaction between the treatment and any confounders, the g computation method will give the exact beta coefficient from the conditional model
In the presence of an interaction, the g-computation gives a weighted average based on the distribution of the confounder in the target population
The interaction model if you have interactions between all confounders is equivalent to fitting two separate models, one among the exposed and one among the unexposed and taking the difference in their predictions

Nov 18 '22 16:11 LucyMcGowan

Another simulation based on the above:

n <- 10000
library(tidyverse)
c <- rbinom(n, 1, 0.4)
x <- rbinom(n, 1, ifelse(c == 1, 0.5, 0.2))
y <- x + c + 0 * x * c + rnorm(n)

data <- tibble(
  x, 
  y, 
  c
)
mod <- lm(y ~ x + c, data)
mod |>
  broom::tidy()

data_1 <- data |>
  mutate(x = 1)
data_0 <- data |>
  mutate(x = 0)



data_1$p <- predict(mod, newdata = data_1)
data_0$p <- predict(mod, newdata = data_0)


mean(data_1$p) - mean(data_0$p)

Jul 21 '23 14:07 malcolmbarrett

causal-inference-in-R causal-inference-in-R copied to clipboard

16.01: The Parametric G-Formula

causal-inference-in-R
causal-inference-in-R copied to clipboard