
Non-collapsibility: additive vs multiplicative models

Open · malcolmbarrett opened this issue on Mar 12, 2024 · 5 comments

I think I have a working simulation that shows the contrast in non-collapsibility effects between data generated by additive and multiplicative models.

@LucyMcGowan, what do you think?

One thing I find weird, though I may be doing it incorrectly: the LPM works for the additive model but not for the multiplicative one. I would have guessed the LPM would work-ish for the multiplicative model because, even though it's additive, it's collapsible. Could it be due to something in the generating process? Is my expectation wrong about what log odds ratio it should produce?

set.seed(12345)
library(tidyverse)
n <- 1000000

multiplicative_data <- tibble(
  x1 = rnorm(n, mean = 0, sd = 1),
  x2 = rnorm(n, mean = 0, sd = 1),
  x3 = rnorm(n, mean = 0, sd = 1)
)

# baseline log-odds
beta0 <- -1
# coef log-odds ratio
beta1 <- 1.1
beta2 <- -1.1
beta3 <- 0.5

# calculate multiplicative probability on logistic scale
prob <- with(
  multiplicative_data, 
  plogis(beta0 + beta1 * x1 + beta2 * x2 + beta3 * x3 + rnorm(n, sd = .1))
)

multiplicative_data <- multiplicative_data |>
  mutate(y = rbinom(n, 1, prob))

# answer is wrong without all three
glm(y ~ x1, data = multiplicative_data, family = binomial())$coefficients
#> (Intercept)          x1 
#>  -0.7873634   0.8763435
glm(y ~ x1 + x2 + x3, data = multiplicative_data, family = binomial())$coefficients
#> (Intercept)          x1          x2          x3 
#>  -0.9956394   1.0980438  -1.1027489   0.4968647
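
As an aside, here is a minimal sketch of the marginal log odds ratio this data-generating process implies, computed by standardizing the full model's predictions over the covariates. The fit, p0, and p1 names are illustrative additions, not part of the original reprex:

# marginal log odds ratio for a one-unit increase in x1,
# standardizing over the observed x2 and x3 distribution
fit <- glm(y ~ x1 + x2 + x3, data = multiplicative_data, family = binomial())
p0 <- mean(fitted(fit))
p1 <- mean(
  predict(fit, newdata = mutate(multiplicative_data, x1 = x1 + 1), type = "response")
)
log((p1 / (1 - p1)) / (p0 / (1 - p0)))
# expect a value nearer the ~0.88 univariable estimate than beta1 = 1.1:
# the conditional odds ratio exceeds the marginal one even with independent covariates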

# using a linear probability model
coefs <- lm(y ~ x1, data = multiplicative_data)$coefficients
coefs
#> (Intercept)          x1 
#>   0.3368481   0.1696988
# log odds ratio. Also wrong?
log((coefs[2] / (2 - coefs[2])) / (coefs[1] / (1 - coefs[1])))
#>        x1 
#> -1.700838


additive_data <- tibble(
  x1 = rnorm(n, mean = 0, sd = 1),
  x2 = rnorm(n, mean = 0, sd = 1),
  x3 = rnorm(n, mean = 0, sd = 1)
)

# Baseline probability
beta0 <- 0.5
# Coef risk difference
beta1 <- 0.1  
beta2 <- 0.1  
beta3 <- 0.1

# Calculate probabilities using a linear combination of predictors
# Clamp to 0 and 1
prob <- with(additive_data, beta0 + beta1 * x1 + beta2 * x2 + beta3 * x3)
prob <- pmin(pmax(prob, 0), 1)

additive_data <- additive_data |>
  mutate(y = rbinom(n, 1, prob))

# expected log odds ratio of each coef:
# log((.6/(1-.6)) / (.5/(1-.5))) = 0.4055
# answer is wrong with more than one predictor of y
glm(y ~ x1, data = additive_data, family = binomial())$coefficients
#>  (Intercept)           x1 
#> -0.001851411  0.417459645
glm(y ~ x1 + x2 + x3, data = additive_data, family = binomial())$coefficients
#>  (Intercept)           x1           x2           x3 
#> -0.001874116  0.455620087  0.452392691  0.454884135

# correct on the linear, collapsible scale
lm(y ~ x1, data = additive_data)$coefficients
#> (Intercept)          x1 
#>   0.4995522   0.1001733
lm(y ~ x1 + x2 + x3, data = additive_data)$coefficients
#> (Intercept)          x1          x2          x3 
#>  0.49955491  0.10017435  0.09946685  0.09999777

<sup>Created on 2024-03-12 with [reprex v2.1.0](https://reprex.tidyverse.org)</sup>

malcolmbarrett commented on Mar 12, 2024

Note: the additive model is collapsible but wrong here, e.g. adding the other two predictors gives the same (wrong?) answer. It's because it's the wrong data-generating mechanism.

malcolmbarrett commented on Mar 12, 2024

The log odds calc is wrong for the first one because it doesn't include the baseline probability.

malcolmbarrett commented on Mar 12, 2024

Just to follow up on your comment, I think this should be:

log(((coefs[1] + coefs[2]) / (1 - (coefs[1] + coefs[2]))) / (coefs[1] / (1 - coefs[1])))

But this is also kind of a weird thing to do here, because it will be different for every difference in x1, I think. (Here we are looking at the difference between x1 = 0 and x1 = 1, but because the odds ratios are not linear, if instead you looked at the difference between x1 = 1 and x1 = 2 it would be a different value.)
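
A minimal sketch of that point, reusing the coefs from the lm() fit above (p_at and lor are hypothetical helper names, not from the thread):

# LPM-implied probability at a given value of x1
p_at <- function(x) coefs[1] + coefs[2] * x
# implied log odds ratio for moving x1 from a to b
lor <- function(a, b) {
  log((p_at(b) / (1 - p_at(b))) / (p_at(a) / (1 - p_at(a))))
}
lor(0, 1)  # contrast x1 = 0 vs x1 = 1
lor(1, 2)  # contrast x1 = 1 vs x1 = 2: generally a different value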

LucyMcGowan commented on Mar 12, 2024

I also don't think you want random error here:

  plogis(beta0 + beta1 * x1 + beta2 * x2 + beta3 * x3 + rnorm(n, sd = .1))

because the randomness is already baked in by pulling from plogis, and then also by running through rbinom. I think it would maybe make more sense to just have the randomness come from the draw from the binomial distribution, so something like this:

prob <- with(
  multiplicative_data, 
  1 / (1 + exp(-(beta0 + beta1 * x1 + beta2 * x2 + beta3 * x3)))
)
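
For what it's worth, plogis() is the logistic distribution function, so it computes the same deterministic transform; a quick check (z here is just an arbitrary example vector):

z <- c(-2, 0, 2)
all.equal(plogis(z), 1 / (1 + exp(-z)))
#> TRUE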

LucyMcGowan commented on Mar 12, 2024

Ah, maybe I'm wrong! Is plogis deterministic? This seems to say yes! Edit: oh yeah, of course it is 🫣 my poor little brain hahaha. I still think we don't need rnorm in there since we already do rbinom, but plogis is good 😂

https://x.com/noah_greifer/status/1768505576143659132

LucyMcGowan commented on Mar 15, 2024