bartCause
Confidence interval weirdness and lack of match with tmle using dbarts
I am almost always getting non-significant results with bartCause and almost always getting very significant results with the tmle package (when using only dbarts in tmle). I have no idea which package to trust more (although the literature says to trust BART more).
Let's look at a simple example; let's say we use the example in the documentation of bartCause.
library(bartCause)
library(tmle)
## fit a simple linear model
n <- 100L
beta.z <- c(.75, -0.5, 0.25)
beta.y <- c(.5, 1.0, -1.5)
sigma <- 2
set.seed(725)
x <- matrix(rnorm(3 * n), n, 3)
tau <- rgamma(1L, 0.25 * 16 * rgamma(1L, 1 * 32, 32), 16)
p.score <- pnorm(x %*% beta.z)
z <- rbinom(n, 1, p.score)
mu.0 <- x %*% beta.y
mu.1 <- x %*% beta.y + tau
y <- mu.0 * (1 - z) + mu.1 * z + rnorm(n, 0, sigma)
# bartCause with k=.5 to match TMLE
fit <- bartc(y, z, x, k=.5)
summary(fit)
# TMLE
tmle(y, z, x,g.SL.library = c("tmle.SL.dbarts.k.5"),g.Delta.SL.library = c("tmle.SL.dbarts.k.5"), Q.SL.library = c("tmle.SL.dbarts.k.5"))
bartCause output
Call: bartc(response = y, treatment = z, confounders = x, k = 0.5)
Causal inference model fit by:
model.rsp: bart
model.trt: bart
Treatment effect (pate):
    estimate   sd ci.lower ci.upper
ate    1.727 1.16  -0.5476    4.001
Estimates fit from 100 total observations
95% credible interval calculated by: normal approximation
population TE approximated by: posterior predictive distribution
Result based on 500 posterior samples times 10 chains
TMLE output
tmle(y, z, x,g.SL.library = c("tmle.SL.dbarts.k.5"),g.Delta.SL.library = c("tmle.SL.dbarts.k.5"), Q.SL.library = c("tmle.SL.dbarts.k.5"))
Additive Effect
Parameter Estimate: 0.84231
Estimated Variance: 0.1095
p-value: 0.010913
95% Conf Interval: (0.19373, 1.4909)
Additive Effect among the Treated
Parameter Estimate: 0.097402
Estimated Variance: 0.14095
p-value: 0.79529
95% Conf Interval: (-0.63844, 0.83324)
Additive Effect among the Controls
Parameter Estimate: 0.82298
Estimated Variance: 0.17686
p-value: 0.050355
95% Conf Interval: (-0.0012894, 1.6473)
As you can see, what should be roughly the same approach (both use dbarts with the same settings) gives effect = 1.7 with CI (-0.5, 4.0) for bartCause and effect = 0.84 with CI (0.19, 1.49) for TMLE.
- Which method is more right? It seems like a big difference in estimate and CI. Is the causal effect of z zero in this case (since z seems to be caused by x)?
- Do you have any idea why the results can be so different even though both packages use dbarts?
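One way to judge which interval is closer to the truth is to recover the true effect from the simulation itself. The following is a minimal sketch, assuming the data-generating code from the example above is used unchanged; `true.ate` is a name introduced here for illustration. Since `mu.1 - mu.0` equals `tau` for every unit and a gamma draw is strictly positive, the true effect in this simulation is not zero, even though z depends on x.

```r
# Re-create the example's random draws in the same order, so that `tau`
# takes the same value it had when the example was run.
n <- 100L
set.seed(725)
x <- matrix(rnorm(3 * n), n, 3)

# rgamma(n, shape, rate): the outer draw has shape 4 * G and rate 16, where
# G ~ Gamma(32, 32) has mean 1, so tau is roughly 0.25 on average.
tau <- rgamma(1L, 0.25 * 16 * rgamma(1L, 1 * 32, 32), 16)

# mu.1 - mu.0 = tau for every unit, so the true ATE, ATT and ATC coincide.
true.ate <- tau  # hypothetical name, not from the original example
print(true.ate)
```

Comparing `true.ate` against each reported interval shows whether that interval covers the truth for this particular seed.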
After re-running the example with multiple seeds, I see that BART consistently has a CI that does not include 0, but TMLE sometimes gets significant negative, sometimes significant positive, and sometimes non-significant results. So it really seems that TMLE works very poorly even when using dbarts. I am not sure why, but I guess this answers my question about which method to trust more. Maybe you should use this example as a strong argument for your method.
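A multi-seed comparison of this kind could be sketched as follows. This is an illustration rather than the code that was actually run: `simulate.data` and `covers` are hypothetical helpers, the seeds are arbitrary, and the sketch assumes that `summary(fit)$estimates` holds the `ci.lower` and `ci.upper` columns shown in the bartCause output and that the tmle fit stores its interval in `$estimates$ATE$CI`. The model fits are skipped if either package is not installed.

```r
# Hypothetical helper: one data set from the example's data-generating process.
simulate.data <- function(seed, n = 100L) {
  beta.z <- c(.75, -0.5, 0.25)
  beta.y <- c(.5, 1.0, -1.5)
  sigma <- 2
  set.seed(seed)
  x <- matrix(rnorm(3 * n), n, 3)
  tau <- rgamma(1L, 0.25 * 16 * rgamma(1L, 1 * 32, 32), 16)
  z <- rbinom(n, 1, pnorm(x %*% beta.z))
  # Same as mu.0 * (1 - z) + mu.1 * z + noise in the original example.
  y <- as.vector(x %*% beta.y + tau * z + rnorm(n, 0, sigma))
  list(x = x, z = z, y = y, tau = tau)
}

# Hypothetical helper: does the interval contain the true effect?
covers <- function(ci, truth) ci[1] <= truth && truth <= ci[2]

have.pkgs <- requireNamespace("bartCause", quietly = TRUE) &&
  requireNamespace("tmle", quietly = TRUE)
if (have.pkgs) {
  library(bartCause)
  library(tmle)
}

seeds <- 1:5  # arbitrary choice for illustration
results <- do.call(rbind, lapply(seeds, function(seed) {
  d <- simulate.data(seed)
  row <- data.frame(seed = seed, tau = d$tau)
  if (have.pkgs) {
    # Assumption: the summary object exposes the table as `$estimates`.
    bart.est <- summary(bartc(d$y, d$z, d$x, k = .5))$estimates
    tmle.fit <- tmle(d$y, d$z, d$x,
                     g.SL.library = c("tmle.SL.dbarts.k.5"),
                     g.Delta.SL.library = c("tmle.SL.dbarts.k.5"),
                     Q.SL.library = c("tmle.SL.dbarts.k.5"))
    row$bart.covers <- covers(c(bart.est$ci.lower, bart.est$ci.upper), d$tau)
    row$tmle.covers <- covers(tmle.fit$estimates$ATE$CI, d$tau)
  }
  row
}))
print(results)
```

Checking coverage of the known `tau`, rather than whether an interval excludes 0, separates "significant" from "correct" when comparing the two packages.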
I'm not an expert in TMLE so I can't really speak to that, but I have noticed that the adjustment makes a pretty large difference in small sample sizes.
Additionally, after digging into the tmle package, I'm noticing a big difference in how the response is estimated even before the TMLE adjustment is done. Apparently the use of SuperLearner is the issue, as the problem is not in the function tmle:::tmle.SL.dbarts.k.5 itself. Not that training MSE is anything to go by, but the result from the SL call on the (scaled) data inside tmle:::estimateQ is 0.08, while for tmle.SL.dbarts.k.5 it is 0.003. A simple lm gets 0.02, so something strange seems to be happening.
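A rough way to reproduce this kind of training-MSE comparison outside of tmle is sketched below. It is not the code inside tmle:::estimateQ: the min-max scaling to [0, 1] is an assumption about what "scaled data" means, `dbarts::bart` with `k = 0.5` stands in for the tmle.SL.dbarts.k.5 wrapper, and `mse`, `x.train` and `y.scaled` are names introduced here. The resulting numbers need not match the ones quoted above.

```r
# Same data-generating process as the example at the top of the thread.
n <- 100L
set.seed(725)
x <- matrix(rnorm(3 * n), n, 3)
tau <- rgamma(1L, 0.25 * 16 * rgamma(1L, 1 * 32, 32), 16)
z <- rbinom(n, 1, pnorm(x %*% c(.75, -0.5, 0.25)))
y <- as.vector(x %*% c(.5, 1.0, -1.5) + tau * z + rnorm(n, 0, 2))

# Assumption: min-max scaling to [0, 1]; tmle's internal scaling may differ.
y.scaled <- (y - min(y)) / (max(y) - min(y))

# Training mean squared error on the scaled response.
mse <- function(fitted) mean((y.scaled - fitted)^2)

# Treatment plus covariates, with explicit column names.
x.train <- cbind(z = z, x1 = x[, 1], x2 = x[, 2], x3 = x[, 3])

# Baseline: ordinary least squares.
mse.lm <- mse(fitted(lm(y.scaled ~ ., data = as.data.frame(x.train))))
print(mse.lm)

# BART called directly through dbarts, if it is installed.
if (requireNamespace("dbarts", quietly = TRUE)) {
  bart.fit <- dbarts::bart(x.train, y.scaled, k = 0.5, verbose = FALSE)
  mse.bart <- mse(bart.fit$yhat.train.mean)
  print(mse.bart)
}
```

If the direct BART fit lands far below the value produced through SuperLearner, that points at the wrapping layer rather than at dbarts.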
Thanks for looking into the example!