tmle3
CV-TMLE vs TMLE
Hello,
I am following the tutorial and trying to look at the difference between CV-TMLE and TMLE with the perinatal dataset.
To keep things simple, I only use a glm as the model for both the propensity score and the outcome mean. I am surprised to see that the output is exactly the same for both procedures. CV-TMLE warns that the glm is not "CV-aware", which might be the reason, but I don't understand why that should be the case. My understanding of CV-TMLE is that:
- The dataset should be split into V folds
- The glm models (for both A and Y) should be fitted on each training split, so we should have V instances of each glm, each trained on a different split
- The targeting step is pooled over the predictions of the V glm model pairs on their respective validation sets
- The final estimate is the average of the estimates across validation folds
- The influence curve is then used for the variance estimate (I am not entirely sure whether it is pooled across validation samples or whether multiple variance estimates are made and averaged)
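To make the steps above concrete, here is a minimal base-R sketch of the procedure as I understand it. It is not the tmle3 implementation: the simulated data, the variable names, and the choice of a single pooled fluctuation parameter and a pooled influence curve are all my own assumptions, and it covers a binary outcome only.

```r
# Sketch of CV-TMLE for the ATE with plain glm fits (assumed, simulated data).
set.seed(1)
n <- 500
W <- rnorm(n)
A <- rbinom(n, 1, plogis(0.4 * W))
Y <- rbinom(n, 1, plogis(-0.5 + A + 0.5 * W))
dat <- data.frame(W = W, A = A, Y = Y)

# 1. Split the data into V folds.
V <- 5
fold <- sample(rep(seq_len(V), length.out = n))

# 2. Fit both glms on each training split, predict on the held-out fold.
g1 <- Q1 <- Q0 <- numeric(n)
for (v in seq_len(V)) {
  idx <- fold == v
  train <- dat[!idx, ]
  valid <- dat[idx, ]
  g_fit <- glm(A ~ W, family = binomial(), data = train)
  Q_fit <- glm(Y ~ A + W, family = binomial(), data = train)
  g1[idx] <- predict(g_fit, newdata = valid, type = "response")
  Q1[idx] <- predict(Q_fit, newdata = transform(valid, A = 1), type = "response")
  Q0[idx] <- predict(Q_fit, newdata = transform(valid, A = 0), type = "response")
}
QA <- ifelse(dat$A == 1, Q1, Q0)

# 3. Pooled targeting step: one epsilon fitted on all validation predictions.
H <- dat$A / g1 - (1 - dat$A) / (1 - g1)  # clever covariate
eps_fit <- glm(dat$Y ~ -1 + H + offset(qlogis(QA)), family = binomial())
eps <- coef(eps_fit)[["H"]]
Q1_star <- plogis(qlogis(Q1) + eps / g1)
Q0_star <- plogis(qlogis(Q0) - eps / (1 - g1))
QA_star <- ifelse(dat$A == 1, Q1_star, Q0_star)

# 4. Point estimate: with equal fold sizes, the overall mean equals the
#    average of the fold-specific means.
psi <- mean(Q1_star - Q0_star)
psi_by_fold <- mean(tapply(Q1_star - Q0_star, fold, mean))

# 5. Pooled influence curve and standard error (one possible choice).
ic <- H * (dat$Y - QA_star) + (Q1_star - Q0_star) - psi
se <- sd(ic) / sqrt(n)
```

In this sketch the nuisance models never see the observations they predict on, which is the part I would expect to make the result differ from ordinary TMLE.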
As I understand it, we could have used a Super Learner instead of a GLM, which would have resulted in another, nested cross-validation procedure, but Super Learning is not a requirement of CV-TMLE. The code to reproduce is below: you can tweak the learner_list to switch to a super learner, in which case the two procedures return different outputs and no "CV-aware" warning is raised.
I would appreciate some clarification on the procedure and why this is happening! Thanks!
library(data.table)
library(tmle3)
library(sl3)

data = read.csv("perinatal.csv")

node_list <- list(
  W = c(
    "apgar1", "apgar5", "gagebrth", "mage", "meducyrs", "sexn"
  ),
  A = "parity01",
  Y = "haz01"
)

glm = Lrnr_glm$new()
lrn_mean = Lrnr_mean$new()
sl <- Lrnr_sl$new(learners = Stack$new(glm, lrn_mean), metalearner = Lrnr_nnls$new())

learner_list <- list(A = glm, Y = glm)
# learner_list = list(A=sl, Y = sl)

ate_spec <- tmle_ATE(
  treatment_level = 1,
  control_level = 0
)

tmle_task <- ate_spec$make_tmle_task(data, node_list)

initial_likelihood <- ate_spec$make_initial_likelihood(
  tmle_task,
  learner_list
)

targeted_likelihood_cv <- Targeted_Likelihood$new(initial_likelihood)
targeted_likelihood_no_cv <-
  Targeted_Likelihood$new(initial_likelihood,
    updater = list(cvtmle = FALSE)
  )

tmle_params_cv <- ate_spec$make_params(tmle_task, targeted_likelihood_cv)
tmle_params_no_cv <- ate_spec$make_params(tmle_task, targeted_likelihood_no_cv)

tmle_no_cv <- fit_tmle3(
  tmle_task, targeted_likelihood_no_cv, tmle_params_no_cv,
  targeted_likelihood_no_cv$updater
)
tmle_no_cv
# -0.1855909

tmle_cv <- fit_tmle3(
  tmle_task, targeted_likelihood_cv, tmle_params_cv,
  targeted_likelihood_cv$updater
)
tmle_cv
# -0.1855909