ltmle
remove intercept column if super learning
If `SuperLearner` is used to estimate nuisance parameters, `ltmle` first creates a design matrix based on `Qform` and `gform` that is subsequently passed to `SuperLearner` via the `X` option.
However, the default formulas for `Qform` and `gform` generate an intercept column in the `X` matrix. This causes some algorithms in `SuperLearner` to throw unnecessary warnings (e.g., the call to `glm` in `SL.glm` complains about not having a full rank matrix). Generally, these algorithms will still work, but the `ltmle` output will include unnecessary warnings.
This PR fixes this by checking for an intercept term when `model.matrix` is called and, if there is one, removing the first column of `X`, which is assumed to be the intercept.
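A minimal base R sketch of this idea, not the actual diff: the helper name `drop_intercept`, the toy data frame `d`, and the formula are hypothetical and only illustrate dropping the first column when it is the intercept.

```r
# Hypothetical helper: build the design matrix and drop the intercept
# column if model.matrix() produced one as the first column.
drop_intercept <- function(formula, data) {
  X <- model.matrix(formula, data = data)
  # model.matrix() names the intercept column "(Intercept)"
  if (ncol(X) > 0 && colnames(X)[1] == "(Intercept)") {
    X <- X[, -1, drop = FALSE]  # drop = FALSE keeps X a matrix
  }
  X
}

# Toy data (made up for illustration)
d <- data.frame(
  Y = c(0, 1, 1, 0),
  W = c(0.2, 1.5, -0.3, 0.8),
  A = c(0, 1, 0, 1)
)

X <- drop_intercept(Y ~ W + A, d)             # intercept column removed
X_noint <- drop_intercept(Y ~ W + A - 1, d)   # no intercept to remove
colnames(X)
```

Checking the column name rather than dropping the first column unconditionally leaves formulas that already exclude the intercept untouched.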
I haven't done extensive testing to know if there is anywhere else in the code where this change is necessary, but a few examples I've run seem to work without warnings.
Commit 41a22a1546427fd6934062502484baa24306f3e6 fixes a bug in how p-values are generated for tests of counterfactual means.
Based on the previous code, I'd guess that the goal was to test the null hypothesis that E[Y(a)] = 0. However, when outcomes were transformed, what was being tested instead was (E[Y(a)] - min(Y)) / (max(Y) - min(Y)) = 0, i.e., E[Y(a)] = min(Y).
I've corrected the `summary.ltmleEffectMeasures` function to reflect this.
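To illustrate the difference between the two hypotheses, here is a small sketch with made-up numbers. It assumes a two-sided Wald test and a linear transformation of the outcome to [0, 1]; none of the variable names come from the package.

```r
# Hypothetical outcome range and an estimate on the transformed [0, 1] scale
y_min <- 10
y_max <- 50
est_scaled <- 0.25
se_scaled <- 0.05

# Back-transform the estimate and its standard error to the original scale
est <- est_scaled * (y_max - y_min) + y_min
se <- se_scaled * (y_max - y_min)

# Wald statistic on the transformed scale: equivalent to testing
# E[Y(a)] = min(Y) on the original scale
z_scaled <- est_scaled / se_scaled
z_shifted <- (est - y_min) / se

# Wald statistic for the intended null hypothesis E[Y(a)] = 0
z_original <- est / se

# Two-sided p-values
p_scaled <- 2 * pnorm(-abs(z_scaled))
p_original <- 2 * pnorm(-abs(z_original))
```

Here `z_scaled` and `z_shifted` coincide, while `z_original` differs whenever min(Y) is not 0, so the two p-values generally disagree.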
Commit 83b100bfafb831a31d7e504574bcde5f72170c31 fixes a bug in `summary.ltmle` introduced by 41a22a1546427fd6934062502484baa24306f3e6.
It also adds stability checks to `SuperLearner`. Specifically,
- For binary outcomes, if there are fewer than 10 outcomes, change `SuperLearner` to `V=2` fold CV and stratify on outcomes. I was running into cases where there were, e.g., 9 outcomes and all the `SuperLearner` wrappers were complaining about lack of convergence (presumably because all outcomes were 0 in some folds).
- If there is only 1 outcome, change `SL.library` to `SL.mean`, since no regression technique can really do anything anyway.
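The two rules above can be sketched as a small base R helper. This is an illustration rather than the code in the commit: the function name `choose_sl_settings` and its `min_events` argument are hypothetical, and it assumes "outcomes" means observations with `Y == 1`. The returned `cvControl` list uses the `V` and `stratifyCV` fields that `SuperLearner` accepts in its `cvControl` argument.

```r
# Hypothetical helper: pick SuperLearner settings from the outcome vector.
# Assumes "number of outcomes" means the number of observations with Y == 1.
choose_sl_settings <- function(Y, SL.library, min_events = 10) {
  cv_control <- list()  # empty list: leave SuperLearner's defaults alone
  is_binary <- all(Y %in% c(0, 1))
  n_events <- sum(Y == 1)

  if (is_binary && n_events < min_events) {
    # Few events: use 2-fold CV, stratified so each fold sees events
    cv_control <- list(V = 2, stratifyCV = TRUE)
  }
  if (is_binary && n_events == 1) {
    # A single event: fall back to the marginal mean
    SL.library <- "SL.mean"
  }
  list(SL.library = SL.library, cvControl = cv_control)
}

# Toy outcome vectors (made up for illustration)
lib <- c("SL.glm", "SL.mean")
few <- choose_sl_settings(c(rep(1, 9), rep(0, 91)), lib)
one <- choose_sl_settings(c(1, rep(0, 99)), lib)
many <- choose_sl_settings(c(rep(1, 50), rep(0, 50)), lib)
```

The resulting list could then be passed on, for example as `SuperLearner(Y = Y, X = X, SL.library = few$SL.library, cvControl = few$cvControl)`.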