projpred
projpred copied to clipboard
Offset issues
There are two rstanarm issues related to offsets (stan-dev/rstanarm#541 and stan-dev/rstanarm#542) which require a workaround in projpred until they are resolved in rstanarm. I can try to create a PR for these workarounds in projpred, but before doing so, I want to make sure I understand the key idea of handling offsets in projpred: Is it correct that <refmod_object>$ref_predfun()
is supposed to not take the offsets into account (or, equivalently, use offsets which are all equal to zero) so that the submodels (which basically fit to <refmod_object>$mu
which in turn is based on <refmod_object>$ref_predfun()
) may be fitted without any offsets?
Submodels can be fitted with offsets if I recall correctly.
Best, Alex
From: Frank Weber @.> Sent: Monday, August 9, 2021 11:12:57 AM To: stan-dev/projpred @.> Cc: Subscribed @.***> Subject: [stan-dev/projpred] Offset issues in rstanarm (#186)
There are two rstanarm issues related to offsets (stan-dev/rstanarm#541https://github.com/stan-dev/rstanarm/issues/541 and stan-dev/rstanarm#542https://github.com/stan-dev/rstanarm/issues/542) which require a workaround in projpred until they are resolved in rstanarm. I can try to create a PR for these workarounds in projpred, but before doing so, I want to make sure I understand the key idea of handling offsets in projpred: Is it correct that <refmod_object>$ref_predfun() is supposed to not take the offsets into account (or, equivalently, use offsets which are all equal to zero) so that the submodels (which basically fit to <refmod_object>$mu which in turn is based on <refmod_object>$ref_predfun()) may be fitted without any offsets?
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHubhttps://github.com/stan-dev/projpred/issues/186, or unsubscribehttps://github.com/notifications/unsubscribe-auth/ABZ5FH6R5WC5RPVJRGSPY2LT36EYTANCNFSM5BZQFNBA.
Yes, in principle they should be fittable with offsets (for example, lme4::lmer()
accepts offsets via argument offset
or via offset()
terms in the formula). But the question is if projpred wants them to be fitted with or without offsets. Continuing the lme4::lmer()
example: fit_glmer_callback()
does not make use of lme4::lmer()
's argument offset
. And as far as I can tell, the submodel formulas exclude offset()
terms. So at least for GLMMs, the submodels seem to be fitted without offsets. And I'm not 100% sure, but the same seems to hold for GLMs, probably also for GAMs and GAMMs.
Another hint that <refmod_object>$ref_predfun()
is supposed to not take the offsets into account is that all <refmod_object>$ref_predfun()
calls seem to add the offsets afterwards manually. The only exception seems to be the <refmod_object>$ref_predfun()
call in init_refmodel()
where <refmod_object>$mu
is created. That led me to the assumption that <refmod_object>$mu
is supposed to exclude the offsets so that the submodels don't need to be fitted with offsets. Perhaps the reason was that the projpred-internal C++ functions glm_elnet_c()
and glm_ridge_c()
don't support offsets (as far as I can tell)?
However, there are also hints speaking against the assumption that <refmod_object>$ref_predfun()
is supposed to not take the offsets into account: In lines https://github.com/stan-dev/projpred/blob/8655bf129bc6c9aaf99cfd63b6f9869e611624bb/R/search.R#L66-L72 offsets are passed to glm_elnet()
even though mu
is used. And in brms, get_refmodel.brmsfit()
uses the default ref_predfun()
where posterior_linpred.brmsfit()
adds the offsets (at least to my knowledge).
You see, I'm pretty uncertain how projpred wants to handle offsets. That's the reason for this issue here. Perhaps @jpiironen and/or @paul-buerkner can help out?
Yes. I see your confusion. We take the offset off the model and keep it in the reference model structure to use it afterwards. In short, we don’t fit the offset in the submodels but use it for predictions, that’s what I meant.
Best, Alex
From: Frank Weber @.> Sent: Monday, August 9, 2021 11:55:47 AM To: stan-dev/projpred @.> Cc: Alejandro Catalina @.>; Comment @.> Subject: Re: [stan-dev/projpred] Offset issues in rstanarm (#186)
Yes, in principle they should be fittable with offsets (for example, lme4::lmer() accepts offsets via argument offset or via offset() terms in the formula). But the question is if projpred wants them to be fitted with or without offsets. Continuing the lme4::lmer() example: fit_glmer_callback() does not make use of lme4::lmer()'s argument offset. And as far as I can tell, the submodel formulas exclude offset() terms. So at least for GLMMs, the submodels seem to be fitted without offsets. And I'm not 100% sure, but the same seems to hold for GLMs, probably also for GAMs and GAMMs.
Another hint that <refmod_object>$ref_predfun() is supposed to not take the offsets into account is that all <refmod_object>$ref_predfun() calls seem to add the offsets afterwards manually. The only exception seems to be the <refmod_object>$ref_predfun() call in init_refmodel() where <refmod_object>$mu is created. That led me to the assumption that <refmod_object>$mu is supposed to exclude the offsets so that the submodels don't need to be fitted with offsets. Perhaps the reason was that the projpred-internal C++ functions glm_elnet_c() and glm_ridge_c() don't support offsets (as far as I can tell)?
However, there are also hints speaking against the assumption that <refmod_object>$ref_predfun() is supposed to not take the offsets into account: In lines https://github.com/stan-dev/projpred/blob/8655bf129bc6c9aaf99cfd63b6f9869e611624bb/R/search.R#L66-L72 offsets are passed to glm_elnet() even though mu is used. And in brms, get_refmodel.brmsfit() uses the default ref_predfun() where posterior_linpred.brmsfit() adds the offsets (at least to my knowledge).
You see, I'm pretty uncertain how projpred wants to handle offsets. That's the reason for this issue here. Perhaps @jpiironenhttps://github.com/jpiironen and/or @paul-buerknerhttps://github.com/paul-buerkner can help out?
— You are receiving this because you commented. Reply to this email directly, view it on GitHubhttps://github.com/stan-dev/projpred/issues/186#issuecomment-895056403, or unsubscribehttps://github.com/notifications/unsubscribe-auth/ABZ5FH4LYCHWBIOM2GN2OB3T36JZHANCNFSM5BZQFNBA.
Ok, so the output of ref_predfun()
is supposed to exclude the offsets. In the PR announced above (i.e., the workaround PR for rstanarm issues stan-dev/rstanarm#541 and stan-dev/rstanarm#542), I'll fix this because currently, the output of ref_predfun()
includes the offsets (leaving the two rstanarm issues out of consideration).
Just FYI:
- I could confirm that the output of
linear_mle()
does not change when usingoffset()
in the formula or not. So GLM submodels are indeed also fitted without offsets (as hypothesized above). - I found out that in lines https://github.com/stan-dev/projpred/blob/8655bf129bc6c9aaf99cfd63b6f9869e611624bb/R/search.R#L66-L72
d_train$offset
is actuallyNULL
, so offsets are excluded here, too.
Fixed in develop
now.
Couldn't projpred's approach of excluding the offsets for fitting the submodels be a bit dangerous from a numerical perspective? Imagine the case of a binomial response where the linear predictor which excludes the offset yields a probability close to 0 or 1 whereas the linear predictor which includes the offset yields a probability closer to 0.5. Then, from a numerical perspective, it would be preferable to operate with the linear predictor which includes the offset, right?
Another problem with projpred's current approach of handling offsets (don't include them in <refmodel>$mu
and then fit the submodels without any offset()
term or offset
argument) I just got aware of: Shouldn't it give invalid results for datafit
s? Because for datafit
s, <refmodel>$mu
contains the observed response values (so it "includes" offsets by construction and it's not possible to "exclude" them), but the submodels are still fitted without offsets.
EDIT: This has been addressed now in PR #351.
Another problem with projpred's current approach of handling offsets (don't include them in <refmodel>$mu
and then fit the submodels without any offset()
term or offset
argument) I just got aware of: The [...$]family$predvar
functions (or rather the calls to them) need to take offsets into account, at least in case of a family which has a dispersion parameter and a non-identity link function (see for example the first line of equation (20) of Piironen et al., 2020).
References
Piironen, J., Paasiniemi, M., & Vehtari, A. (2020). Projective inference in high-dimensional problems: Prediction and feature selection. Electronic Journal of Statistics, 14(1), 2155–2197. https://doi.org/10.1214/20-EJS1711
EDIT: This has been addressed now in PR #355.
Another problem with projpred's current approach of handling offsets (don't include them in <refmodel>$mu
and then fit the submodels without any offset()
term or offset
argument) I just got aware of: The mu
in loo_varsel()
sometimes needs to be adapted so that offsets are taken into account, for example in lines https://github.com/stan-dev/projpred/blob/cff1a5a6dd833e711d233ad00857c70cd7a2f7a8/R/cv_varsel.R#L358 (which can be seen from the fact that for the submodels, offsets are taken into account, see lines https://github.com/stan-dev/projpred/blob/cff1a5a6dd833e711d233ad00857c70cd7a2f7a8/R/cv_varsel.R#L434-L436 for example, so not taking offsets into account for the reference model's mu
would be inconsistent (and it is even incorrect because get_stat()
later compares mu
to y
and for that, offsets need to be taken into account)).
EDIT: This has been addressed now in PR #356.