projpred icon indicating copy to clipboard operation
projpred copied to clipboard

Offset issues

Open fweber144 opened this issue 3 years ago • 9 comments

There are two rstanarm issues related to offsets (stan-dev/rstanarm#541 and stan-dev/rstanarm#542) which require a workaround in projpred until they are resolved in rstanarm. I can try to create a PR for these workarounds in projpred, but before doing so, I want to make sure I understand the key idea of handling offsets in projpred: Is it correct that <refmod_object>$ref_predfun() is supposed to not take the offsets into account (or, equivalently, use offsets which are all equal to zero) so that the submodels (which basically fit to <refmod_object>$mu which in turn is based on <refmod_object>$ref_predfun()) may be fitted without any offsets?

fweber144 avatar Aug 09 '21 08:08 fweber144

Submodels can be fitted with offsets if I recall correctly.

Best, Alex


From: Frank Weber @.> Sent: Monday, August 9, 2021 11:12:57 AM To: stan-dev/projpred @.> Cc: Subscribed @.***> Subject: [stan-dev/projpred] Offset issues in rstanarm (#186)

There are two rstanarm issues related to offsets (stan-dev/rstanarm#541https://github.com/stan-dev/rstanarm/issues/541 and stan-dev/rstanarm#542https://github.com/stan-dev/rstanarm/issues/542) which require a workaround in projpred until they are resolved in rstanarm. I can try to create a PR for these workarounds in projpred, but before doing so, I want to make sure I understand the key idea of handling offsets in projpred: Is it correct that <refmod_object>$ref_predfun() is supposed to not take the offsets into account (or, equivalently, use offsets which are all equal to zero) so that the submodels (which basically fit to <refmod_object>$mu which in turn is based on <refmod_object>$ref_predfun()) may be fitted without any offsets?

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHubhttps://github.com/stan-dev/projpred/issues/186, or unsubscribehttps://github.com/notifications/unsubscribe-auth/ABZ5FH6R5WC5RPVJRGSPY2LT36EYTANCNFSM5BZQFNBA.

AlejandroCatalina avatar Aug 09 '21 08:08 AlejandroCatalina

Yes, in principle they should be fittable with offsets (for example, lme4::lmer() accepts offsets via argument offset or via offset() terms in the formula). But the question is if projpred wants them to be fitted with or without offsets. Continuing the lme4::lmer() example: fit_glmer_callback() does not make use of lme4::lmer()'s argument offset. And as far as I can tell, the submodel formulas exclude offset() terms. So at least for GLMMs, the submodels seem to be fitted without offsets. And I'm not 100% sure, but the same seems to hold for GLMs, probably also for GAMs and GAMMs.

Another hint that <refmod_object>$ref_predfun() is supposed to not take the offsets into account is that all <refmod_object>$ref_predfun() calls seem to add the offsets afterwards manually. The only exception seems to be the <refmod_object>$ref_predfun() call in init_refmodel() where <refmod_object>$mu is created. That led me to the assumption that <refmod_object>$mu is supposed to exclude the offsets so that the submodels don't need to be fitted with offsets. Perhaps the reason was that the projpred-internal C++ functions glm_elnet_c() and glm_ridge_c() don't support offsets (as far as I can tell)?

However, there are also hints speaking against the assumption that <refmod_object>$ref_predfun() is supposed to not take the offsets into account: In lines https://github.com/stan-dev/projpred/blob/8655bf129bc6c9aaf99cfd63b6f9869e611624bb/R/search.R#L66-L72 offsets are passed to glm_elnet() even though mu is used. And in brms, get_refmodel.brmsfit() uses the default ref_predfun() where posterior_linpred.brmsfit() adds the offsets (at least to my knowledge).

You see, I'm pretty uncertain how projpred wants to handle offsets. That's the reason for this issue here. Perhaps @jpiironen and/or @paul-buerkner can help out?

fweber144 avatar Aug 09 '21 08:08 fweber144

Yes. I see your confusion. We take the offset off the model and keep it in the reference model structure to use it afterwards. In short, we don’t fit the offset in the submodels but use it for predictions, that’s what I meant.

Best, Alex


From: Frank Weber @.> Sent: Monday, August 9, 2021 11:55:47 AM To: stan-dev/projpred @.> Cc: Alejandro Catalina @.>; Comment @.> Subject: Re: [stan-dev/projpred] Offset issues in rstanarm (#186)

Yes, in principle they should be fittable with offsets (for example, lme4::lmer() accepts offsets via argument offset or via offset() terms in the formula). But the question is if projpred wants them to be fitted with or without offsets. Continuing the lme4::lmer() example: fit_glmer_callback() does not make use of lme4::lmer()'s argument offset. And as far as I can tell, the submodel formulas exclude offset() terms. So at least for GLMMs, the submodels seem to be fitted without offsets. And I'm not 100% sure, but the same seems to hold for GLMs, probably also for GAMs and GAMMs.

Another hint that <refmod_object>$ref_predfun() is supposed to not take the offsets into account is that all <refmod_object>$ref_predfun() calls seem to add the offsets afterwards manually. The only exception seems to be the <refmod_object>$ref_predfun() call in init_refmodel() where <refmod_object>$mu is created. That led me to the assumption that <refmod_object>$mu is supposed to exclude the offsets so that the submodels don't need to be fitted with offsets. Perhaps the reason was that the projpred-internal C++ functions glm_elnet_c() and glm_ridge_c() don't support offsets (as far as I can tell)?

However, there are also hints speaking against the assumption that <refmod_object>$ref_predfun() is supposed to not take the offsets into account: In lines https://github.com/stan-dev/projpred/blob/8655bf129bc6c9aaf99cfd63b6f9869e611624bb/R/search.R#L66-L72 offsets are passed to glm_elnet() even though mu is used. And in brms, get_refmodel.brmsfit() uses the default ref_predfun() where posterior_linpred.brmsfit() adds the offsets (at least to my knowledge).

You see, I'm pretty uncertain how projpred wants to handle offsets. That's the reason for this issue here. Perhaps @jpiironenhttps://github.com/jpiironen and/or @paul-buerknerhttps://github.com/paul-buerkner can help out?

— You are receiving this because you commented. Reply to this email directly, view it on GitHubhttps://github.com/stan-dev/projpred/issues/186#issuecomment-895056403, or unsubscribehttps://github.com/notifications/unsubscribe-auth/ABZ5FH4LYCHWBIOM2GN2OB3T36JZHANCNFSM5BZQFNBA.

AlejandroCatalina avatar Aug 09 '21 09:08 AlejandroCatalina

Ok, so the output of ref_predfun() is supposed to exclude the offsets. In the PR announced above (i.e., the workaround PR for rstanarm issues stan-dev/rstanarm#541 and stan-dev/rstanarm#542), I'll fix this because currently, the output of ref_predfun() includes the offsets (leaving the two rstanarm issues out of consideration).

Just FYI:

  • I could confirm that the output of linear_mle() does not change when using offset() in the formula or not. So GLM submodels are indeed also fitted without offsets (as hypothesized above).
  • I found out that in lines https://github.com/stan-dev/projpred/blob/8655bf129bc6c9aaf99cfd63b6f9869e611624bb/R/search.R#L66-L72 d_train$offset is actually NULL, so offsets are excluded here, too.

fweber144 avatar Aug 16 '21 09:08 fweber144

Fixed in develop now.

fweber144 avatar Oct 11 '21 17:10 fweber144

Couldn't projpred's approach of excluding the offsets for fitting the submodels be a bit dangerous from a numerical perspective? Imagine the case of a binomial response where the linear predictor which excludes the offset yields a probability close to 0 or 1 whereas the linear predictor which includes the offset yields a probability closer to 0.5. Then, from a numerical perspective, it would be preferable to operate with the linear predictor which includes the offset, right?

fweber144 avatar Nov 04 '21 12:11 fweber144

Another problem with projpred's current approach of handling offsets (don't include them in <refmodel>$mu and then fit the submodels without any offset() term or offset argument) I just got aware of: Shouldn't it give invalid results for datafits? Because for datafits, <refmodel>$mu contains the observed response values (so it "includes" offsets by construction and it's not possible to "exclude" them), but the submodels are still fitted without offsets.

EDIT: This has been addressed now in PR #351.

fweber144 avatar Sep 13 '22 18:09 fweber144

Another problem with projpred's current approach of handling offsets (don't include them in <refmodel>$mu and then fit the submodels without any offset() term or offset argument) I just got aware of: The [...$]family$predvar functions (or rather the calls to them) need to take offsets into account, at least in case of a family which has a dispersion parameter and a non-identity link function (see for example the first line of equation (20) of Piironen et al., 2020).

References

Piironen, J., Paasiniemi, M., & Vehtari, A. (2020). Projective inference in high-dimensional problems: Prediction and feature selection. Electronic Journal of Statistics, 14(1), 2155–2197. https://doi.org/10.1214/20-EJS1711

EDIT: This has been addressed now in PR #355.

fweber144 avatar Oct 09 '22 07:10 fweber144

Another problem with projpred's current approach of handling offsets (don't include them in <refmodel>$mu and then fit the submodels without any offset() term or offset argument) I just got aware of: The mu in loo_varsel() sometimes needs to be adapted so that offsets are taken into account, for example in lines https://github.com/stan-dev/projpred/blob/cff1a5a6dd833e711d233ad00857c70cd7a2f7a8/R/cv_varsel.R#L358 (which can be seen from the fact that for the submodels, offsets are taken into account, see lines https://github.com/stan-dev/projpred/blob/cff1a5a6dd833e711d233ad00857c70cd7a2f7a8/R/cv_varsel.R#L434-L436 for example, so not taking offsets into account for the reference model's mu would be inconsistent (and it is even incorrect because get_stat() later compares mu to y and for that, offsets need to be taken into account)).

EDIT: This has been addressed now in PR #356.

fweber144 avatar Oct 09 '22 19:10 fweber144