[Feature Request]: Heckman Logistic Regression
Description
Adding new regression procedures
Purpose
No response
Use-case
Useful in situations with small datasets, in which separation is likely to be a problem
Is your feature request related to a problem?
No response
Describe the solution you would like
The addition of Firth and Heckman logistic regression
Describe alternatives that you have considered
No response
Additional context
Separation tends to be a problem in small samples with several highly predictive predictors; the Firth procedure can help to overcome this:
https://onlinelibrary.wiley.com/doi/10.1002/sim.1047
Heinze and colleagues have applied the Firth correction to Cox and logistic regression; further references and R packages are available here:
https://cemsiis.meduniwien.ac.at/en/kb/science-research/software/statistical-software/firth-correction/
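For illustration, here is a minimal, hypothetical sketch with simulated data (assuming the logistf package distributed via the page above): with complete separation, an ordinary maximum-likelihood fit diverges, while the Firth-penalized fit gives finite estimates.

# simulate a perfectly separated binary outcome
set.seed(1)
x <- c(rnorm(15, mean = -2), rnorm(15, mean = 2))
y <- rep(c(0, 1), each = 15)
dat <- data.frame(y, x)

# ordinary ML fit: warning about fitted probabilities of 0 or 1,
# huge coefficients and standard errors
fit_ml <- glm(y ~ x, family = binomial, data = dat)
summary(fit_ml)

# Firth-penalized fit (install.packages("logistf")): finite estimates
# and profile penalized-likelihood confidence intervals
fit_firth <- logistf::logistf(y ~ x, data = dat)
summary(fit_firth)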
I am not so familiar with Heckman regression, so I can't provide many details or references, but colleagues have suggested that it would also be a useful addition to your program. They advise that it can be implemented with this package:
https://www.jstatsoft.org/article/view/v027i07
This would be very helpful. Small-sample bias in logistic regression is largely ignored in the literature and (thus?) in software.
The Firth correction might be the best-known correction and is especially helpful in the case of (quasi-)complete separation. The method can easily be implemented by switching to an alternative optimizer when calling the glm function somewhere under the hood (I think in jaspRegression's .glmComputeModel?). Cool thing: the Firth-penalized likelihood is proportional to the posterior distribution obtained with the Jeffreys prior in Bayesian statistics.
The standard optimizer for the glm function (through glm.fit) is iteratively reweighted least squares. For the Firth correction, one could replace this optimization routine with a method that maximizes the penalized log-likelihood
$L^\ast(\beta) = L(\beta) + \frac{1}{2}\log|\mathcal{I}(\beta)|$,
where $|\mathcal{I}(\beta)|$ is the determinant of the expected information matrix. This is implemented in the brglm2 package as an optimization procedure that can be used from within stats::glm. One could, for instance, add the Firth correction as an option to the options argument and, depending on its value, use either "glm.fit" (the default) or brglm2::brglmFit as the method argument when calling glm in the body of (I think) .glmComputeModel.
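As an aside, this also makes the Jeffreys-prior remark above explicit: exponentiating the penalized log-likelihood gives
$\exp\{L^\ast(\beta)\} = \exp\{L(\beta)\}\,|\mathcal{I}(\beta)|^{1/2} \propto p(y \mid \beta)\,\pi(\beta)$,
which is proportional to the posterior when $\pi(\beta) \propto |\mathcal{I}(\beta)|^{1/2}$ is the Jeffreys prior.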
Something like this (somewhere around line 60 of https://github.com/jasp-stats/jaspRegression/blob/master/R/glmCommonFunctions.R):
# optimizer depends on the (to-be-added) Firth option;
# options$Firth is assumed here to hold "yes" or "no"
optimizer <- switch(options$Firth,
                    "no"  = "glm.fit",
                    "yes" = brglm2::brglmFit)

# compute full and null models
if (options$weights == "") {
  fullModel <- stats::glm(ff, family = familyLink, data = dataset, weights = NULL, method = optimizer)
  nullModel <- stats::glm(nf, family = familyLink, data = dataset, weights = NULL, method = optimizer)
} else {
  fullModel <- stats::glm(ff, family = familyLink, data = dataset, weights = get(options$weights), method = optimizer)
  nullModel <- stats::glm(nf, family = familyLink, data = dataset, weights = get(options$weights), method = optimizer)
}
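As a quick standalone check of that method switch (outside the JASP internals, with simulated separated data; the variable names are made up), stats::glm can be pointed at brglm2::brglmFit directly. For binomial models with the logit link, the mean bias-reducing adjusted-score fit coincides with Firth's correction:

# install.packages("brglm2")
set.seed(1)
x <- c(rnorm(15, mean = -2), rnorm(15, mean = 2))
y <- rep(c(0, 1), each = 15)
dat <- data.frame(y, x)

# default IRLS fit: separation warning, diverging estimates
fit_ml <- stats::glm(y ~ x, family = binomial, data = dat, method = "glm.fit")

# bias-reducing adjusted-score fit; equals Firth's penalized ML for the logit link
fit_br <- stats::glm(y ~ x, family = binomial, data = dat,
                     method = brglm2::brglmFit, type = "AS_mean")

summary(fit_ml)$coefficients
summary(fit_br)$coefficients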
One possible problem: since we're messing with the log-likelihood function, model comparison (using LRT and perhaps AIC and BIC) might be inappropriate since these require plain maximum likelihood optimization. Implementing such corrections should perhaps void the model comparison statistics in the output.
Firth logistic regression will be available in the next JASP release under the GLM regression analysis.
I am not closing this issue just yet - @fqixiang what do you think about the Heckman regression?
@Kucharssim Heckman-type selection models seem to be used mostly in econometrics (just an impression, since this is not my field of expertise). The implementation seems quite easy: a two-step procedure in which a probit selection model is estimated first and a correction term derived from it (the inverse Mills ratio) is then added as a covariate to the regression model of interest. This can already be done in JASP by using the GLM module and the linear regression module together. Of course, it would be easier for users if we implemented this as a single analysis. I don't have a strong opinion about this, though.
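For concreteness, here is a rough sketch of that two-step idea in plain R, with simulated data and made-up variable names (note that the naive second-step standard errors ignore the estimated first step; the sampleSelection package linked above wraps both steps, including corrected standard errors):

# simulate an outcome that is only observed for selected cases
set.seed(1)
n <- 500
e <- MASS::mvrnorm(n, mu = c(0, 0), Sigma = matrix(c(1, 0.5, 0.5, 1), 2))
z <- rnorm(n)                            # predictor of selection
x <- rnorm(n)                            # predictor of the outcome
s <- (0.5 + z + e[, 1]) > 0              # logical selection indicator
y <- ifelse(s, 1 + 2 * x + e[, 2], NA)   # outcome missing when not selected
dat <- data.frame(y, x, z, s)

# step 1: probit selection model (this is what the GLM module provides)
sel <- glm(s ~ z, family = binomial(link = "probit"), data = dat)
lp  <- predict(sel, type = "link")
dat$imr <- dnorm(lp) / pnorm(lp)         # inverse Mills ratio

# step 2: linear regression on the selected cases with the correction term added
out <- lm(y ~ x + imr, data = dat, subset = s)
summary(out)

# equivalent single call with the package referenced earlier in this issue:
# sampleSelection::selection(s ~ z, y ~ x, data = dat, method = "2step")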
@mathijsdeen Thanks for taking the time to write such a wonderful response about Firth logistic regression. Unfortunately, I saw it only just now! I already implemented Firth logistic regression using the logistf package for the new release of JASP. I wish I had read it much earlier and made use of the brglm2 package, especially considering that it also supports other GLMs like ordinal and multinomial logistic regression, which I also implemented for the new JASP release (using the VGAM package). brglm2 would have made the code for the different analyses more consistent and simpler. Perhaps it will be a good idea to switch to brglm2 altogether in the future!
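For what it's worth, a small hypothetical sketch of what that consistency could look like for the multinomial case, assuming brglm2's brmultinom and the built-in iris data:

# bias-reduced multinomial (baseline-category) logistic regression with brglm2
# install.packages("brglm2")
library(brglm2)
fit_multi <- brmultinom(Species ~ Sepal.Length + Sepal.Width, data = iris)
coef(fit_multi)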
@fqixiang Cool, I'm looking forward to the implementation! And of course, implementation before efficiency :).
Summing up: while Firth logistic regression is available in the GLM regression analysis, Heckman regression is still needed.