tidyLPA
How to properly implement Lo-Mendell-Rubin LRT using calc_lrt
Hello,
Thanks for all of the amazing work you have done with this package! The wrappers are a fantastic way to parameterize and compare LPA models. However, I need some clarification on how to run the Lo-Mendell-Rubin LRT using the calc_lrt function from your package.
The function's description states that calc_lrt "Implements the ad-hoc adjusted likelihood ratio test (LRT) described in Formula 15 of Lo, Mendell, & Rubin (2001), or LMR LRT" and uses the following arguments:
Arguments | Details |
---|---|
n | Integer. Sample size |
null_ll | Numeric. Log-likelihood of the null model. |
null_param | Integer. Number of parameters of the null model. |
null_classes | Integer. Number of classes of the null model. |
alt_ll | Numeric. Log-likelihood of the alternative model. |
alt_param | Integer. Number of parameters of the alternative model. |
alt_classes | Integer. Number of classes of the alternative model. |
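Going by that table, I assume a call would look roughly like the sketch below. Every number here is a made-up placeholder (not from my data), and the call is guarded so the snippet still runs if tidyLPA is not installed.

```r
# Hypothetical placeholder values; replace them with quantities taken from
# the fitted null (fewer classes) and alternative (more classes) models.
lrt_args <- list(
  n            = 500L,     # sample size
  null_ll      = -2450.3,  # log-likelihood of the null model
  null_param   = 13L,      # number of parameters of the null model
  null_classes = 2L,       # number of classes of the null model
  alt_ll       = -2410.7,  # log-likelihood of the alternative model
  alt_param    = 18L,      # number of parameters of the alternative model
  alt_classes  = 3L        # number of classes of the alternative model
)

# Argument names are the ones listed in the documentation table above.
if (requireNamespace("tidyLPA", quietly = TRUE)) {
  lmr <- do.call(tidyLPA::calc_lrt, lrt_args)
  print(lmr)
}
```

Passing the arguments by name keeps the null and alternative values from being swapped by accident.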
Suppose I fit two- and three-profile models using two different covariance structures (EEI and VVV) and want to compare all of the models. I would run the following code and save the output:
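library(dplyr)    # provides %>% and select()
library(tidyLPA)  # provides estimate_profiles()

lpa_res <- .df %>%
  ## select all columns of interest
  select(everything()) %>%
  ## run tidyLPA: models 1 (EEI) and 6 (VVV), with 2 and 3 profiles each
  estimate_profiles(n_profiles = 2:3, models = c(1, 6))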
class(lpa_res)
"tidyProfile.mclust" "tidyProfile" "list"
From here it's easy to run some of the additional methods and functions, such as compare_solutions() and plot_profiles(), or to get all the relevant fit statistics using get_fit(). What I am having difficulty understanding is exactly which quantities I need to pull from the models to run calc_lrt properly.
I started by trying to extract each individual argument from the models explicitly, but I am confused as to what "number of parameters of null model" and "number of parameters for alt model" should really be:
## NULL Model
lpa_res$model_1_class_2$model$n #sample size
lpa_res$model_1_class_2$model$loglik #loglik of null model
## ?? lpa_res$model_1_class_2$model$parameters$ #num of parameters of null model??
lpa_res$model_1_class_2$model$G #number of classes of null model
## ALT Model
## lpa_res$model_6_class_2$model$n #sample size - not needed again, technically
lpa_res$model_6_class_2$model$loglik #loglik of alt model
## ?? lpa_res$model_6_class_2$model$parameters$ #num of parameters of alt model??
lpa_res$model_6_class_2$model$G #number of classes of alt model
I am able to extract the degrees of freedom (lpa_res$model_6_class_2$model$df) from the models, and my first instinct was to simply subtract the sample size from the degrees of freedom to get the parameters argument, but I'm uncertain if this is correct. Could you tell me exactly what I should use for the parameter arguments, and which element of the output it should be extracted from?
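As a sanity check on whatever the correct extraction turns out to be, the free-parameter count can also be worked out by hand. The sketch below is base R only; count_params is a hypothetical helper of my own (not part of tidyLPA or mclust), and it assumes the usual count for a Gaussian mixture: K * p means, K - 1 mixing proportions, plus the variance/covariance terms implied by the structure. As far as I can tell from the mclust documentation, the df element of a fitted Mclust object is already the number of estimated parameters, which would mean nothing needs to be subtracted, but I would appreciate confirmation.

```r
# Hypothetical helper: free parameters of a Gaussian mixture with
# K profiles and p indicators, for the two structures used above.
#   "EEI" (tidyLPA model 1): one variance per indicator, shared across profiles
#   "VVV" (tidyLPA model 6): a full covariance matrix per profile
count_params <- function(K, p, structure = c("EEI", "VVV")) {
  structure <- match.arg(structure)
  n_means <- K * p   # one mean per indicator per profile
  n_mix   <- K - 1   # mixing proportions sum to 1
  n_cov   <- switch(structure,
    EEI = p,                     # p shared variances, no covariances
    VVV = K * p * (p + 1) / 2    # p(p + 1)/2 unique entries per profile
  )
  n_means + n_mix + n_cov
}

# Example with p = 4 indicators (an arbitrary choice for illustration)
count_params(K = 2, p = 4, structure = "EEI")  # 8 + 1 + 4  = 13
count_params(K = 3, p = 4, structure = "EEI")  # 12 + 2 + 4 = 18
count_params(K = 2, p = 4, structure = "VVV")  # 8 + 1 + 20 = 29
```

If the hand count and the df value agree for a fitted model, that value is presumably what null_param and alt_param expect.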
Thanks in advance!
Cheers,
This function was included as a favor to a collaborator; I hesitate to advise on its use because, as far as I'm concerned, the only appropriate answer is "don't use the LRT, use the BLRT, which is provided by default in the output of estimate_profiles".
The LRT is really an inappropriate statistic as its assumptions are not met in the context of mixture models.
I appreciate the quick response and feedback, @cjvanlissa ! I've been trying to force the use of this test even though I'm very much a fan of the BLRT and the associated p-values. I'll be sure to note it in my code and upcoming manuscript as I revamp my citations.
I'll review the bootstrap likelihood ratio test information, but when comparing a lot of different models, what exactly do the BLRT p-values indicate? Is the test comparing the model of interest against the null model of "fewer than k mixtures"?
BLRT and LRT use the same test statistic, except that the BLRT p-value is drawn from a bootstrapped distribution of the likelihood ratio! In the table of estimate_profiles, it compares each model with K mixtures against the model with K-1 mixtures.
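A minimal sketch of the same K-1 versus K comparison, run directly with mclust's mclustBootstrapLRT() rather than through tidyLPA. It assumes mclust is installed (the call is guarded otherwise) and uses the built-in faithful data set, a small nboot, and maxG = 2 purely to keep the illustration quick; none of these choices come from this thread.

```r
# Bootstrap LRT for the number of mixture components, holding the
# covariance structure fixed ("VVV" here, i.e. tidyLPA model 6).
if (requireNamespace("mclust", quietly = TRUE)) {
  library(mclust)  # attached because mclust's fitting code expects to be on the search path
  set.seed(1)      # bootstrap resampling is random

  blrt <- mclustBootstrapLRT(
    faithful,          # built-in example data (2 numeric columns)
    modelName = "VVV",
    nboot = 99,        # deliberately small for speed; use more in practice
    maxG = 2           # largest number of components taken as the null
  )

  # Each row tests G components (null) against G + 1 (alternative).
  print(blrt)
  blrt$p.value
}
```

A small p-value in a given row favors the model with one more component over the model in that row.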
Please let me add a note on the usefulness of the LRT as implemented in Mplus: it does not assume a regular chi-square distribution, and there is evidence that it works well (the VLMR version is my go-to in combination with information indices). The BLRT has a strong tendency to overfit in some data scenarios, probably because it is very sensitive to mild violations of model assumptions (Peugh & Fan, 2013); in my experience this happens less often with the non-bootstrapped versions. The non-bootstrapped VLMR-LRT appears to work better (for an overview, see Peugh & Fan, 2013). The two non-bootstrapped LRT versions implemented in Mplus (LMR and the slightly better VLMR) do not assume a chi-square distribution for the difference test but instead an adjusted asymptotic distribution.
Lo, Y., Mendell, N. R., & Rubin, D. B. (2001). Testing the number of components in a normal mixture. Biometrika, 88(3), 767-778.
Peugh, J., & Fan, X. (2013). Modeling unobserved heterogeneity using latent profile analysis: A Monte Carlo simulation. Structural Equation Modeling: A Multidisciplinary Journal, 20(4), 616-639.
Thank you @peter1328 , do you want to add this to the documentation?