
How to properly implement Lo-Mendell-Rubin LRT using calc_lrt

datasci-iopsy opened this issue 3 years ago · 5 comments

Hello,

Thanks for all of the amazing work you have done with this package! The wrappers are a fantastic way to parameterize and compare LPA models. However, I need some clarification on how to use the Lo-Mendell-Rubin likelihood ratio test (LMR LRT) via the calc_lrt function from your package.

The function's description states that calc_lrt

Implements the ad-hoc adjusted likelihood ratio test (LRT) described in Formula 15 of Lo, Mendell, & Rubin (2001), or LMR LRT.

and uses the following arguments:

| Argument | Details |
| --- | --- |
| n | Integer. Sample size. |
| null_ll | Numeric. Log-likelihood of the null model. |
| null_param | Integer. Number of parameters of the null model. |
| null_classes | Integer. Number of classes of the null model. |
| alt_ll | Numeric. Log-likelihood of the alternative model. |
| alt_param | Integer. Number of parameters of the alternative model. |
| alt_classes | Integer. Number of classes of the alternative model. |
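
For orientation, a call that wires these arguments together would presumably look something like the sketch below; the numbers are made up purely for illustration and are not from my data.

```r
library(tidyLPA)

# Illustrative values only: a 2-class null model versus a 3-class
# alternative model, fitted to n = 500 cases.
calc_lrt(
  n = 500,
  null_ll = -2412.74, null_param = 13, null_classes = 2,
  alt_ll = -2387.52, alt_param = 18, alt_classes = 3
)
```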

Suppose I estimate two- and three-profile solutions under two different covariance structures (EEI and VVV, i.e., models 1 and 6) and I want to compare all of the models. I would run the following code and save the output:

.df %>%
    ## select all columns of interest
    select(everything()) %>% 
    ## run tidyLPA 
    estimate_profiles(n_profiles = 2:3, models = c(1, 6)) -> 
    lpa_res

class(lpa_res) 
"tidyProfile.mclust" "tidyProfile"        "list"  

From here it's easy to run some of the additional methods and functions, such as compare_solutions(), plot_profiles(), or get_fit() for all of the relevant fit statistics. What I am having difficulty understanding is exactly which metrics I need to pull from the models to run the calc_lrt function properly.

I started by trying to extract each individual argument from the models explicitly, but I am confused as to what "number of parameters of the null model" and "number of parameters of the alternative model" should really be:

## NULL Model
lpa_res$model_1_class_2$model$n #sample size
lpa_res$model_1_class_2$model$loglik #loglik of null model
## ?? lpa_res$model_1_class_2$model$parameters$ #num of parameters of null model??
lpa_res$model_1_class_2$model$G #number of classes of null model

##  ALT Model
## lpa_res$model_6_class_2$model$n #sample size - not needed again, technically
lpa_res$model_6_class_2$model$loglik #loglik of alt model
## ?? lpa_res$model_6_class_2$model$parameters$ #num of parameters of alt model??
lpa_res$model_6_class_2$model$G #number of classes of alt model

I am able to extract the degrees of freedom (lpa_res$model_6_class_2$model$df) from the models, and my first instinct was simply to subtract the sample size from the degrees of freedom to get the parameters argument, but I am uncertain whether this is correct. Could you specify exactly what I should use for the parameter arguments by noting what should be extracted from the output?
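
For concreteness, this is roughly the call I am trying to assemble. Treating mclust's $df slot as the count of estimated parameters is my guess, not something I have confirmed, which is exactly what I would like cleared up:

```r
# Sketch of the intended call, not a confirmed recipe: using $df as the
# number of estimated parameters is an assumption pending clarification.
null_mod <- lpa_res$model_1_class_2$model
alt_mod  <- lpa_res$model_6_class_2$model

calc_lrt(
  n            = null_mod$n,
  null_ll      = null_mod$loglik,
  null_param   = null_mod$df,   # assumed: estimated parameters, not residual df
  null_classes = null_mod$G,
  alt_ll       = alt_mod$loglik,
  alt_param    = alt_mod$df,    # same assumption as above
  alt_classes  = alt_mod$G
)
```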

Thanks in advance!

Cheers,

datasci-iopsy · Apr 02 '21 17:04

This function was included as a favor to a collaborator; I hesitate to advise on its use, because as far as I'm concerned the only appropriate answer is "don't use the LRT; use the BLRT, which is provided by default in the output of estimate_profiles".

The LRT is really an inappropriate statistic as its assumptions are not met in the context of mixture models.

cjvanlissa · Apr 02 '21 17:04

> This function was included as a favor to a collaborator; I hesitate to advise on its use, because as far as I'm concerned the only appropriate answer is "don't use the LRT; use the BLRT, which is provided by default in the output of estimate_profiles".
>
> The LRT is really an inappropriate statistic as its assumptions are not met in the context of mixture models.

I appreciate the quick response and feedback, @cjvanlissa ! I've been trying to force the use of this test even though I'm very much a fan of the BLRT and the associated p-values. I'll be sure to note it in my code and upcoming manuscript as I revamp my citations.

I'll review the bootstrap likelihood ratio test information, but when comparing a lot of different models, what exactly does the BLRT p-value indicate? Is it comparing the model of interest against the null model of "fewer than k mixtures"?

datasci-iopsy · Apr 02 '21 17:04

The BLRT and the LRT are the same test, except that the BLRT's p-value is drawn from a bootstrapped likelihood ratio distribution! In the table produced by estimate_profiles, it compares each model with K mixtures against the model with K-1 mixtures.
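
Something along these lines shows the relevant columns; I am quoting the column names from memory, so check them against your own get_fit() output:

```r
library(dplyr)

# The BLRT columns in the fit table: BLRT_p tests the K-class model
# against the (K - 1)-class model within the same model specification.
get_fit(lpa_res) %>%
  select(Model, Classes, LogLik, BLRT_val, BLRT_p)
```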

cjvanlissa · Apr 02 '21 17:04

Please let me add a note on the usefulness of the LRT that is implemented in Mplus, because it does not assume a regular chi-square distribution and there is evidence that it works well (the VLMR version is my go-to in combination with information indices).

The BLRT has a strong tendency to overfit in some data scenarios, probably because it is very sensitive to mild violations of model assumptions (Peugh & Fan, 2013); in my experience this happens less often with the non-bootstrapped versions. The non-bootstrapped VLMR-LRT appears to work better (for an overview, see Peugh & Fan, 2013). The two non-bootstrapped LRT versions implemented in Mplus (LMR and the slightly better VLMR) do not assume a chi-square distribution for the difference test but instead an adjusted asymptotic distribution.

References:

Lo, Y., Mendell, N. R., & Rubin, D. B. (2001). Testing the number of components in a normal mixture. Biometrika, 88(3), 767-778.

Peugh, J., & Fan, X. (2013). Modeling unobserved heterogeneity using latent profile analysis: A Monte Carlo simulation. Structural Equation Modeling: A Multidisciplinary Journal, 20(4), 616-639.

peter1328 · Oct 26 '21 06:10

Thank you @peter1328, do you want to add this to the documentation?

cjvanlissa · Oct 26 '21 06:10