pymc-examples

Confirmatory Factor Analysis and Structural Equation Models

Open NathanielF opened this issue 6 months ago • 2 comments

Notebook proposal

Title: Confirmatory Factor Analysis and Structural Equation Models

Why should this notebook be added to pymc-examples?

This fills a gap in our coverage of confirmatory factor analysis (CFA) and structural equation models (SEM), highlighting in particular their role in the analysis of psychometric survey data. It's super interesting and ties in with Judea Pearl-style causal inference on DAGs.

image

DATA

image
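
The data itself only appears as an image here, so for anyone who wants to try the model without it, a rough stand-in can be simulated. This is a minimal sketch, not the real survey data: the function name `simulate_cfa_data` and every "true" parameter value below are made up for illustration. Only the five column names and the two-factor structure are taken from the model code in this issue.

```python
import numpy as np
import pandas as pd


def simulate_cfa_data(n_obs=300, seed=0):
    """Simulate five indicators driven by two correlated latent factors.

    All parameter values are hypothetical and chosen only for illustration.
    """
    rng = np.random.default_rng(seed)
    indicators = ["PI", "AD", "IGC", "FI", "FC"]

    # Covariance of the two latent factors (Student, Faculty)
    latent_cov = np.array([[1.0, 0.5],
                           [0.5, 1.0]])
    # Rows are indicators, columns are factors; each indicator loads on one factor
    loadings = np.array([
        [1.0, 0.0],  # PI  -> Student (reference loading)
        [0.8, 0.0],  # AD  -> Student
        [0.9, 0.0],  # IGC -> Student
        [0.0, 1.0],  # FI  -> Faculty (reference loading)
        [0.0, 0.7],  # FC  -> Faculty
    ])
    intercepts = np.array([3.0, 3.2, 2.9, 3.5, 3.1])
    resid_sd = np.full(5, 0.5)

    # Latent scores per respondent, then indicators = intercept + loading * factor + noise
    ksi = rng.multivariate_normal(np.zeros(2), latent_cov, size=n_obs)
    noise = rng.normal(0.0, resid_sd, size=(n_obs, 5))
    y = intercepts + ksi @ loadings.T + noise
    return pd.DataFrame(y, columns=indicators)


df_p = simulate_cfa_data()
```

The column order matters: the model code stacks the means in the order PI, AD, IGC, FI, FC, so `df_p` needs its columns in that same order.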

Basic CFA example in PyMC
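
For reference, the measurement model that the snippet below encodes can be written out as follows. The notation is my own reading of the code, not something stated in the issue: i indexes respondents, j indexes the five indicators, and k(j) is the factor that indicator j loads on (Student for PI, AD, IGC; Faculty for FI, FC).

```latex
\begin{aligned}
y_{ij} &= \tau_j + \lambda_j \, \xi_{i,k(j)} + \varepsilon_{ij},
  & \varepsilon_{ij} &\sim \mathcal{N}(0, \psi_j^2), \\
\boldsymbol{\xi}_i &\sim \mathcal{N}_2(\mathbf{0}, \boldsymbol{\Phi}),
  & \lambda_{\mathrm{PI}} &= \lambda_{\mathrm{FI}} = 1 .
\end{aligned}
```

Here τ corresponds to `tau`, λ to `lambdas1` and `lambdas2`, ψ to `Psi` (used as a standard deviation in the likelihood), and Φ to the latent covariance built from `chol_cov`. Fixing the first loading of each factor to 1 sets the scale of the latent variables.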

import pymc as pm
import pytensor.tensor as pt

# df_p is assumed to be the survey DataFrame from the DATA section,
# with one column per indicator, ordered as in `indicators` below.
coords = {'obs': list(range(len(df_p))),
          'indicators': ['PI', 'AD', 'IGC', 'FI', 'FC'],
          'indicators_1': ['PI', 'AD', 'IGC'],
          'indicators_2': ['FI', 'FC'],
          'latent': ['Student', 'Faculty']
          }

obs_idx = list(range(len(df_p)))
with pm.Model(coords=coords) as model:

    # Residual standard deviation of each indicator
    Psi = pm.InverseGamma('Psi', 5, 10, dims='indicators')
    # Factor loadings; the first loading of each factor is fixed to 1 to set the scale
    lambdas_ = pm.Normal('lambdas_1', 1, 10, dims='indicators_1')
    lambdas_1 = pm.Deterministic('lambdas1', pt.set_subtensor(lambdas_[0], 1), dims='indicators_1')
    lambdas_ = pm.Normal('lambdas_2', 1, 10, dims='indicators_2')
    lambdas_2 = pm.Deterministic('lambdas2', pt.set_subtensor(lambdas_[0], 1), dims='indicators_2')
    # Indicator intercepts
    tau = pm.Normal('tau', 3, 10, dims='indicators')
    # Latent factors: zero mean, LKJ prior on their covariance
    kappa = 0
    sd_dist = pm.Exponential.dist(1.0, shape=2)
    chol, _, _ = pm.LKJCholeskyCov('chol_cov', n=2, eta=2,
                                   sd_dist=sd_dist, compute_corr=True)
    ksi = pm.MvNormal('ksi', kappa, chol=chol, dims=('obs', 'latent'))

    # Each indicator is a linear function of the one factor it loads on
    m1 = tau[0] + ksi[obs_idx, 0] * lambdas_1[0]
    m2 = tau[1] + ksi[obs_idx, 0] * lambdas_1[1]
    m3 = tau[2] + ksi[obs_idx, 0] * lambdas_1[2]
    m4 = tau[3] + ksi[obs_idx, 1] * lambdas_2[0]
    m5 = tau[4] + ksi[obs_idx, 1] * lambdas_2[1]

    # Stack into an (obs, indicators) matrix to match df_p.values
    mu = pm.Deterministic('mu', pm.math.stack([m1, m2, m3, m4, m5]).T)
    _ = pm.Normal('likelihood', mu, Psi, observed=df_p.values)

    # nuts_sampler='numpyro' requires numpyro to be installed
    idata = pm.sample(nuts_sampler='numpyro', target_accept=.95,
                      idata_kwargs={"log_likelihood": True})
    idata.extend(pm.sample_posterior_predictive(idata))
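
One way to sanity-check a fitted CFA is to compare the model-implied covariance of the indicators, Lambda Phi Lambda^T + diag(psi^2), against the sample covariance of the data. Below is a minimal NumPy sketch of that formula. The helper name `implied_covariance` and the parameter values are hypothetical; in practice you would plug in posterior means (or draws) of the loadings, the latent covariance, and `Psi`, which the likelihood above treats as a standard deviation.

```python
import numpy as np


def implied_covariance(loadings, latent_cov, resid_sd):
    """Model-implied covariance of the indicators in a linear factor model.

    loadings:   (n_indicators, n_factors) loading matrix
    latent_cov: (n_factors, n_factors) covariance of the latent factors
    resid_sd:   (n_indicators,) residual standard deviations
    """
    loadings = np.asarray(loadings, dtype=float)
    latent_cov = np.asarray(latent_cov, dtype=float)
    resid_sd = np.asarray(resid_sd, dtype=float)
    return loadings @ latent_cov @ loadings.T + np.diag(resid_sd ** 2)


# Hypothetical values, for illustration only
loadings = np.array([
    [1.0, 0.0],  # PI  -> Student
    [0.8, 0.0],  # AD  -> Student
    [0.9, 0.0],  # IGC -> Student
    [0.0, 1.0],  # FI  -> Faculty
    [0.0, 0.7],  # FC  -> Faculty
])
latent_cov = np.array([[1.0, 0.5],
                       [0.5, 1.0]])
resid_sd = np.full(5, 0.5)

sigma = implied_covariance(loadings, latent_cov, resid_sd)
```

The result could then be set side by side with `df_p.cov()`; large gaps between the two would suggest the factor structure is missing something.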

Suggested categories:

  • Level: Intermediate.

Related notebooks

Perhaps this one: https://www.pymc.io/projects/examples/en/latest/case_studies/factor_analysis.html But it seems to present factor analysis more as a machine learning feature reduction technique than as a means of analysis, as in the psychometrics use case.

References

Will likely adapt a blog post (WIP) I'm working on here: https://nathanielf.github.io/posts/post-with-code/CFA_AND_SEM/CFA_AND_SEM.html

The original work references the book Bayesian Psychometric Modeling by Levy and Mislevy.

NathanielF, Aug 19 '24 08:08