StructuralEquationModels.jl

Variables/Parameters/Observations terminology & API Cleanup

Open alyst opened this issue 2 months ago • 5 comments

As part of #193 I have already made some changes, so I wanted to get feedback on them from the maintainers. There are also a few other changes in the same direction that I could integrate into #193, so I am mentioning them here too.

  1. Parameters. They are sometimes called parameters and sometimes identifiers (in the ParTable). I propose to standardize on param (intuitively understandable, but still short):
    • param in the ParTable
    • params() to get the vector of parameters
    • nparams() to get the number of parameters (called n_par() now)
  2. Variables. Sometimes called vars, sometimes colnames, sometimes nodes. Observed variables are sometimes called observed, sometimes manifested. I propose to consolidate into vars (short, but intuitive), which could be observed (more intuitive than manifested) or latent:
    • vars() to get the vector of variables from ParTable, RAMMatrices (matching the order of A columns)
    • nvars() to get the number of variables
    • observed_vars() to get the observed variables, matching the order of rows/cols in obs_cov and the rows of RAMMatrices.F. Alternatively, it could be obs_vars(), which would match obs_cov() and obs_mean() (if observed_vars is chosen, then obs_cov also needs to be renamed to observed_cov for consistency).
    • nobserved_vars() to get the number of observed vars (replaces n_man, which in this short form is a little bit confusing).
    • latent_var_indices()/observed_var_indices() to get the indices of vars() that match the observed/latent variables (i-th index of observed_var_indices() is for the i-th variable of observed_vars())
    • latent_vars() is a shortcut to vars()[latent_var_indices()]
    • Also, in case of missing data, I propose to use measured/missing terms (now it uses observed/missing, but observed clashes with observed/latent), and nmeasured_vars()/nmissing_vars() to get their counts
  3. Observations. Also referred to as rows. To disambiguate from observed_vars, I propose to refer to them as samples (row is confusing because SEM operates with so many matrices).
    • samples to access the individual samples (sometimes referred to as rows or rowwise).
    • nsamples() is the number of samples (n_obs() now)
  4. Relations (between the variables, i.e. <- or <->). Currently the ParTable has them in the param_type column, which is confusing, because sometimes it is constant.
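
To make the proposed names concrete, here is a minimal sketch of how the accessors from points 1 to 3 could fit together. It is only an illustration: ToySpec and its fields are hypothetical stand-ins for ParTable/RAMMatrices, and the function bodies are assumptions, not the package's implementation.

```julia
# Hypothetical toy container; NOT a type from StructuralEquationModels.jl.
struct ToySpec
    params::Vector{Symbol}          # parameter labels
    vars::Vector{Symbol}            # all variables (assumed: column order of A)
    observed_vars::Vector{Symbol}   # observed variables (assumed: row order of F)
end

# 1. Parameters
params(spec::ToySpec) = spec.params
nparams(spec::ToySpec) = length(params(spec))

# 2. Variables
vars(spec::ToySpec) = spec.vars
nvars(spec::ToySpec) = length(vars(spec))
observed_vars(spec::ToySpec) = spec.observed_vars
nobserved_vars(spec::ToySpec) = length(observed_vars(spec))

# The i-th index is the position of the i-th observed variable within vars().
observed_var_indices(spec::ToySpec) =
    [findfirst(==(v), vars(spec)) for v in observed_vars(spec)]

# Positions within vars() of every variable that is not observed.
latent_var_indices(spec::ToySpec) =
    findall(v -> v ∉ observed_vars(spec), vars(spec))

# Shortcut, as proposed: vars()[latent_var_indices()].
latent_vars(spec::ToySpec) = vars(spec)[latent_var_indices(spec)]

# 3. Samples: here assumed to be the rows of a plain data matrix.
samples(data::AbstractMatrix) = eachrow(data)
nsamples(data::AbstractMatrix) = size(data, 1)

# Toy example: two observed variables (x1, x2) and one latent variable (f1).
spec = ToySpec([:λ1, :λ2, :θ1], [:x1, :f1, :x2], [:x1, :x2])
data = [1.0 2.0; 3.0 4.0; 5.0 6.0]   # 3 samples × 2 observed variables
```

With this layout, vars(spec)[observed_var_indices(spec)] reproduces observed_vars(spec), which is the index contract described for observed_var_indices() above.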

alyst, Apr 21 '24 03:04