StructuralEquationModels.jl icon indicating copy to clipboard operation
StructuralEquationModels.jl copied to clipboard

Refactor SEM Specification

Open alyst opened this issue 1 year ago • 4 comments

This is a part of #193 that mostly refactors the code for SEM specification types (RAMMatrices, ParameterTable, StenoGraph and EnsembleParameterTable):

  • introduces an abstract SemSpecification type, which is a parent to all the classes that specify SEM
  • make SEM constructors use Julia type dispatch (which constructor gets called depend on the type of the positional arguments; keyword arguments types are not used by the type-based dispatch) -- i.e. for StenoGraph-based constructor the graph kw argument made positional.
  • replaced no-op constructors (e.g. RAMMatrices(a::RAMMatrices) = a) with convert(RAMMatrices, a::RAMMatrices) = a -- that's how Base.convert() is designed to work in Julia.
  • in ParameterTable: switched from partable.variables[:observed_vars] to partable.observed_vars
  • renamed parameter_type to relation in the ParameterTable columns (part of #199)
  • renamed identifier to param in the ParameterTable columns (part of #199)
  • replaced identifier(), get_par_npar_identifier() methods with params() call which returns a vector of SEM parameter IDs (part of #199)
  • renamed n_par() into nparams() (part of #199)
  • added params field to ParameterTable to explicitly define the order of parameters -- simplifies the logic of SEM ensembles and checking/updating parameters
  • added code for checking the correctness of parameter and variables (dimension matching, uniqueness) at the time of SEM construction
  • param_values/lavaan_param_values() -- moved the code for extracting the parameter values from the parameter tables from tests into the SEM API.
  • renamed Base.sort!(::ParameterTable) to sort_vars!() as it was not clear what this function sorts

alyst avatar May 03 '24 15:05 alyst

Codecov Report

Attention: Patch coverage is 75.05721% with 109 lines in your changes are missing coverage. Please review.

Project coverage is 69.64%. Comparing base (c1e7a69) to head (b2012b0). Report is 1 commits behind head on devel.

Files Patch % Lines
src/frontend/fit/summary.jl 0.00% 39 Missing :warning:
src/frontend/specification/ParameterTable.jl 82.50% 21 Missing :warning:
...c/frontend/specification/EnsembleParameterTable.jl 63.33% 11 Missing :warning:
src/frontend/specification/RAMMatrices.jl 89.89% 10 Missing :warning:
src/objective_gradient_hessian.jl 72.72% 9 Missing :warning:
src/types.jl 73.33% 4 Missing :warning:
...c/additional_functions/start_val/start_partable.jl 0.00% 3 Missing :warning:
src/frontend/specification/StenoGraphs.jl 91.66% 3 Missing :warning:
src/additional_functions/helper.jl 81.81% 2 Missing :warning:
src/loss/ML/FIML.jl 77.77% 2 Missing :warning:
... and 4 more
Additional details and impacted files
@@            Coverage Diff             @@
##            devel     #203      +/-   ##
==========================================
- Coverage   69.66%   69.64%   -0.03%     
==========================================
  Files          52       51       -1     
  Lines        2423     2421       -2     
==========================================
- Hits         1688     1686       -2     
  Misses        735      735              

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov[bot] avatar May 04 '24 02:05 codecov[bot]

It looks like SymbolicsUtils 1.6 broke sparsehessian, so I downgraded it to 1.5. Also, allowed StenoGraphs 0.3.

alyst avatar May 11 '24 20:05 alyst

I prefer parameters. But its more like a 60/40 preference.

aaronpeikert avatar May 17 '24 14:05 aaronpeikert

@Maximilian-Stefan-Ernst Thank you for the review, I'll address your comments in a next few days.

Regarding the params vs parameters. I prefer params, but I can change it to parameters if the stakeholders decide so :) Just to summarize my approach:

  • the core idea is to have sort-of "Huffman coding" for the terms in the API: frequently used concepts should use shorter, unambiguous, intuitive terms ("vars", "params", "cov"); less frequently used terms, or the ones that don't have an intuitive short version, could be longer (objective, gradient, hessian); also object type names could be longer because they are less frequently used in the code. I think it makes the code more readable -- code lines are shorter and faster to read; the urge to introduce local variables just for the sake of avoiding using full names (e.g. npar = nparameters(model)) is more subdued :)
  • that is in line with Julia naming conventions (struct, Dict, Ref, Val, eval, undef, Stats, sortperm, cov), there are precedents in the other popular packages, e.g. HTTP.getparam()
  • it has to be consistent with the other terms: if it's decided on parameters, it probably also means switching to variables, covariation etc

alyst avatar May 17 '24 18:05 alyst

Docs are fixed.

Maximilian-Stefan-Ernst avatar May 19 '24 22:05 Maximilian-Stefan-Ernst

W.r.t. the params vs parameters debate - let's settle on params then, I am also more like a 60/40 for parameters, but since you have a clear preference for params we can settle on it by weighted majority vote^^.

Fun fact: I tried to see what Julia Base is using; params matches 19 files, parameters 42 - so Base settled on using both^^.

Maximilian-Stefan-Ernst avatar May 25 '24 11:05 Maximilian-Stefan-Ernst

@Maximilian-Stefan-Ernst Thanks for your review and fixes! -- I've squashed them into original commits to make the history cleaner. Special thanks for settling on params -- I guess I was 80/20 for it :) But if your preference for parameters() would grow stronger, you can always change. Let me know if anything else is left for merging this PR.

Note regarding the doc fixes and other API tweaks (e.g. params_indices()) -- as there are other potential changes from #193 (and I have other ones recently) you may want to incorporate, I think it would be better to update the documentation after all of that potential updates are reviewed. I am a bit busy with my work-related projects, but I'll try to branch the next round of updates from #193 once this PR is merged. The next things should be more material :) (ParamsArray for RAM matrix simplification, evaluate!() interface for unified objective/gradient/hessian evaluation)

alyst avatar May 27 '24 06:05 alyst

Nice, thanks again for contributing all those changes :) Before merging, there are only 2 open comments/suggestions remaining. I think we should then merge this PR and the following ones into devel - after all the changes are merged and we want to make a new release, we can then merge devel into main.

Maximilian-Stefan-Ernst avatar May 27 '24 07:05 Maximilian-Stefan-Ernst

Before merging, there are only 2 open comments/suggestions remaining.

Oh, right, I overlooked those. I've incorporated the iteration cleanup; as for pretty printing tweak -- see my comment.

alyst avatar May 27 '24 08:05 alyst

I've rebased the branch onto devel, the conflicts are resolved.

alyst avatar May 27 '24 18:05 alyst

@Maximilian-Stefan-Ernst I've revised your param_indices() addition. param_indices(semobj) is a useful one-liner method, which I actually wanted to add myself. However, I removed param_indices(semobj, params) before it becomes available:

  • it is potentially confusing that the return type of methods with the same name is different
  • internally, param_indices(semobj, params) constructs the parameters dictionary at each call, which may get misused in certain contexts for large models (e.g. getting parameter indices in a loop). I think it is just a tiny bit inconvenient if instead the user has to call param_indices(semobj) and then use it as a dictionary, but it is a bit less likely to be misused.

Eventually, SEM specification objects could cache param_indices map, and the removed method could be readded. OTOH I find using param_indices(sem) Dict directly is quite straightforward.

alyst avatar May 27 '24 22:05 alyst

@Maximilian-Stefan-Ernst Let me know if there's anything else to tweak for this PR. Otherwise I am ready to rebase #193 once this one gets merged and extract the next round of changes :)

alyst avatar May 30 '24 18:05 alyst

Sorry, just too much other stuff to do :)

Maximilian-Stefan-Ernst avatar May 31 '24 08:05 Maximilian-Stefan-Ernst