Refactor SEM Specification
This is a part of #193 that mostly refactors the code for SEM specification types (RAMMatrices, ParameterTable, StenoGraph and EnsembleParameterTable):
- introduces an abstract
SemSpecificationtype, which is a parent to all the classes that specify SEM - make SEM constructors use Julia type dispatch (which constructor gets called depend on the type of the positional arguments; keyword arguments types are not used by the type-based dispatch) -- i.e. for StenoGraph-based constructor the
graphkw argument made positional. - replaced no-op constructors (e.g.
RAMMatrices(a::RAMMatrices) = a) withconvert(RAMMatrices, a::RAMMatrices) = a-- that's howBase.convert()is designed to work in Julia. - in
ParameterTable: switched frompartable.variables[:observed_vars]topartable.observed_vars - renamed
parameter_typetorelationin theParameterTablecolumns (part of #199) - renamed
identifiertoparamin theParameterTablecolumns (part of #199) - replaced
identifier(),get_par_npar_identifier()methods withparams()call which returns a vector of SEM parameter IDs (part of #199) - renamed
n_par()intonparams()(part of #199) - added
paramsfield toParameterTableto explicitly define the order of parameters -- simplifies the logic of SEM ensembles and checking/updating parameters - added code for checking the correctness of parameter and variables (dimension matching, uniqueness) at the time of SEM construction
param_values/lavaan_param_values()-- moved the code for extracting the parameter values from the parameter tables from tests into the SEM API.- renamed
Base.sort!(::ParameterTable)tosort_vars!()as it was not clear what this function sorts
Codecov Report
Attention: Patch coverage is 75.05721% with 109 lines in your changes are missing coverage. Please review.
Project coverage is 69.64%. Comparing base (
c1e7a69) to head (b2012b0). Report is 1 commits behind head on devel.
Additional details and impacted files
@@ Coverage Diff @@
## devel #203 +/- ##
==========================================
- Coverage 69.66% 69.64% -0.03%
==========================================
Files 52 51 -1
Lines 2423 2421 -2
==========================================
- Hits 1688 1686 -2
Misses 735 735
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
It looks like SymbolicsUtils 1.6 broke sparsehessian, so I downgraded it to 1.5.
Also, allowed StenoGraphs 0.3.
I prefer parameters. But its more like a 60/40 preference.
@Maximilian-Stefan-Ernst Thank you for the review, I'll address your comments in a next few days.
Regarding the params vs parameters. I prefer params, but I can change it to parameters if the stakeholders decide so :)
Just to summarize my approach:
- the core idea is to have sort-of "Huffman coding" for the terms in the API: frequently used concepts should use shorter, unambiguous, intuitive terms ("vars", "params", "cov");
less frequently used terms, or the ones that don't have an intuitive short version, could be longer (objective, gradient, hessian); also object type names could be longer because they are less frequently used in the code.
I think it makes the code more readable -- code lines are shorter and faster to read; the urge to introduce local variables just for the sake of avoiding using full names (e.g.
npar = nparameters(model)) is more subdued :) - that is in line with Julia naming conventions (struct, Dict, Ref, Val, eval, undef, Stats, sortperm, cov), there are precedents in the other popular packages, e.g. HTTP.getparam()
- it has to be consistent with the other terms: if it's decided on
parameters, it probably also means switching tovariables,covariationetc
Docs are fixed.
W.r.t. the params vs parameters debate - let's settle on params then, I am also more like a 60/40 for parameters, but since you have a clear preference for params we can settle on it by weighted majority vote^^.
Fun fact: I tried to see what Julia Base is using; params matches 19 files, parameters 42 - so Base settled on using both^^.
@Maximilian-Stefan-Ernst Thanks for your review and fixes! -- I've squashed them into original commits to make the history cleaner.
Special thanks for settling on params -- I guess I was 80/20 for it :) But if your preference for parameters() would grow stronger, you can always change.
Let me know if anything else is left for merging this PR.
Note regarding the doc fixes and other API tweaks (e.g. params_indices()) -- as there are other potential changes from #193 (and I have other ones recently) you may want to incorporate, I think it would be better to update the documentation after all of that potential updates are reviewed. I am a bit busy with my work-related projects, but I'll try to branch the next round of updates from #193 once this PR is merged.
The next things should be more material :) (ParamsArray for RAM matrix simplification, evaluate!() interface for unified objective/gradient/hessian evaluation)
Nice, thanks again for contributing all those changes :) Before merging, there are only 2 open comments/suggestions remaining. I think we should then merge this PR and the following ones into devel - after all the changes are merged and we want to make a new release, we can then merge devel into main.
Before merging, there are only 2 open comments/suggestions remaining.
Oh, right, I overlooked those. I've incorporated the iteration cleanup; as for pretty printing tweak -- see my comment.
I've rebased the branch onto devel, the conflicts are resolved.
@Maximilian-Stefan-Ernst I've revised your param_indices() addition.
param_indices(semobj) is a useful one-liner method, which I actually wanted to add myself.
However, I removed param_indices(semobj, params) before it becomes available:
- it is potentially confusing that the return type of methods with the same name is different
- internally,
param_indices(semobj, params)constructs the parameters dictionary at each call, which may get misused in certain contexts for large models (e.g. getting parameter indices in a loop). I think it is just a tiny bit inconvenient if instead the user has to callparam_indices(semobj)and then use it as a dictionary, but it is a bit less likely to be misused.
Eventually, SEM specification objects could cache param_indices map, and the removed method could be readded. OTOH I find using param_indices(sem) Dict directly is quite straightforward.
@Maximilian-Stefan-Ernst Let me know if there's anything else to tweak for this PR. Otherwise I am ready to rebase #193 once this one gets merged and extract the next round of changes :)
Sorry, just too much other stuff to do :)