Check for unused keyword arguments
That is probably a topic for a separate issue, but I think the main constructors for SemSpecification, SemObserved and Sem, which do all of the actual construction work, should take the essential inputs as positional arguments (that is also important for type-dependent dispatch) and should not pass keyword arguments through.
Then it might be possible to have a single simple constructor-like function for Sem() that internally creates all 3 objects as it does now, but with a more constrained set of keywords.
Also, explicitly creating SemSpecification and SemObserved and passing them to Sem might, in the end, not be much more complex for the user.
The real design problem with that approach is that SemSpec and SemObserved are independent objects, but they are constrained to have the same set and same order of observed variables.
Maybe it is possible to use ScopedValues from Julia 1.11, so that the user explicitly calls Sem(SemSpec(...; kwargs1...), SemObserved(...; kwargs2...); kwargs3...), but the order of the observed variables is implicitly passed to SemSpec() and SemObserved() via ScopedValues.
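To make the ScopedValues idea a bit more concrete, here is a minimal sketch. It assumes Julia 1.11 or later; `OBSERVED_VARS`, `ScopedObserved`, `ScopedSpec` and `scoped_vars` are hypothetical names made up for illustration, not anything from the package.

```julia
using Base.ScopedValues  # available from Julia 1.11

# Hypothetical scoped value holding the shared order of observed variables.
const OBSERVED_VARS = ScopedValue{Union{Nothing, Vector{Symbol}}}(nothing)

# Toy stand-ins for SemObserved and SemSpecification (not the package types).
struct ScopedObserved
    vars::Vector{Symbol}
end

struct ScopedSpec
    vars::Vector{Symbol}
end

# Read the variable order from the enclosing dynamic scope.
function scoped_vars()
    vars = OBSERVED_VARS[]
    vars === nothing && error("observed variables are not set in this scope")
    return vars
end

# Zero-argument constructors pick up the variable order implicitly.
ScopedObserved() = ScopedObserved(scoped_vars())
ScopedSpec() = ScopedSpec(scoped_vars())

# Both constructors run inside the same scope, so they see the same order.
obs, spec = with(OBSERVED_VARS => [:x1, :x2, :x3]) do
    ScopedObserved(), ScopedSpec()
end
```

One caveat: Julia evaluates the arguments of Sem(SemSpec(...), SemObserved(...)) before the body of Sem runs, so the scope has to be established around those calls (as the `with` block does here), not inside Sem itself.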
Or maybe something like
@Sem(data, SemObserved(...), SemSpecification(...), ...)
where the macro generates the code that first calls SemObserved(data, ....), then extracts variable order out of it and passes it to SemSpec(vars, ...), then calls Sem(obs, spec, ....) ctor.
Originally posted by @alyst in https://github.com/StructuralEquationModels/StructuralEquationModels.jl/pull/228#discussion_r1894408056
Another way to solve this is to allow implied, observed, etc. to be anonymous functions, e.g.:
Sem(dataframe, @StanGraph(...), observed = SemObservedMissing, specification = (def, obs_vars) -> RAMMatrices(def, observed_vars = obs_vars, meanstructure = false), implied = spec -> RAMSymbolic(spec, sparse_S = true), loss = SemML)
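A rough sketch of how such a constructor could thread the variable order through the callables. All of the types and the `demo_sem` function are toy stand-ins invented for this example; only the order of the calls (observed first, then specification, then implied) is the point.

```julia
# Toy stand-ins; none of these are the package's actual types.
struct DemoObserved
    vars::Vector{Symbol}
    data::Matrix{Float64}
end

struct DemoSpec
    vars::Vector{Symbol}
    meanstructure::Bool
end

struct DemoImplied
    spec::DemoSpec
    sparse_S::Bool
end

struct DemoSem
    observed::DemoObserved
    implied::DemoImplied
end

# Hypothetical one-line constructor: each component is built by a callable,
# and the variable order flows from `observed` into `specification`.
function demo_sem(data::Matrix{Float64}, vars::Vector{Symbol};
                  observed = (d, v) -> DemoObserved(v, d),
                  specification = obs_vars -> DemoSpec(obs_vars, true),
                  implied = spec -> DemoImplied(spec, false))
    obs = observed(data, vars)
    spec = specification(obs.vars)  # order is taken from the observed object
    return DemoSem(obs, implied(spec))
end

# Override only the parts that differ from the defaults.
model = demo_sem(rand(10, 2), [:x1, :x2];
                 specification = v -> DemoSpec(v, false),
                 implied = s -> DemoImplied(s, true))
```

Because the keywords are declared explicitly and there is no `kwargs...` passthrough, a misspelled keyword fails with a MethodError instead of being silently ignored.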
Also, Sem currently does not support regularization; it could be supported with something like
Sem(..., S_regularization = SemRidge => 0.1, A_regularization = SemLasso => 1.0, M_regularization = ...)
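For illustration, such `Type => weight` keywords could be parsed roughly as follows. This is only a sketch: `DemoRidge`, `DemoLasso` and `parse_regularization` are hypothetical names, and it assumes Julia 1.8 or later for `chopsuffix`.

```julia
# Hypothetical penalty types standing in for SemRidge / SemLasso.
struct DemoRidge
    weight::Float64
end

struct DemoLasso
    weight::Float64
end

# Turn keywords such as `S_regularization = DemoRidge => 0.1` into penalty
# objects keyed by matrix name (:S, :A, ...). Unknown keywords raise an error
# instead of being silently dropped.
function parse_regularization(; kwargs...)
    penalties = Dict{Symbol, Any}()
    for (key, value) in kwargs
        name = String(key)
        endswith(name, "_regularization") ||
            throw(ArgumentError("unsupported keyword: $key"))
        value isa Pair ||
            throw(ArgumentError("$key must be given as PenaltyType => weight"))
        penalty_type, weight = value
        matrix = Symbol(chopsuffix(name, "_regularization"))
        penalties[matrix] = penalty_type(weight)
    end
    return penalties
end

penalties = parse_regularization(S_regularization = DemoRidge => 0.1,
                                 A_regularization = DemoLasso => 1.0)
```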
But this one-line way of creating SEM models should probably be limited to simple single-group/non-ensemble problems; otherwise both the package code and the user-side constructs can quickly become too complicated.
For more complex cases the user will have to explicitly create SemSpecification(), SemObserved(), SemImply(), and pass them to Sem(...).
Also, while the first example is a one-line Sem() call, the code is longer than just calling
Sem(SemML(SemObservedMissing(dataframe), RAMSymbolic(RAMMatrices(@StanGraph(...), meanstructure = false), sparse_S = true)))
so the only cases that are really simplified are the ones where the user relies on the default values (Sem(dataframe, @StanGraph(...))).