Improve computation of final betas
Possibly change how the optionally returned final `betas` are computed, to make them more useful for downstream analyses. For that, change `frrsa/frrsa/fitting/fitting/final_model` so that it runs a repeated CV to find the best hyperparameter for the whole dataset. It is unclear how to deal with several hyperparameters, i.e., whether averaging them is actually sensible. The best course of action might also depend on the kind of hyperparameter, i.e., whether one uses fractional ridge regression or the more classical approach as sklearn does (this depends on how one sets the parameter `nonnegative`).
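A minimal sketch of what such a repeated CV could look like, assuming the classical (sklearn-style) ridge path rather than fractional ridge regression; the function and variable names (`best_alpha_per_target`, `candidate_alphas`) are hypothetical and not part of frrsa:

```python
# Sketch: pick the best ridge penalty for the whole dataset via repeated
# K-fold CV, separately for each target column of Y.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import RepeatedKFold
from sklearn.metrics import r2_score

def best_alpha_per_target(X, Y, candidate_alphas, n_splits=5, n_repeats=10, seed=0):
    """Return one best alpha per column of Y, chosen by repeated K-fold CV."""
    rkf = RepeatedKFold(n_splits=n_splits, n_repeats=n_repeats, random_state=seed)
    # scores[i, j] = mean CV score of candidate_alphas[i] on target j
    scores = np.zeros((len(candidate_alphas), Y.shape[1]))
    for i, alpha in enumerate(candidate_alphas):
        fold_scores = []
        for train, test in rkf.split(X):
            model = Ridge(alpha=alpha).fit(X[train], Y[train])
            pred = model.predict(X[test])
            # score each target separately
            fold_scores.append([r2_score(Y[test][:, j], pred[:, j])
                                for j in range(Y.shape[1])])
        scores[i] = np.mean(fold_scores, axis=0)
    return np.array(candidate_alphas)[scores.argmax(axis=0)]

# Toy usage with random data (shapes only for illustration).
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 20))
Y = rng.standard_normal((100, 3))
print(best_alpha_per_target(X, Y, candidate_alphas=[0.1, 1.0, 10.0]))
```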
PRs for this issue are welcome.
For context: currently, the optionally returned `betas` are computed using the whole dataset. However, in order to do so, one needs the best hyperparameter for the whole dataset (for each target separately, if there is more than one). This best hyperparameter is currently only calculated ad hoc by averaging the hyperparameters from all outer cross-validations. Note that the hyperparameter from an outer cross-validation is based only on a sub-subset of the whole data (as it is estimated in the inner cross-validation), which does not guarantee that it is the actual best hyperparameter for the whole dataset.
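For illustration only, a sketch of that ad-hoc logic under the same assumptions as above (plain sklearn ridge, hypothetical names; not frrsa's actual `final_model` code):

```python
# Sketch of the current ad-hoc approach: average the per-fold hyperparameters
# from the outer cross-validations, then refit once on the whole dataset.
import numpy as np
from sklearn.linear_model import Ridge

def final_betas_adhoc(X, Y, outer_fold_alphas):
    """outer_fold_alphas: array of shape (n_outer_folds, n_targets)."""
    mean_alphas = np.mean(outer_fold_alphas, axis=0)  # one averaged alpha per target
    betas = np.empty((X.shape[1], Y.shape[1]))
    for j, alpha in enumerate(mean_alphas):
        # each target gets its own averaged hyperparameter
        betas[:, j] = Ridge(alpha=alpha).fit(X, Y[:, j]).coef_
    return betas
```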
~If the aim is to generalize `betas` to a second `target`, there might be a (more feasible) alternative:~

~The user submits two (or several) `target`s. One of them will be used to fit the statistical model. The others will simply be correlated with the reweighted predicting matrix, essentially generalizing the model. This should likely be done in the function `fit_and_score`.~
Moved to its own issue: #46.