MultivariateStats.jl

Provide StatsAPI interface for regression

Open • wildart opened this issue on Jan 17, 2022 • 6 comments

Currently, the regression algorithms are implemented as stand-alone functions, while the other methods use the StatsAPI interface, i.e. fit/predict.

We should have types properly derived from StatsAPI.RegressionModel, with the corresponding interface implemented, for the various regression algorithms.
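As a rough illustration of what deriving from StatsAPI.RegressionModel buys, here is a minimal sketch. The type `LinearRegressionSketch` and the helper `describe_model` are hypothetical names, not part of MultivariateStats.jl. Because `RegressionModel` is a subtype of `StatisticalModel` in StatsAPI, generic code written against either abstract type will accept the new model.

```julia
using StatsAPI

# Hypothetical model type: subtyping StatsAPI.RegressionModel (itself a
# subtype of StatsAPI.StatisticalModel) places it in the StatsAPI hierarchy.
struct LinearRegressionSketch <: StatsAPI.RegressionModel
    coefficients::Vector{Float64}   # slope coefficients
    intercept::Float64              # bias term
end

# Hypothetical generic helper that accepts any StatsAPI regression model.
describe_model(m::StatsAPI.RegressionModel) =
    "regression model of type $(nameof(typeof(m)))"

m = LinearRegressionSketch([1.0, 2.0], 0.5)
println(describe_model(m))                 # regression model of type LinearRegressionSketch
println(m isa StatsAPI.StatisticalModel)   # true
```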

wildart • Jan 17 '22 20:01

Responding to the question in #109: @wildart, I'd be happy to take a stab at it if there's a well-defined API or clear instructions for the implementation. I'm afraid I'm not that familiar with most of the methods in this package or with StatsAPI, but if there's a regular structure, I can probably figure it out.

kescobo • Jan 18 '22 16:01

Basically, every algorithm in this package has a fit method for building a model (see StatisticalModel) and a predict method for predicting the response of a model (see RegressionModel). These two methods are the bare minimum required for a regression implementation; the rest of the interface can be addressed later.

So, a type derived from RegressionModel needs to be defined to hold the model parameters, i.e. the regression coefficients. The fit method would call the existing implementation (e.g. ridge) and construct an object of the model type. The predict method should compute predictions from the model parameters. You can look at the implementations of other algorithms for guidance, e.g. PCA.
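The pattern described above (a model type holding the coefficients, fit wrapping an existing solver, predict applying the coefficients) might look roughly like the sketch below. Everything here is an assumption for illustration: the type name `RidgeSketch`, its fields, the `r` keyword, and the one-observation-per-row layout. In particular, the closed-form solve on centered data (which leaves the intercept unpenalized) only stands in for the package's own `ridge` function; its exact signature and penalty handling are not reproduced here.

```julia
using LinearAlgebra
using Statistics
using StatsAPI

# Hypothetical ridge model type holding the fitted parameters.
struct RidgeSketch <: StatsAPI.RegressionModel
    β::Vector{Float64}   # slope coefficients
    b::Float64           # intercept (left unpenalized)
    r::Float64           # regularization strength used in the fit
end

# fit: stand-in for calling the package's `ridge`. Assumes rows of X are
# observations. Centering X and y lets the intercept be recovered afterwards
# without being shrunk by the penalty.
function StatsAPI.fit(::Type{RidgeSketch}, X::AbstractMatrix{<:Real},
                      y::AbstractVector{<:Real}; r::Real = 1.0)
    size(X, 1) == length(y) ||
        throw(DimensionMismatch("X and y disagree on the number of observations"))
    μx = vec(mean(X; dims = 1))           # column means of X
    μy = mean(y)
    Xc = X .- μx'                         # centered predictors
    yc = y .- μy                          # centered response
    β = (Xc' * Xc + r * I) \ (Xc' * yc)   # solve (Xc'Xc + rI) β = Xc'yc
    b = μy - dot(μx, β)                   # recover the intercept
    return RidgeSketch(β, b, r)
end

# predict: apply the stored coefficients to new observations (rows of X).
StatsAPI.predict(m::RidgeSketch, X::AbstractMatrix{<:Real}) = X * m.β .+ m.b

# Usage
X = [1.0 2.0; 2.0 1.0; 3.0 4.0; 4.0 3.0]
y = X * [1.0, -1.0] .+ 2.0
m0 = StatsAPI.fit(RidgeSketch, X, y; r = 0.0)    # r = 0 reduces to ordinary least squares
m1 = StatsAPI.fit(RidgeSketch, X, y; r = 10.0)   # a larger r shrinks the slopes
```

Keeping `r` in the model object is optional; it simply records how the model was fitted.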

wildart • Jan 18 '22 19:01

That makes sense. As I said, I'm happy to take a stab, though realistically it's unlikely to be in the next week or two: I'm teaching this semester and need to get a lot more prep done. If there's no rush, I can definitely tackle it by mid-February or so.

kescobo • Jan 18 '22 21:01

Any help is appreciated at any time.

wildart • Jan 19 '22 13:01

Looking at this a bit more closely today, I don't think I'm the right person for this job, sorry! I feel like if I had a strong handle on either the package interface OR the statistical methods, I could use one to reason about the other. But being a novice at both, even with your hints above, I'm not sure how to get started :-(

kescobo • Feb 07 '22 21:02

For a minimal implementation, you would need to:

  1. Define a type for a regression model, derived from StatsAPI.RegressionModel (e.g. OLS), that will hold the coefficients of the OLS regression model.
  2. Define a fit function that accepts three arguments: the OLS type, the independent variable x, and the dependent variable y. This function will call llsq, which calculates the regression model parameters, and return an OLS object.
  3. Define a predict function; see its description here: https://github.com/JuliaStats/StatsAPI.jl/blob/00ce15f034e7ffdf16ec988766246755fcab47c4/src/regressionmodel.jl#L74-L81
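A minimal sketch of the three steps follows. The name `OLS` follows the suggestion in step 1 but is hypothetical here. The sketch assumes rows of `X` are observations and stores the intercept as the last coefficient; both are choices made for this example, not the package's convention. To keep it self-contained, Julia's least-squares solve `A \ y` stands in for the `llsq` call from step 2, so nothing here should be read as a description of `llsq`'s keywords or output layout.

```julia
using StatsAPI

# Step 1: hypothetical OLS model type holding the fitted coefficients.
struct OLS <: StatsAPI.RegressionModel
    coefs::Vector{Float64}   # slopes followed by the intercept (last entry)
end

# Step 2: fit(OLS, X, y). The package version would call `llsq`; here the
# least-squares solve `\` stands in for it. Rows of X are observations.
function StatsAPI.fit(::Type{OLS}, X::AbstractMatrix{<:Real}, y::AbstractVector{<:Real})
    size(X, 1) == length(y) ||
        throw(DimensionMismatch("X has $(size(X, 1)) rows but y has $(length(y)) entries"))
    A = hcat(X, ones(size(X, 1)))   # append a column of ones for the intercept
    return OLS(A \ y)
end

# Step 3: predict(model, newX), following the StatsAPI docstring linked above.
function StatsAPI.predict(m::OLS, X::AbstractMatrix{<:Real})
    slopes, intercept = m.coefs[1:end-1], m.coefs[end]
    return X * slopes .+ intercept
end

# Usage: one predictor, four observations.
x = [1.0, 2.0, 3.0, 4.0]
y = 3.0 .* x .+ 1.0
model = StatsAPI.fit(OLS, reshape(x, :, 1), y)
StatsAPI.predict(model, reshape([5.0, 6.0], :, 1))   # ≈ [16.0, 19.0]
```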

The data arguments should be of type AbstractMatrix or AbstractVector. You may want to include a generic placeholder for keyword arguments to pass parameters through to the llsq call.
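One way to follow both suggestions is sketched below: separate fit methods for AbstractMatrix and AbstractVector inputs, plus a `kwargs...` placeholder forwarded unchanged to the solver. Since `llsq`'s exact keywords are not spelled out in this thread, the solver here is a hypothetical stand-in, `lsq_standin`, that models only a `bias` keyword; the type name `LinearLSQ` is likewise hypothetical. Any keyword the solver does not accept raises an error at the call site, which is usually what a pass-through should do.

```julia
using StatsAPI

# Hypothetical stand-in for `llsq`: only a `bias` keyword is modeled here.
# Rows of X are observations; with bias=true the intercept is the last coefficient.
function lsq_standin(X::AbstractMatrix{<:Real}, y::AbstractVector{<:Real}; bias::Bool = true)
    A = bias ? hcat(X, ones(size(X, 1))) : float(X)
    return A \ y
end

# Hypothetical model type; remembers whether an intercept was fitted.
struct LinearLSQ <: StatsAPI.RegressionModel
    coefs::Vector{Float64}
    bias::Bool
end

# Matrix input: `bias` is read here, every other keyword is forwarded as-is.
function StatsAPI.fit(::Type{LinearLSQ}, X::AbstractMatrix{<:Real},
                      y::AbstractVector{<:Real}; bias::Bool = true, kwargs...)
    return LinearLSQ(lsq_standin(X, y; bias = bias, kwargs...), bias)
end

# Vector input: treat it as a single predictor (one column).
StatsAPI.fit(::Type{LinearLSQ}, x::AbstractVector{<:Real}, y::AbstractVector{<:Real}; kwargs...) =
    StatsAPI.fit(LinearLSQ, reshape(x, :, 1), y; kwargs...)

function StatsAPI.predict(m::LinearLSQ, X::AbstractMatrix{<:Real})
    m.bias || return X * m.coefs
    return X * m.coefs[1:end-1] .+ m.coefs[end]
end

StatsAPI.predict(m::LinearLSQ, x::AbstractVector{<:Real}) =
    StatsAPI.predict(m, reshape(x, :, 1))

# Usage
x = [1.0, 2.0, 3.0, 4.0]
m1 = StatsAPI.fit(LinearLSQ, x, 2.0 .* x .+ 5.0)          # intercept fitted
m2 = StatsAPI.fit(LinearLSQ, x, 2.0 .* x; bias = false)   # no intercept
```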

The rest of the RegressionModel methods are optional at this point; feel free to implement any of them. For examples, see any of the method implementations in this repo: MDS, PCA, etc.
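As a rough example of how a few of the optional methods could be layered on, the sketch below adds `coef`, `nobs`, `response`, `fitted`, `residuals`, and `dof_residual`, all of which are StatsAPI generic functions. The type `OLSFull` and the decision to keep the training response and fitted values inside the model object are assumptions made for this example, and the least-squares solve `\` stands in for `llsq`. `dof` is deliberately left out: StatsAPI documents it as including the dispersion parameter when applicable, which needs a separate decision for this model.

```julia
using StatsAPI

# Hypothetical OLS type that also keeps what the optional methods need.
struct OLSFull <: StatsAPI.RegressionModel
    coefs::Vector{Float64}   # slopes followed by the intercept (last entry)
    y::Vector{Float64}       # training response
    ŷ::Vector{Float64}       # fitted values on the training data
end

function StatsAPI.fit(::Type{OLSFull}, X::AbstractMatrix{<:Real}, y::AbstractVector{<:Real})
    A = hcat(X, ones(size(X, 1)))   # rows of X are observations (assumption)
    coefs = A \ y                   # least-squares stand-in for `llsq`
    return OLSFull(coefs, y, A * coefs)
end

StatsAPI.predict(m::OLSFull, X::AbstractMatrix{<:Real}) =
    X * m.coefs[1:end-1] .+ m.coefs[end]

# A few of the optional StatsAPI methods.
StatsAPI.coef(m::OLSFull) = m.coefs
StatsAPI.nobs(m::OLSFull) = length(m.y)
StatsAPI.response(m::OLSFull) = m.y
StatsAPI.fitted(m::OLSFull) = m.ŷ
StatsAPI.residuals(m::OLSFull) = StatsAPI.response(m) .- StatsAPI.fitted(m)
# Observations minus estimated coefficients (slopes plus intercept).
StatsAPI.dof_residual(m::OLSFull) = StatsAPI.nobs(m) - length(m.coefs)

# Usage
X = reshape([1.0, 2.0, 3.0, 4.0], :, 1)
y = [1.0, 3.0, 2.0, 5.0]
m = StatsAPI.fit(OLSFull, X, y)
StatsAPI.coef(m)           # ≈ [1.1, 0.0]
StatsAPI.residuals(m)      # ≈ [-0.1, 0.8, -1.3, 0.6]
StatsAPI.dof_residual(m)   # 2
```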

wildart • Feb 09 '22 16:02