MultivariateStats.jl RFC: Unified API

Open wildart opened this issue 4 years ago • 7 comments

Following #95, I looked at the MV models/methods implemented in this package, trying to work out a suitable type hierarchy and the corresponding method interfaces.

Here is a table of the models and the function names each of them uses.

Function \ Model	CCA	WHT	ICA	LDA	FA	PPCA	PCA	KPCA	MDS
fit	x	x	x	x	x	x	x	x	x
transform	x	x	x	x	x	x	x	x	x
predict				x
indim		x	x	x	x	x	x	x	x
outdim	x	x	x	x	x	x	x	x	x
mean	x	x	x	x	x	x	x	?
var					x	x	?	?	?
cov					x	?
cor	x
projection	x				x	x	x	x	x
reconstruct					x	x	x	x
loadings	?			?	x	x	?	?	?
eigvals					?	?	?	?	x
eigvecs					?	?	?	?	?
length
size

I put ? where an implementation is possible but is either missing or named differently.

So, I propose the following type hierarchy:

StatsBase.RegressionModel
- Methods: CCA, LDA
- Functions: fit, transform, indim, outdim, mean
- Subtypes:
  - AbstractDimensionalityReduction
    - Functions: projection, var, reconstruct, loadings
    - Subtypes:
      - LinearDimensionalityReduction
        - Methods: ICA, PCA
      - NonlinearDimensionalityReduction
        - Methods: KPCA, MDS
      - LatentVariableModel or LatentVariableDimensionalityReduction
        - Methods: FA, PPCA
        - Functions: cov
StatsBase.AbstractDataTransform
- Whitening
- Functions: fit, transform, indim, outdim, mean, size
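
To make the hierarchy above concrete, here is a rough sketch of how it might look in code. It is only an illustration, not the package's implementation: `RegressionModel` is a local stand-in for `StatsBase.RegressionModel` so the snippet runs without dependencies, and `ToyPCA` with its fields is a hypothetical model invented for the example.

```julia
# Local stand-in for StatsBase.RegressionModel (assumption: keeps the sketch dependency-free).
abstract type RegressionModel end

# Abstract types named as in the proposed hierarchy.
abstract type AbstractDimensionalityReduction <: RegressionModel end
abstract type LinearDimensionalityReduction <: AbstractDimensionalityReduction end
abstract type NonlinearDimensionalityReduction <: AbstractDimensionalityReduction end
abstract type LatentVariableDimensionalityReduction <: AbstractDimensionalityReduction end

# Hypothetical PCA-like model: a mean vector and a projection matrix
# (indim × outdim, assumed to have orthonormal columns).
struct ToyPCA <: LinearDimensionalityReduction
    mean::Vector{Float64}
    proj::Matrix{Float64}
end

# A few functions from the proposed interface, implemented for the toy model.
indim(M::ToyPCA) = size(M.proj, 1)
outdim(M::ToyPCA) = size(M.proj, 2)
projection(M::ToyPCA) = M.proj

# Center the observations (stored as columns), then project them.
transform(M::ToyPCA, x::AbstractVecOrMat) = projection(M)' * (x .- M.mean)

# Map reduced coordinates back to the original space.
reconstruct(M::ToyPCA, y::AbstractVecOrMat) = projection(M) * y .+ M.mean
```

With this layout, code that only needs `projection` or `reconstruct` could dispatch on `AbstractDimensionalityReduction` instead of listing concrete model types.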

@nalimilan @ararslan Thoughts?

Oct 10 '19 19:10 wildart

That makes sense to me. Might be nice to have an abstract dimensionality reduction type in there that linear, nonlinear, and latent variable types can subtype.

Oct 10 '19 19:10 ararslan

> Might be nice to have an abstract dimensionality reduction type in there that linear, nonlinear, and latent variable types can subtype.

That would be AbstractDimensionalityReduction

Oct 10 '19 20:10 wildart

Whoops, don't know how I missed that...

Oct 10 '19 21:10 ararslan

This seems great to me.

As my primary interest in this is for plotting, one thing I'd like to know is whether there's a common method for obtaining a vector that would be used in a plot. I'm not super knowledgeable about the terminology, but I think different things are commonly plotted for different dimensionality reductions. For MDS and PCA (I think), one is supposed to plot the eigenvectors scaled by the square root of the eigenvalue.

But finding information on this has been a bit challenging for me, not knowing all of the jargon.

Oct 19 '19 15:10 kescobo

Loadings are scaled eigenvectors. It will be easy to add them to every eigendecomposition-based method.
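
As a minimal sketch of what that could look like, assuming the common convention that each eigenvector is scaled by the square root of its eigenvalue: `loadings_from_cov` is a hypothetical helper written for this comment, not an existing function in the package.

```julia
using LinearAlgebra

# Hypothetical helper: loadings for the k leading components of a
# symmetric covariance matrix C.
function loadings_from_cov(C::AbstractMatrix; k::Int = size(C, 1))
    E = eigen(Symmetric(C))                      # eigenvalues come back in ascending order
    idx = sortperm(E.values; rev = true)[1:k]    # indices of the k largest eigenvalues
    λ = E.values[idx]
    V = E.vectors[:, idx]
    # Scale each eigenvector (column) by the square root of its eigenvalue;
    # tiny negative eigenvalues caused by round-off are clamped to zero.
    return V .* sqrt.(max.(λ, 0))'
end
```

When all components are kept, the loadings `L` satisfy `L * L' ≈ C`, which is a quick sanity check for any eigendecomposition-based method.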

Oct 20 '19 06:10 wildart

Sounds like a good idea. Is the LinearDimensionalityReduction vs. NonlinearDimensionalityReduction distinction useful? I guess it doesn't hurt, but in your plan it doesn't really make a difference AFAICT.

Also, shouldn't PCA implement loadings?

Oct 21 '19 12:10 nalimilan

Fantastic. What about things like LDA and CCA? I've definitely seen those plotted, but your schema above doesn't have loadings for those, cf.

I know this is somewhat orthogonal; I can open a separate issue if that would be useful. In any case, having unified APIs for this stuff will be fantastic.

Oct 21 '19 16:10 kescobo
