
Add support for weighted PCA and ICA/tICA?

Open SHZ66 opened this issue 2 years ago • 3 comments

I think it is worth considering adding support for weighted PCA and ICA/tICA in ProDy.

The former should be fairly easy, since it already exists to an extent in the current PCA class: https://github.com/prody/ProDy/blob/master/prody/dynamics/pca.py#L180

but only when the input data is an Ensemble with weights. A similar treatment should be added to https://github.com/prody/ProDy/blob/master/prody/dynamics/pca.py#L166

so that one can pass a weight vector (one weight per sample) or a weight matrix (one weight per sample and atom) as a parameter to PCA.buildCovariance.
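A minimal NumPy sketch of what such a weighted covariance could look like. The helper name `weighted_covariance` is hypothetical and not part of ProDy, and the normalisation used for the weight-matrix case (products of weights, divided by their sum) is only one possible convention, assumed here for illustration.

```python
import numpy as np

def weighted_covariance(X, weights):
    """Hypothetical helper (not ProDy API): weighted covariance of X.

    X       : (n_samples, n_dof) flattened coordinates
    weights : (n_samples,) per-sample weights, or
              (n_samples, n_dof) per-sample, per-coordinate weights
    """
    X = np.asarray(X, dtype=float)
    W = np.asarray(weights, dtype=float)

    if W.ndim == 1:
        # Per-sample weights: standard weighted mean and covariance.
        mean = np.average(X, axis=0, weights=W)
        D = X - mean
        return (D * W[:, None]).T @ D / W.sum()

    if W.shape != X.shape:
        raise ValueError("weight matrix must match the shape of X")

    # Per-coordinate weights (assumed convention): each pair (j, k) is
    # weighted by W[i, j] * W[i, k] and normalised by the summed products.
    # Entries of W.T @ W that are zero would need special handling.
    mean = (W * X).sum(axis=0) / W.sum(axis=0)
    D = W * (X - mean)
    return (D.T @ D) / (W.T @ W)
```

Per-atom weights of shape (n_samples, n_atoms) could be expanded with `np.repeat(w, 3, axis=1)` before the call, assuming the coordinates are flattened as x1, y1, z1, x2, and so on. With all weights equal to one, both branches reduce to the ordinary (biased) covariance.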

ICA is trickier to implement, but the covariance matrix is the same; only the decomposition step differs. A good reference to follow is probably scikit-learn's FastICA: https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.FastICA.html
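To illustrate the point that the covariance step is shared and only the decomposition changes, here is a rough NumPy-only sketch of a symmetric FastICA with a tanh nonlinearity. It is a simplified illustration, not scikit-learn's implementation and not ProDy code; `fastica_sketch` and its parameters are hypothetical names, and it assumes the leading `n_components` covariance eigenvalues are strictly positive.

```python
import numpy as np

def _sym_decorrelate(W):
    # W <- (W W^T)^(-1/2) W, which keeps the rows of W orthonormal.
    s, u = np.linalg.eigh(W @ W.T)
    return (u / np.sqrt(s)) @ u.T @ W

def fastica_sketch(X, n_components, max_iter=200, tol=1e-6, seed=0):
    """Hypothetical sketch: returns (sources, unmixing matrix)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    n_samples = X.shape[0]

    # Shared with PCA: centre the data and build the covariance matrix.
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / n_samples
    evals, evecs = np.linalg.eigh(cov)
    top = np.argsort(evals)[::-1][:n_components]

    # Whitening: project onto the top modes and scale to unit variance.
    K = evecs[:, top] / np.sqrt(evals[top])
    Z = Xc @ K

    # ICA-specific part: fixed-point iterations for an orthogonal rotation.
    W = _sym_decorrelate(rng.standard_normal((n_components, n_components)))
    for _ in range(max_iter):
        S = Z @ W.T
        G = np.tanh(S)
        W_new = G.T @ Z / n_samples - (1.0 - G**2).mean(axis=0)[:, None] * W
        W_new = _sym_decorrelate(W_new)
        change = np.max(np.abs(np.abs(np.sum(W_new * W, axis=1)) - 1.0))
        W = W_new
        if change < tol:
            break

    return Z @ W.T, W @ K.T
```

Everything up to the whitening matrix `K` is ordinary PCA on the covariance matrix, so a weighted covariance could in principle be substituted at that point. A production version would more likely delegate to `sklearn.decomposition.FastICA` than reimplement the iteration.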

SHZ66 avatar Aug 25 '22 19:08 SHZ66

sounds good to me

We may also want to consider giving an option for scikit-learn PCA, which seems to be faster

jamesmkrieger avatar Aug 25 '22 19:08 jamesmkrieger

> sounds good to me
>
> We may also want to consider giving an option for scikit-learn PCA, which seems to be faster

Great! I can take the WPCA for a spin if you'd like.

I wonder if their speed-up comes from using SVD instead of the regular eigensolver; SVD is already provided as an option in ProDy: https://github.com/prody/ProDy/blob/master/prody/dynamics/pca.py#L230 (although I think performSVD should be integrated into calcModes and turned on by a switch).
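For reference, the two routes give the same modes and variances up to sign and numerical precision, so any speed difference would come from the solver rather than the result. A small NumPy sketch of that relation follows; the function names are hypothetical, and it makes no claim about how ProDy's performSVD or scikit-learn's PCA are implemented internally.

```python
import numpy as np

def modes_from_eig(X):
    # Eigendecomposition of the (biased) covariance matrix.
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / Xc.shape[0]
    evals, evecs = np.linalg.eigh(cov)
    order = np.argsort(evals)[::-1]  # largest variance first
    return evals[order], evecs[:, order]

def modes_from_svd(X):
    # SVD of the centred data; the covariance matrix is never formed.
    Xc = X - X.mean(axis=0)
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return s**2 / Xc.shape[0], Vt.T

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 12)) @ rng.standard_normal((12, 12))

var_eig, modes_eig = modes_from_eig(X)
var_svd, modes_svd = modes_from_svd(X)

# Absolute overlap between matching modes (1.0 means identical up to sign).
overlap = np.abs(np.sum(modes_eig * modes_svd, axis=0))
```

The SVD route works directly on the (n_samples, n_dof) data matrix, which tends to matter most when there are far fewer conformations than degrees of freedom.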

SHZ66 avatar Aug 25 '22 19:08 SHZ66

> > sounds good to me
> >
> > We may also want to consider giving an option for scikit-learn PCA, which seems to be faster
>
> Great! I can take the WPCA for a spin if you'd like.

Yes, go ahead!

> I wonder if their speed-up comes from using SVD instead of the regular eigensolver; SVD is already provided as an option in ProDy: https://github.com/prody/ProDy/blob/master/prody/dynamics/pca.py#L230 (although I think performSVD should be integrated into calcModes and turned on by a switch).

I'm not sure. Could be. I haven't yet got round to systematically comparing it.

There's an implementation in https://github.com/scipion-em/scipion-em-continuousflex/blob/rv_pdb_dimred/continuousflex/protocols/protocol_pdb_dimred.py that I'd be comparing with.

They also have a UMAP implementation that looks quite similar, so it may be worth adapting into ProDy too.

jamesmkrieger avatar Aug 25 '22 20:08 jamesmkrieger