
Running CCA on GPU

fipelle opened this issue 2 years ago • 11 comments

Hi, is it possible to run a subsection of these versions of CCA on GPU? If so, would you please write down a short example?

fipelle avatar Mar 04 '22 01:03 fipelle

Yeah, the examples are set up to work with PyTorch Lightning, so it should be as simple as passing gpus=1 to the Trainer, as in here:

https://pytorch-lightning.readthedocs.io/en/stable/common/single_gpu.html
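A minimal sketch of what that looks like in practice (the helper below is hypothetical, not part of cca_zoo; `gpus` was the Trainer flag in the Lightning versions current at the time):

```python
def trainer_device_kwargs(cuda_available: bool) -> dict:
    """Build keyword arguments for pytorch_lightning.Trainer.

    When a CUDA device is available we request a single GPU;
    otherwise we fall back to the CPU defaults.
    """
    return {"gpus": 1} if cuda_available else {}


# e.g. pl.Trainer(**trainer_device_kwargs(torch.cuda.is_available()))
kwargs = trainer_device_kwargs(True)
```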

jameschapman19 avatar Mar 04 '22 11:03 jameschapman19

Does it only apply to the "Deep Models"?

fipelle avatar Mar 04 '22 18:03 fipelle

Ah I see - yes only the deep models.

I'd be curious which model is running slow.

In the alternating optimisation methods the bottleneck will be the scikit-learn regression solvers. In the CCA/regularised CCA/PLS models the bottleneck will be the eigenvalue problem solver.
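To make that eigenvalue bottleneck concrete: linear CCA can be reduced to a symmetric generalized eigenvalue problem, whose top eigenvalue is the first canonical correlation. A self-contained numpy/scipy sketch on synthetic data (illustrative only, not cca_zoo's actual implementation):

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
n, p, q = 500, 5, 4
X = rng.standard_normal((n, p))
Y = X[:, :q] + 0.5 * rng.standard_normal((n, q))  # correlated second view
X = X - X.mean(axis=0)
Y = Y - Y.mean(axis=0)

# Empirical (cross-)covariance blocks
Cxx = X.T @ X / n
Cyy = Y.T @ Y / n
Cxy = X.T @ Y / n

# CCA as the symmetric generalized eigenproblem  A v = rho * B v
A = np.block([[np.zeros((p, p)), Cxy], [Cxy.T, np.zeros((q, q))]])
B = np.block([[Cxx, np.zeros((p, q))], [np.zeros((q, p)), Cyy]])
B = B + 1e-6 * np.eye(p + q)  # small ridge for numerical stability

evals, evecs = eigh(A, B)  # this solve is the expensive step at scale
rho = evals[-1]            # largest eigenvalue = first canonical correlation
```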

If you're aware of a GPU accelerated version of either of these bottlenecks I'd be interested either as a pointer or a PR.

The other possible direction is the state-of-the-art speed models for CCA, which generally use stochastic methods, e.g. https://proceedings.neurips.cc/paper/2017/file/c30fb4dc55d801fc7473840b5b161dfa-Paper.pdf

Or

https://proceedings.neurips.cc/paper/2014/file/54229abfcfa5649e7003b83dd4755294-Paper.pdf

jameschapman19 avatar Mar 04 '22 18:03 jameschapman19

Also worth saying that if you use the Deep CCA methods with single-layer linear encoders they should converge to CCA, and then you could use the GPU via that route (if you use full-batch gradient descent, i.e. minibatch size = dataset size).

jameschapman19 avatar Mar 04 '22 18:03 jameschapman19

> Also worth saying if you use the Deep CCA methods with single layer linear encoders they should converge to CCA and then could use GPU via that route (if you use full batch gradient descent ie minibatch size = dataset size)

Thanks, I will start with that then - trying to build from the examples in the documentation.

> I'd be curious which model is running slow.

I am trying to run NCCA with a very large dataset. I was hoping for a GPU implementation of the nearest-neighbour part, in the same spirit as the one in cuML. I am not an expert in NCCA and I do not know whether it is feasible! :)

I thought about pre-processing the data with PCA first and then running NCCA, but that does not seem very elegant, given that I would have to chain two dimensionality reduction techniques.

fipelle avatar Mar 04 '22 18:03 fipelle

Ah interesting! Must admit my implementation of NCCA is definitely functional rather than optimized. If there's a faster nearest neighbour algo that can be imported I'd defo drop it in.

Just spotted that the sklearn implementation I've been using can take n_jobs, which I don't utilise fully at the mo, so that's an easy win.
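For reference, `n_jobs` is passed at construction time of scikit-learn's `NearestNeighbors` (a small sketch on synthetic data):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

X = np.random.default_rng(0).standard_normal((1000, 8))

# n_jobs=-1 parallelises the neighbour queries across all CPU cores.
nn = NearestNeighbors(n_neighbors=5, n_jobs=-1).fit(X)
dist, idx = nn.kneighbors(X)

# Querying the training data itself means each point's first
# neighbour is the point itself, at distance zero.
```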

jameschapman19 avatar Mar 04 '22 18:03 jameschapman19

https://towardsdatascience.com/make-knn-300-times-faster-than-scikit-learns-in-20-lines-5e29d74e76bb

Might take a look at this

jameschapman19 avatar Mar 04 '22 18:03 jameschapman19

Thanks! Take a look at the NN in https://docs.rapids.ai/api/cuml/stable/api.html. I think you should be able to import it right away given that it shares a lot of the scikit-learn syntax.

fipelle avatar Mar 04 '22 19:03 fipelle

I am trying to see if it is sufficient to change line 5 of ncca.py from `from sklearn.neighbors import NearestNeighbors` to `from cuml.neighbors import NearestNeighbors`. It would also be nice to have access to the NearestNeighbors options described in https://docs.rapids.ai/api/cuml/stable/api.html#nearest-neighbors when defining NCCA.

EDIT

It seems to work.
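The swap can also be made soft, so the same module works with or without RAPIDS installed (a sketch, relying on cuML's `NearestNeighbors` mirroring the scikit-learn constructor and `fit`/`kneighbors` methods):

```python
try:
    # GPU path: cuML's NearestNeighbors follows the scikit-learn API.
    from cuml.neighbors import NearestNeighbors
except ImportError:
    # CPU fallback keeps the module importable without RAPIDS.
    from sklearn.neighbors import NearestNeighbors

import numpy as np

X = np.random.default_rng(1).standard_normal((200, 4))
nn = NearestNeighbors(n_neighbors=3).fit(X)
dist, idx = nn.kneighbors(X)
```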

fipelle avatar Mar 05 '22 14:03 fipelle

Just seen the edit! That's cool!

jameschapman19 avatar Mar 14 '22 11:03 jameschapman19

Hi! I just came across this issue due to the cuML / RAPIDS mention.

I wanted to note that we've implemented input-to-output data type consistency for all cuML estimators (not just NearestNeighbors). This means that while training and prediction will be on the GPU, if you pass CPU data into the estimator it will return CPU data from estimator methods and be compatible with existing code that relies on CPU data.

We've seen folks build wrapper classes or just use a utility function like the following and wrap the instantiation of the estimators into branching statements:

```python
def has_cuml():
    """Return True if the GPU-accelerated cuML library is importable."""
    try:
        import cuml  # noqa: F401
        return True
    except ImportError:
        return False
```
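A hedged example of the branching pattern described above (the factory function and its parameters are illustrative, not cuML or cca_zoo API):

```python
def has_cuml():
    """Return True if the GPU-accelerated cuML library is importable."""
    try:
        import cuml  # noqa: F401
        return True
    except ImportError:
        return False


def make_nearest_neighbors(n_neighbors=5, **kwargs):
    """Instantiate a GPU NearestNeighbors when cuML is available,
    otherwise fall back to the scikit-learn CPU implementation.
    Both share the same constructor and fit/kneighbors interface."""
    if has_cuml():
        from cuml.neighbors import NearestNeighbors
    else:
        from sklearn.neighbors import NearestNeighbors
    return NearestNeighbors(n_neighbors=n_neighbors, **kwargs)


nn = make_nearest_neighbors(n_neighbors=3)
```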

Happy to help answer questions about cuML (or RAPIDS in general) if you have any.

beckernick avatar Mar 18 '22 16:03 beckernick