
nan in transformed matrix

Open AdirRahamim opened this issue 3 years ago • 4 comments

Hi, I run basic CCA on a dataset. It mostly works well, but sometimes I get NaNs in the final columns of one of the transformed matrices, for example:

basic_cca = CCA(latent_dims=768)
basic_cca.fit((layer, second_layer))
U, V = basic_cca.transform((layer, second_layer))

Here layer and second_layer are both matrices of shape [N x 768], and sometimes the last column of U (i.e. U[:, 767]) is all NaNs. Just to mention, if I change latent_dims to 767 or lower, everything works (but I need it to be 768). Any idea how to solve it, or what I can change in order to solve it? Thanks

AdirRahamim avatar Dec 28 '21 20:12 AdirRahamim

Another problem: sometimes, for totally identical inputs, I don't get the same transformed matrices, e.g.:

basic_cca = CCA(latent_dims=768)
basic_cca.fit((layer, layer))
U, V = basic_cca.transform((layer, layer))

and U is not equal to V. Is there a way to fix this?

AdirRahamim avatar Dec 28 '21 21:12 AdirRahamim

My guess is that this is a situation where the number of samples==number of features in one/both views?

If yes (this was the situation in which I was able to reproduce your problems), then there are always likely to be some numerical instabilities, and I will add a warning for this case. I'll have a look at why the last eigenvalue/eigenvector in particular is the main problem. For that reason I'm a little wary of suggesting a hacky solution.

One option is to use MCCA rather than CCA since MCCA solves a different (but equivalent for 2 views) eigenvalue problem. This seems to be slightly more stable for the first k-1 eigenvectors but still looks unstable in the last one.
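To make the "different but equivalent" claim concrete, here is an illustrative numpy sketch (not cca_zoo's actual solver): for two views, the SVD formulation of CCA and an MCCA-style symmetric eigenvalue problem on the jointly whitened covariance recover the same canonical correlations. All variable names here are my own, for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 5
X = rng.standard_normal((n, p))
Y = X @ rng.standard_normal((p, p)) + 0.5 * rng.standard_normal((n, p))
X -= X.mean(axis=0)
Y -= Y.mean(axis=0)

Cxx, Cyy, Cxy = X.T @ X / n, Y.T @ Y / n, X.T @ Y / n

# CCA formulation: canonical correlations are the singular values of the
# whitened cross-covariance Cxx^{-1/2} Cxy Cyy^{-1/2}
Wx = np.linalg.inv(np.linalg.cholesky(Cxx))  # Wx @ Cxx @ Wx.T = I
Wy = np.linalg.inv(np.linalg.cholesky(Cyy))
rho_svd = np.linalg.svd(Wx @ Cxy @ Wy.T, compute_uv=False)

# MCCA-style formulation: eigenvalues of the blockwise-whitened joint
# covariance come in pairs 1 + rho and 1 - rho
C = np.block([[Cxx, Cxy], [Cxy.T, Cyy]])
W = np.block([[Wx, np.zeros((p, p))], [np.zeros((p, p)), Wy]])
evals = np.linalg.eigvalsh(W @ C @ W.T)  # ascending order
rho_eig = evals[::-1][:p] - 1.0          # top p eigenvalues are 1 + rho

assert np.allclose(rho_svd, rho_eig)
```

The two routes agree on well-conditioned data; the thread's point is that they can differ in how they propagate error when the covariance blocks are nearly singular.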

jameschapman19 avatar Dec 29 '21 00:12 jameschapman19

No, the number of samples is 6 times larger than the number of features in both views. Most of the time it works well, but sometimes I get this error.

AdirRahamim avatar Dec 29 '21 07:12 AdirRahamim

So the gist of how the solver works is that it gets the PCA components of the original data and runs the algorithm on the reduced data. This is mathematically equivalent to running CCA on the original data.

This is related to the reason why you can't get more than min(p,q) CCA components from p and q dimensional inputs.
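Both points can be checked numerically. The sketch below (my own illustration, not the package's code) uses a QR-based computation of canonical correlations to show that rotating each view into its PCA basis leaves the correlations unchanged, and that you only ever get min(p, q) of them.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, q = 300, 6, 4
X = rng.standard_normal((n, p))
Y = rng.standard_normal((n, q))
X -= X.mean(axis=0)
Y -= Y.mean(axis=0)

def canon_corrs(A, B):
    # canonical correlations = singular values of Qa.T @ Qb,
    # where Qa, Qb are orthonormal bases for the column spaces
    Qa, _ = np.linalg.qr(A)
    Qb, _ = np.linalg.qr(B)
    return np.linalg.svd(Qa.T @ Qb, compute_uv=False)

# rotate each view into its PCA basis (a full-rank linear transform)
_, _, Vtx = np.linalg.svd(X, full_matrices=False)
_, _, Vty = np.linalg.svd(Y, full_matrices=False)
rho_orig = canon_corrs(X, Y)
rho_pca = canon_corrs(X @ Vtx.T, Y @ Vty.T)

assert np.allclose(rho_orig, rho_pca)  # PCA rotation changes nothing
assert len(rho_orig) == min(p, q)      # at most min(p, q) components
```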

My guess (related to the n=p case, and I suspect still true in your n=6p case) is that your data is not full rank, i.e. it has <768 principal components (non-zero eigenvalues).

If this is the case then there isn't much that can be done to 'solve' your problem, but on my end it is something I can check for, i.e. if any eigenvalues are zero then limit the number of possible CCA components.

Is this the case for your inputs? (You can check by running an SVD on the input data matrices and looking for zero singular values.)
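A minimal sketch of that check, with a deliberately rank-deficient view (variable names are my own; the tolerance below mirrors numpy's default rank cutoff):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 120, 10
layer = rng.standard_normal((n, p))
layer[:, -1] = layer[:, 0]  # make the view rank-deficient on purpose

# singular values of the centered data; near-zero values mean the view
# has fewer independent directions than columns
s = np.linalg.svd(layer - layer.mean(axis=0), compute_uv=False)
tol = s.max() * max(n, p) * np.finfo(s.dtype).eps
rank = int((s > tol).sum())
print(rank)  # 9: one singular value is ~0, so at most 9 CCA components
```

If the rank comes out below latent_dims, the trailing components have nothing to fit and the corresponding columns of the transform can come out as NaN.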

jameschapman19 avatar Dec 29 '21 15:12 jameschapman19