Choices of upper_rank and tf_type

Open deepcompbio opened this issue 1 year ago • 2 comments

Hi Erick,

The choice of upper_rank has a significant impact on the automatically determined rank: the two move in the same direction, so a larger upper_rank gives a larger selected rank. How should one choose upper_rank for the elbow analysis and, if needed, a manual rank for the final factorization?

And for tf_type, there are four types. Apart from the fact that 'non_negative_cp_hals' does not allow a mask, how would one choose among the remaining three factorization methods?

Thanks.

Oct 07 '22 03:10 deepcompbio

Hi!

Thanks for this new question, happy to see that you are getting familiar with our tool.

Regarding the elbow analysis, theoretically you can increase the upper_rank up to any number; however, running decompositions with a large number of factors demands a lot of memory, which can become a bottleneck and cause an error.

For selecting the number of factors, try to use a realistic upper limit. We set it to just 25 since that's large enough for us. However, if you are willing to handle even more (keeping in mind that the more factors you use, the more factors you have to interpret), you can increase the upper_rank. More factors will lead to a lower error, and since the elbow analysis is an automated method based on derivatives of the error curve, your number of selected factors will of course change. In this regard, you can always try different approaches to select the number of factors, even manually, depending on what your trade-off is. We are also planning to implement an elbow analysis based on the similarity of decompositions instead of their error, but that will only be available at some point in the future.

About the tf_type parameter, for now this is experimental since we are trying new decomposition methods for other analyses with Tensor-cell2cell. We recommend using the default option since that's the one we introduced in the Tensor-cell2cell paper. Nevertheless, the option 'non_negative_cp_hals' is a better algorithm in terms of converging to robust solutions, with the disadvantage that it only works without masks in this case (in other words, only when building the tensor with the parameter how='inner').
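To illustrate why the selected rank moves with upper_rank, here is a minimal sketch in plain Python. It is not the cell2cell implementation: the function `elbow_by_chord_distance` and the toy curve `toy_errors` are hypothetical, and the heuristic (pick the point farthest below the straight line joining the two ends of the normalized error curve) is just one common way of locating an elbow. Because the curve is rescaled to its last point, extending it changes where the elbow falls.

```python
def elbow_by_chord_distance(errors):
    """Return the 1-based rank farthest below the chord joining the first
    and last points of the normalized error curve.

    Hypothetical helper for illustration; assumes at least two values and
    a curve whose first and last errors differ.
    """
    n = len(errors)
    first, last = errors[0], errors[-1]
    best_rank, best_gap = 1, float("-inf")
    for i, err in enumerate(errors):
        x = i / (n - 1)                     # rank rescaled to [0, 1]
        y = (err - last) / (first - last)   # error rescaled to [0, 1]
        gap = 1.0 - x - y                   # how far the point sits below the chord
        if gap > best_gap:
            best_rank, best_gap = i + 1, gap
    return best_rank


def toy_errors(upper_rank):
    """Made-up error curve (1/rank) standing in for real factorization errors."""
    return [1.0 / rank for rank in range(1, upper_rank + 1)]


for upper_rank in (9, 25, 100):
    print(upper_rank, elbow_by_chord_distance(toy_errors(upper_rank)))
```

On this toy curve the detected elbow grows from 3 to 5 to 10 as upper_rank goes from 9 to 25 to 100, even though the underlying curve is the same, which is the behavior described in the question.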
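The rule of thumb above can be written down as a small decision helper. This is only a sketch: `choose_tf_type` is a hypothetical function, not part of cell2cell, and the default name 'non_negative_cp' is an assumption about what the library's default tf_type is called; check the documentation of your installed version.

```python
def choose_tf_type(tensor_has_mask, prefer_hals=False,
                   default_tf_type="non_negative_cp"):
    """Pick a tf_type following the advice in this thread.

    Hypothetical helper. `default_tf_type` is an assumed name for the
    library default; 'non_negative_cp_hals' is the option named above.
    """
    # HALS is only usable when the tensor was built without a mask
    # (for example with how='inner'), so fall back to the default otherwise.
    if prefer_hals and not tensor_has_mask:
        return "non_negative_cp_hals"
    return default_tf_type


print(choose_tf_type(tensor_has_mask=True, prefer_hals=True))
print(choose_tf_type(tensor_has_mask=False, prefer_hals=True))
```

The helper keeps the recommended default unless HALS is explicitly requested and no mask is present.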

I hope this is clear enough, otherwise let me know.

Erick

Oct 07 '22 22:10 earmingol

Many thanks, Erick. Your reply is very helpful, as always.

The idea of elbow analysis based on the similarity of decompositions sounds promising. Actually I also tried to increase the rank a bit manually to compare the decompositions with those from auto rank. Looking forward to testing your new elbow analysis method in the future.

Oct 08 '22 01:10 deepcompbio

Hi @deepcompbio

The elbow analysis based on similarity is now available in v0.6.2 through this PR: https://github.com/earmingol/cell2cell/pull/17.

The way to use it is to add the parameter metric='similarity' when running:

tensor.elbow_rank_selection(upper_rank=25,
                            runs=10,
                            init='svd',
                            automatic_elbow=True,
                            metric='similarity',
                            random_state=888,
                            )

Also, if the curve looks odd, you can smooth it by passing the parameter smooth=True to the same function.

Nov 04 '22 02:11 earmingol

Hi @earmingol

This is cool and fast. Many thanks for adding this functionality. I will give it a try.

On a related topic, I have recently been experimenting with different ranks in the factorization, from a dozen to a few hundred (until my GPU memory runs out). It seems that a higher number of factors yields a higher number of interesting ligand-receptor interactions (some of these interactions are known in the literature for the particular disease I'm studying). So the question is: how could one determine which rank is sufficient for the factorization from a biological perspective? Thanks.

Nov 08 '22 01:11 deepcompbio
