Is there a way to pull out the stability index matrix from the SC3 object?

Open yingzhang121 opened this issue 6 years ago • 7 comments

Hi,

I am wondering whether there is a way for me to run function calculate_stability to get the stability index matrix?

I tried this function in SC3_1.7.6 and SC3_1.3.18, neither worked.

Best, Ying

yingzhang121 avatar Jan 29 '18 17:01 yingzhang121

Hi Ying,

To plot clustering stability you have to invoke sc3_plot_cluster_stability(sce, k = 3). That said, it is a prerequisite that you have run sc3 for that k. For example, if you run sc3(sce, ks = 2:3), then sc3_plot_cluster_stability will yield an error if your k is outside the [2,3] range.

pati-ni avatar Feb 06 '18 17:02 pati-ni

Hi, Pati,

I think you misunderstood my question. I have already run the full SC3 workflow, and I do have my object.

So rather than simply plotting the stability index, I want to extract the same information and save it to a data frame. It seems the function calculate_stability should do the job, but I am not even able to run it directly.

That is why I am asking whether there is a way to extract this information.

Best, Ying

yingzhang121 avatar Feb 06 '18 20:02 yingzhang121

@yingzhang121 Currently that functionality is not exposed to the user space. But it is possible as an enhancement for future iterations.

pati-ni avatar Feb 06 '18 22:02 pati-ni

@pati-ni It would be great to include this function in a future release. Thank you for taking my input.

I also have a related question that might be off topic, but I am looking for insight from the developers: how should we interpret the stability index? For example, I know that the larger the index, the more stable the cluster, but is a cluster with a stability index of 0.2 useless? In my project, I always get a k estimate of around 30, and except for 1-2 clusters, the rest usually have an index of around 0.1-0.2. Looking in more detail, the high-index clusters usually contain fewer cells. So what does this mean? Should we pay less attention to the low-index clusters that include the majority of cells?

I guess the basic question is what the expected stability index is for a specific k. I can imagine that with a few thousand cells, the probability of getting a specific clustering result is usually low, maybe as low as 0.0000000001. Then no matter how low the stability index is, the cluster should be significant (or stable) in some sense. But if this is true, then what is the purpose of the iterative permutation over different clustering algorithms and of the stability index itself?

So I would like to ask you to provide a baseline for the stability index, such as a line at 0.1. Then we could draw conclusions such as: if the index is below the baseline, the cluster is the result of some random effect; otherwise, it is a truly statistically significant result.

Thank you for spending your time reading my post.

yingzhang121 avatar Feb 08 '18 15:02 yingzhang121

@yingzhang121 thanks for your question! Please note that the k estimate is not the true k; it is just an estimate, and we have also noticed that it overestimates k for UMI-based datasets (where the matrix is much sparser than in full-length transcript protocols). Regarding your stability question: please note that stability is relative to the range of ks you've run clustering for. So if your range of ks is small you might get one value, and if you add more ks to your calculations you will get a different value. Again, the stability index is not the ultimate truth; it's more of a guide for yourself. Its value decreases in two cases: 1. if cells are removed from your cluster when you change k; and 2. if your cluster splits into multiple clusters when you increase k. Hope this helps.
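
The two cases above, and the dependence on the range of ks, can be illustrated with a small toy. This is a sketch only and not SC3's code: SC3 is an R package and its calculate_stability function is internal. The Python function `stability_index` below is hypothetical, and it assumes that each clustering is simply a list of cluster labels (one per cell) and that the index averages, over all the other ks, how well each cluster is preserved.

```python
def stability_index(labelings, k):
    """Toy stability index for each cluster of the solution with k clusters.

    labelings: dict mapping a k value to a list of cluster labels, one per cell.
    Returns a dict {cluster label: index in (0, 1]}.
    """
    other_ks = [k2 for k2 in labelings if k2 != k]
    if not other_ks:
        raise ValueError("need at least one other k to compare against")
    reference = labelings[k]
    index = {}
    for cluster in sorted(set(reference)):
        members = {i for i, lab in enumerate(reference) if lab == cluster}
        total = 0.0
        for k2 in other_ks:
            labels2 = labelings[k2]
            # Clusters at k2 that received at least one cell of this cluster.
            touched = {labels2[i] for i in members}
            n_pieces = len(touched)
            for c2 in touched:
                cells2 = {i for i, lab in enumerate(labels2) if lab == c2}
                # Fraction of the k2 cluster that comes from this cluster
                # (drops when cells from elsewhere are mixed in).
                purity = len(members & cells2) / len(cells2)
                # Dividing by n_pieces squared penalises splitting.
                total += purity / n_pieces ** 2
        index[cluster] = total / len(other_ks)
    return index


# Six cells clustered with k = 2 and k = 3 (made-up labels).
labelings = {
    2: [0, 0, 0, 0, 1, 1],
    3: [0, 0, 1, 1, 2, 2],
}
narrow = stability_index(labelings, 2)
print(narrow)  # {0: 0.5, 1: 1.0}

# Adding k = 4 to the range changes the value for the same cluster.
labelings_wide = dict(labelings)
labelings_wide[4] = [0, 1, 2, 2, 3, 3]
wide = stability_index(labelings_wide, 2)
print(wide)
```

In this toy, cluster 1 stays intact at every k and keeps an index of 1, while cluster 0 splits at k = 3 and splits further at k = 4, so its index falls as more ks are added. The absolute value therefore only makes sense relative to the range of ks that was run.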

wikiselev avatar Feb 08 '18 16:02 wikiselev

@wikiselev Thank you for the detailed reply. For my SC3 workflow, I started with the k estimation, then ran SC3 with a series of k values surrounding the estimated k. I thought this was a better way to check for the "real" clustering result, but given your explanation I might be wrong. And yes, I did notice that SC3 reported a higher number of clusters (k estimate) than other methods I used, such as CIDR, Seurat, SCDE, etc. However, I do believe that generating a clustering consensus is the way to go, so I also tried another package, "clusterExperiment". There I found I could use the stability index for k to guide the combineMany function. Simply put, combineMany from clusterExperiment requires a "proportion" input (a value for how frequently two samples are grouped into one cluster). For the same dataset, where my SC3 run (k=35) gave stability indices mostly below 0.3, I used 0.3 in the combineMany function and got 14 clusters. I also tried the upper limit of the stability index (0.7 in this case) in clusterMany, and got 84 clusters. Intuitively, if a cluster is more stable, then its samples are grouped together more frequently, and a more stringent threshold should result in more clusters. I know this looks weird, but I plan to compare all the clustering consensus results with my Seurat results, and hope we can identify the same groups of cells again and again.
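
The intuition that a stricter proportion gives more clusters can be checked on a toy example. The sketch below is not the combineMany algorithm from clusterExperiment; it is a simplified Python illustration with hypothetical function names, assuming that two samples end up in the same consensus group whenever they are linked, directly or through a chain of other samples, by pairs that co-cluster in at least `proportion` of the runs.

```python
from itertools import combinations


def co_clustering_proportion(clusterings):
    """Fraction of runs in which each pair of samples shares a cluster."""
    n_runs = len(clusterings)
    n = len(clusterings[0])
    return [
        [sum(labels[i] == labels[j] for labels in clusterings) / n_runs
         for j in range(n)]
        for i in range(n)
    ]


def consensus_groups(clusterings, proportion):
    """Label samples by connected components of the thresholded pair graph."""
    prop = co_clustering_proportion(clusterings)
    n = len(prop)
    parent = list(range(n))  # union-find forest, one node per sample

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if prop[i][j] >= proportion:
            parent[find(i)] = find(j)

    relabel = {}
    return [relabel.setdefault(find(i), len(relabel)) for i in range(n)]


# Three made-up clustering runs over the same six samples.
runs = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 2, 2],
    [0, 0, 0, 0, 1, 1],
]
n_groups = {t: len(set(consensus_groups(runs, t))) for t in (0.3, 0.6, 1.0)}
print(n_groups)  # {0.3: 1, 0.6: 2, 1.0: 4}
```

Raising the threshold can only remove links between samples, so under this simplified rule the number of groups never decreases as the proportion increases, which matches the direction of the 14 versus 84 clusters reported above.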

yingzhang121 avatar Feb 08 '18 16:02 yingzhang121

@yingzhang121 yes, it's a pretty complicated analysis, but I hope you will get good results!

wikiselev avatar Feb 13 '18 09:02 wikiselev