scrattch.hicat Details on the clustering method

Open rmathieu25 opened this issue 3 years ago • 2 comments

Hello,

I used scrattch.hicat for my scRNA-seq analysis, and I thank you for developing this great tool.

However, I would like clarification on some points about the pipeline that you used in the Tasic et al., 2018 paper.

You performed the bootstrapping and consensus clustering on each of the broad classes that you identified beforehand. What is not clear to me is when you merged the co-clustering matrices. I understand that you merged the co-clustering matrices of the PCA and WGCNA modes for each broad class. Is that the case? So the steps up to the merging module are applied to each broad class, right?
In the de_param function, you recommend de.score.th = 40 for small datasets (#cells < 1000) and de.score.th = 150 for large datasets (#cells > 10000). But should de.score.th be set based on the whole dataset or on the number of cells in each broad class? For example, if I have a dataset of 8000 cells with 3 classes (class1: 6000 cells / class2: 1500 cells / class3: 500 cells), do I have to set:

or, do I have to set: de.score.th=130 for all?

Last question: when assigning core and intermediate cells, if you find that the best.cluster.score is not the original cluster of the cell, do you reassign this cell to its best cluster or do you keep the original one?

Thank you in advance. Best regards

Feb 15 '21 11:02 rmathieu25

To address your questions:

1. We no longer split by class at the beginning. Just run run_consensus_clust on all your cells from the start. If you do split by class first, you can set "init.result=list(cl=class)".
2. I recommend setting de.score.th to a single value. The recommendation is based on how many cells you expect to see per cluster. If you want to split a pair of clusters with 10~20 cells each and ~10 DE genes, you may use de.score.th=50. If you have hundreds of cells per cluster with a similar number of DE genes, you may use de.score.th=150. The score is higher because the additional cells give more statistical power.
3. It's up to you whether you want to reassign cells. When assigning core vs intermediate cells, cells whose best.cluster differs from their original cluster are considered intermediate cells. We have moved away from this strategy, as the results depend too much on the choice of classification algorithm.
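
To make the second and third answers above more concrete, here is a minimal Python sketch. It is not scrattch.hicat code. It assumes the de score for a pair of clusters is the sum of -log10(adjusted p-value) over the DE genes, with each gene capped at 20, which is how I understand the definition behind de.score.th. The function names `de_score` and `is_intermediate` and all p-values are hypothetical and chosen only for illustration.

```python
import math


def de_score(adjusted_p_values, cap=20.0):
    """Sum of -log10(adjusted p-value) over DE genes, each gene capped at `cap`.

    Assumption: this mirrors the score that de.score.th is compared against.
    """
    total = 0.0
    for p in adjusted_p_values:
        # A p-value of 0 (numerical underflow) would make log10 fail,
        # so such a gene simply contributes the cap.
        contribution = cap if p <= 0 else min(-math.log10(p), cap)
        total += contribution
    return total


def is_intermediate(original_cluster, best_cluster):
    """Flag cells whose best cluster differs from their original cluster."""
    return [orig != best for orig, best in zip(original_cluster, best_cluster)]


# Made-up adjusted p-values for the same 10 DE genes in two scenarios.
small_clusters = [1e-5] * 10   # 10~20 cells per cluster: weaker evidence per gene
large_clusters = [1e-15] * 10  # hundreds of cells per cluster: stronger evidence

print(de_score(small_clusters))  # about 50
print(de_score(large_clusters))  # about 150

# Made-up cluster labels for four cells.
original = ["A", "A", "B", "B"]
best = ["A", "B", "B", "B"]
print(is_intermediate(original, best))  # [False, True, False, False]
```

With the same number of DE genes, the pair of larger clusters reaches a much higher score only because each gene's p-value is smaller, which is why the threshold should scale with the expected cluster size rather than with the total number of cells.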

Feb 18 '21 22:02 yzizhen

Thank you very much for your answer.

Regarding point 3, do you recommend using the KNN results instead?

Feb 19 '21 00:02 rmathieu25