DEGreport degPatterns results exists unsimilar pattern in a cluster

Open etbuface opened this issue 5 years ago • 1 comments

Hi, I'm using degPatterns to cluster some genes across different time points.

Here is part of my code:

clusters <- degPatterns(log2(salld.norm), metadata = colData, time = "age", minc = 5, reduce = T, scale = T)

And here is my metadata:

ID	age
1	04d
2	16d
3	28d
4	32d
5	36d
6	40d
7	44d
8	52d
9	56d

In my clustering result, I found that some groups rise over time, so I plotted the normalized counts of every gene in those groups. However, they are not exactly what I expected. For example, within one cluster some genes rise significantly over time, while other genes in the same cluster change much less. In addition, some genes from another cluster look like they should have been clustered with the rising genes. I was wondering why these dissimilar genes end up clustered with my rising genes.

Figure 1 Figure 2

How should I set groupDifference so that more similar genes are grouped into one cluster? (Some of the clusters look very similar to me, and I don't know why they are split into multiple clusters.)

Figure 3

I use minc = 5 to get more clusters returned and reduce = T to remove outliers from the clusters. I also use scale = T, because I only care about the pattern of change, not the exact counts. But I'm also curious whether scale = T is necessary. The Kendall test is based on the ranks of the data, right? So what is the effect of scale = T? Is my understanding of these parameters correct? I also noticed that there can be some extreme outliers when reduce = T is not used. How can such genes be clustered together with the 'consensus/common' genes?

Aug 25 '19 06:08 etbuface

Hi @etbuface,

Thank you for the details.

This function works in the following way:

1. Compute pairwise correlations between the input genes (these should already be significant genes defined by some other method, like DESeq2).
2. Run hierarchical clustering.
3. Cut the tree at a given point.
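The three steps can be sketched in base R on a toy matrix. This is only an illustration of the idea, not DEGreport's actual code: the gene names and values are made up, the cut heights are arbitrary, and `hclust()` with average linkage is a stand-in that is not claimed to be what degPatterns calls internally.

```r
# Toy matrix: 6 hypothetical genes (rows) x 9 time points (columns).
expr <- rbind(
  up1   = c(1, 2, 3, 4, 5, 6, 7, 8, 9),
  up2   = c(2, 5, 6, 9, 10, 13, 14, 17, 18),
  up3   = c(1, 3, 2, 4, 5, 7, 6, 8, 9),
  down1 = c(9, 8, 7, 6, 5, 4, 3, 2, 1),
  down2 = c(9, 8, 6, 7, 5, 4, 3, 1, 2),
  down3 = c(27, 24, 21, 18, 15, 12, 9, 6, 3)
)

# 1. Pairwise Kendall correlation between genes.
#    cor() correlates columns, so the matrix is transposed first.
gene_cor <- cor(t(expr), method = "kendall")

# 2. Hierarchical clustering on a correlation-based distance.
gene_dist <- as.dist(1 - gene_cor)
tree <- hclust(gene_dist, method = "average")

# 3. Cut the tree. The height is a free choice, and it decides the clusters.
clusters_coarse <- cutree(tree, h = 0.7)   # rising vs. falling genes
clusters_fine   <- cutree(tree, h = 0.05)  # the same tree, cut lower

print(clusters_coarse)
print(clusters_fine)
```

Cutting the same tree lower splits the rising group, even though the correlations have not changed. This is why near-identical clusters can be reported separately, and why a less similar gene can still sit inside a group.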

The third step is the one that defines the clusters you see. Enabling the consensusCluster option may give better clusters, but that is not always the case. This option uses the ConsensusClusterPlus package to define the groups.

It is normal to find clusters that look almost identical, but you will see there is always a small difference. I use the plot to then merge groups so that they make more sense for the biology. If that small difference is not important, it makes sense to put them all together.
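Merging groups after looking at the plot can be done by relabelling the cluster assignments. The sketch below assumes a table with one row per gene and a cluster id column. The gene names, cluster ids, and pattern labels are all hypothetical, and deciding which clusters belong together is a manual choice.

```r
# Hypothetical cluster assignments, one row per gene.
assignments <- data.frame(
  genes   = paste0("gene", 1:6),
  cluster = c(1, 1, 2, 3, 3, 4)
)

# Manual decision made after inspecting the plots (an assumption here):
# clusters 1 and 3 show the same rising trend, so they share one label.
merge_map <- c("1" = "rising", "2" = "falling", "3" = "rising", "4" = "transient")

# Look up the merged label for each gene by its original cluster id.
assignments$pattern <- unname(merge_map[as.character(assignments$cluster)])

print(table(assignments$pattern))
```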

It is also common to find some genes that show a bigger difference when you plot the non-scaled values, but the scaled values should show the same pattern, even if the size of the difference is not the same.

There are a couple of plots in the output of the function, if you save it into a variable, that may help you define the cutoff (see benchmarking at http://lpantano.github.io/DEGreport/reference/degPatterns.html#value). Look at http://lpantano.github.io/DEGreport/reference/degPlotCluster.html to see how to plot using different cutoffs.

At the end of the day, the last step is arbitrary, and some genes will end up in a cluster even if they are not similar, because when you cut the tree they have to be part of some group. That is the reason I added reduce, to remove those cases.
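One way to picture that kind of filtering is a boxplot rule applied per time point within a cluster. The sketch below is an assumption about the general idea, not the code behind reduce: the values are hypothetical scaled expression for a single cluster, and `boxplot.stats()` with its default 1.5 x IQR rule is used to flag a gene that is an outlier at any time point.

```r
# Hypothetical scaled values for one cluster: 8 genes (rows) x 3 time points.
cluster_values <- rbind(
  g1 = c(-1.0,  0.0,  1.0),
  g2 = c(-0.9,  0.1,  0.9),
  g3 = c(-1.1, -0.1,  1.1),
  g4 = c(-1.0,  0.1,  1.0),
  g5 = c(-0.9,  0.0,  1.1),
  g6 = c(-1.1,  0.0,  0.9),
  g7 = c(-1.0, -0.1,  1.0),
  g8 = c(-0.6,  1.2, -0.6)   # assigned to the cluster but not following the trend
)
colnames(cluster_values) <- c("t1", "t2", "t3")

# For each time point (column), mark the genes that boxplot.stats() reports as outliers.
is_outlier <- apply(cluster_values, 2, function(x) x %in% boxplot.stats(x)$out)

# A gene is flagged if it is an outlier at one or more time points.
flagged <- rownames(cluster_values)[rowSums(is_outlier) > 0]
kept    <- setdiff(rownames(cluster_values), flagged)

print(flagged)
```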

You are right about scale: it shouldn't make a difference. It is more of a historical parameter and I should probably remove it.
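The rank argument can be checked directly in base R. The sketch below assumes that scale = TRUE amounts to a per-gene z-score. The two genes and their counts are hypothetical. Kendall's tau depends only on the ordering of the values, so a rank-preserving transform such as z-scoring or log2 leaves it unchanged.

```r
# Hypothetical counts for two genes across nine time points (no ties).
gene_a <- c(5, 8, 12, 20, 35, 33, 60, 90, 120)
gene_b <- c(100, 140, 130, 180, 260, 300, 280, 400, 520)

# Per-gene z-score: subtract the mean, divide by the standard deviation.
zscore <- function(x) (x - mean(x)) / sd(x)

tau_raw    <- cor(gene_a, gene_b, method = "kendall")
tau_scaled <- cor(zscore(gene_a), zscore(gene_b), method = "kendall")
tau_log    <- cor(log2(gene_a), log2(gene_b), method = "kendall")

# All three agree, because the ordering of the values within each gene is unchanged.
print(c(raw = tau_raw, scaled = tau_scaled, log2 = tau_log))
```

Scaling still changes how the values look when plotted, since genes with very different counts are put on a common axis. It just does not change a rank-based correlation.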

I hope this helps.

Aug 26 '19 15:08 lpantano
