clusterProfiler
clusterProfiler copied to clipboard
compareCluster GSEA + universal annotation = does not work
Hi, I am working with compareCluster, trying to do a GSEA analysis using homemade annotation. To do so I decide to use Universal enrichment analysis.
I have a TERM2GENE and TERM2NAME. I am trying to run the same a GSEA (with all my genes - not only differential expressed ones). I have an error: Error in check_gene_id(geneList, geneSets) : --> No gene can be mapped....
Here my code (let's say input1, input2 and input3 are subsets of input):
geneList1=input1$logFC names(geneList1)=input1$Symbol geneList1=sort(geneList1, decreasing = TRUE)
geneList2=input2$logFC names(geneList2)=input2$Symbol geneList2=sort(geneList2, decreasing = TRUE)
geneList3=input3$logFC names(geneList3)=input3$Symbol geneList3=sort(geneList3, decreasing = TRUE)
geneList=c("Gut"=geneList1, "Head"=geneList2, "Abdomen"=geneList3)
gsea=compareCluster(geneList, data=input, TERM2GENE=TERM2GENE, TERM2NAME =TERM2NAME, fun="GSEA" minGSSize = 10, nPermSimple = 100000, eps=0, pvalueCutoff = 0.1, pAdjustMethod = "BH" )
This is my error: Error in check_gene_id(geneList, geneSets) : --> No gene can be mapped....
However, when I run the same geneLists one by one (geneList1, then geneList2 and so one) using the following code, it works perfectly (see my pictures at the end)
GSEA(geneList1, TERM2GENE=TERM2GENE, TERM2NAME =TERM2NAME, minGSSize = 10, nPermSimple = 100000, eps=0, pvalueCutoff = 0.1, pAdjustMethod = "BH" )
Some one could help me here please?
Thank you
Everything points to something not being OK with your aggregated input geneList. Please double-check it!
Please note that performing GSEA with user-defined gene sets through the compareCluster function works as expected. See my code below.
Also note that nPermSimple is a legacy argument, and should not be used anymore! Only use the argument eps.
> ## load libraries
> library(clusterProfiler)
> library(enrichplot)
> library(org.Hs.eg.db)
>
> ## load sample data
> data(geneList, package="DOSE")
> head(geneList)
4312 8318 10874 55143 55388 991
4.572613 4.514594 4.418218 4.144075 3.876258 3.677857
>
> ## using sample data, create list with 3 comparisons to be used as input for comparCluster
> ## note that 'Head' is the reverse of 'Gut' and 'Abdomen'.
> inputList <- list(Gut = geneList, Abdomen = geneList, Head = sort(-1*geneList, decreasing = TRUE) )
>
> str(inputList)
List of 3
$ Gut : Named num [1:12495] 4.57 4.51 4.42 4.14 3.88 ...
..- attr(*, "names")= chr [1:12495] "4312" "8318" "10874" "55143" ...
$ Abdomen: Named num [1:12495] 4.57 4.51 4.42 4.14 3.88 ...
..- attr(*, "names")= chr [1:12495] "4312" "8318" "10874" "55143" ...
$ Head : Named num [1:12495] 4.3 3.95 3.6 3.46 3.42 ...
..- attr(*, "names")= chr [1:12495] "4969" "57758" "79901" "79838" ...
>
>
> ## create TERM2GENE and TERM2NAME for KEGG
> ## note that when using the fuction gseKEGG this step is done automagically
> kegg.data <- download_KEGG(species="hsa", keggType = "KEGG", keyType = "kegg")
> term2gene.kegg <- kegg.data$KEGGPATHID2EXTID
> term2name.kegg <- kegg.data$KEGGPATHID2NAME
>
> head(term2gene.kegg)
from to
1 hsa00010 10327
2 hsa00010 124
3 hsa00010 125
4 hsa00010 126
5 hsa00010 127
6 hsa00010 128
> head(term2name.kegg)
from to
1 hsa01100 Metabolic pathways
2 hsa01200 Carbon metabolism
3 hsa01210 2-Oxocarboxylic acid metabolism
4 hsa01212 Fatty acid metabolism
5 hsa01230 Biosynthesis of amino acids
6 hsa01232 Nucleotide metabolism
>
>
> ## run compareCluster with generic function GSEA, and user-provied TERM2GENE and TERM2NAME
> res <- compareCluster(geneCluster = inputList,
+ fun = "GSEA",
+ eps = 0,
+ pvalueCutoff = 0.05,
+ pAdjustMethod = "BH",
+ TERM2GENE = term2gene.kegg,
+ TERM2NAME = term2name.kegg)
>
> ## convert entrezids into symbols
> res <- setReadable(res, OrgDb = org.Hs.eg.db, keyType="ENTREZID")
>
> ## check
> res
#
# Result of Comparing 3 gene clusters
#
#.. @fun GSEA
#.. @geneClusters List of 3
$ Gut : Named num [1:12495] 4.57 4.51 4.42 4.14 3.88 ...
..- attr(*, "names")= chr [1:12495] "4312" "8318" "10874" "55143" ...
$ Abdomen: Named num [1:12495] 4.57 4.51 4.42 4.14 3.88 ...
..- attr(*, "names")= chr [1:12495] "4312" "8318" "10874" "55143" ...
$ Head : Named num [1:12495] 4.3 3.95 3.6 3.46 3.42 ...
..- attr(*, "names")= chr [1:12495] "4969" "57758" "79901" "79838" ...
#...Result 'data.frame': 172 obs. of 12 variables:
$ Cluster : Factor w/ 3 levels "Gut","Abdomen",..: 1 1 1 1 1 1 1 1 1 1 ...
$ ID : chr "hsa04110" "hsa03050" "hsa04657" "hsa05169" ...
$ Description : chr "Cell cycle" "Proteasome" "IL-17 signaling pathway" "Epstein-Barr virus infection" ...
$ setSize : int 139 43 85 193 33 62 86 130 55 80 ...
$ enrichmentScore: num 0.664 0.709 0.562 0.434 0.723 ...
$ NES : num 2.83 2.41 2.2 1.96 2.31 ...
$ pvalue : num 9.16e-21 2.97e-08 8.38e-08 1.52e-07 4.07e-07 ...
$ p.adjust : num 3.13e-18 5.08e-06 9.56e-06 1.30e-05 2.78e-05 ...
$ qvalue : num 2.30e-18 3.72e-06 7.00e-06 9.51e-06 2.04e-05 ...
$ rank : num 1155 2516 2880 2820 1905 ...
$ leading_edge : chr "tags=36%, list=9%, signal=33%" "tags=65%, list=20%, signal=52%" "tags=49%, list=23%, signal=38%" "tags=39%, list=23%, signal=31%" ...
$ core_enrichment: chr "CDC45/CDC20/CCNB2/NDC80/CCNA2/CDK1/MAD2L1/CDT1/TTK/AURKB/CHEK1/TRIP13/CCNB1/MCM5/PTTG1/MCM2/CDC25A/CDC6/PLK1/BU"| __truncated__ "PSMA7/PSMD3/PSMB9/PSMB5/IFNG/PSMD7/ADRM1/PSME2/PSMB3/PSMA4/PSMB2/PSMA3/PSMA5/PSMB7/PSMD14/PSME4/SEM1/PSMB10/PSM"| __truncated__ "MMP1/S100A9/S100A8/S100A7/CXCL10/CXCL3/CCL20/FOSL1/MMP9/CXCL8/LCN2/CCL2/MUC5B/CEBPB/CCL7/IFNG/CCL17/CXCL5/CXCL1"| __truncated__ "CXCL10/CCNA2/TAP1/ISG15/CCNE1/CCNE2/SKP2/STAT1/HLA-DRB4/HLA-DOB/MYC/CD3G/PSMD3/E2F1/IRAK1/CD247/CD3D/LYN/OAS1/R"| __truncated__ ...
#.. number of enriched terms found for each gene cluster:
#.. Gut: 56
#.. Abdomen: 58
#.. Head: 58
#
#...Citation
T Wu, E Hu, S Xu, M Chen, P Guo, Z Dai, T Feng, L Zhou,
W Tang, L Zhan, X Fu, S Liu, X Bo, and G Yu.
clusterProfiler 4.0: A universal enrichment tool for interpreting omics data.
The Innovation. 2021, 2(3):100141
>
> ## prepare dotplot
> p <- dotplot(res, font.size=8, showCategory=8, title =("GSEA results"), split=".sign") + facet_grid(.~.sign)
> print(p)
>
>