clusterProfiler

Retrieve the set of genes with assigned GO

Open m-bogaerts opened this issue 1 year ago • 1 comments

Hello,

I am using the function compareCluster on three different lists of genes (Drosophila melanogaster; FlyBase FBgn IDs). In the results I see that not all genes are used for the enrichment: judging by the gene ratio, a set of 182 genes is reduced to 142. I understand this is because 40 of them have no associated GO term. Is there any way to obtain the identity of the 142 genes that do have an associated GO term?

Thank you very much in advance.

m-bogaerts, Jun 17 '24 16:06

One way of achieving this would be by a 'simple' query of the OrgDb:

> ## load library
> library(org.Dm.eg.db)
> 
> ## extract the 'keys' (= geneid) that can be queried for
> k <- keys(org.Dm.eg.db)
> 
> ## check
> k[1:5]
[1] "30970" "30971" "30972" "30973" "30975"
> 
> 
> 
> ## query for the 1st 50 ids.
> res <- select(org.Dm.eg.db,
+               keys=k[1:50],
+               columns = c("GOALL"),
+               keytype="ENTREZID")
'select()' returned 1:many mapping between keys and columns
> 
> ## of these 50, which geneids do NOT have a GO annotation?
> ## answer: 5 genes
> unique( res[ is.na(res$GOALL), ]$ENTREZID )
[1] "30972" "30979" "30991" "31005" "31026"
> 
> length( unique(res[ is.na(res$GOALL), ]$ENTREZID) )
[1] 5
> 
> ## of these 50, which geneids do HAVE a GO annotation?
> ## answer: 45 genes
> unique( res[ !is.na(res$GOALL), ]$ENTREZID )
 [1] "30970" "30971" "30973" "30975" "30976" "30977" "30978" "30980" "30981"
[10] "30982" "30983" "30984" "30985" "30986" "30988" "30990" "30994" "30995"
[19] "30996" "30998" "31000" "31001" "31002" "31003" "31004" "31006" "31007"
[28] "31009" "31010" "31011" "31012" "31013" "31014" "31015" "31016" "31017"
[37] "31018" "31019" "31020" "31021" "31022" "31023" "31024" "31025" "31027"
> 
> length( unique( res[ !is.na(res$GOALL), ]$ENTREZID ) )
[1] 45
>

Note that when using FlyBase IDs you will need to adapt the keytype argument (for example keytype="FLYBASE"; keytypes(org.Dm.eg.db) lists the available options).
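For a gene list of FlyBase IDs, the same idea can be wrapped in a small helper. This is only a sketch: `split_by_go` and the toy FBgn IDs are made up for illustration, and the real query at the end assumes that org.Dm.eg.db is installed and that "FLYBASE" is the right keytype for your identifiers. Also keep in mind that GOALL covers all three ontologies, so the counts need not match an enrichment restricted to a single ontology.

```r
## Split the IDs in a select()-style result by whether they carry a GO term.
## 'res' needs one ID column plus a GOALL column (NA = no annotation).
## split_by_go is a hypothetical helper, not part of any package.
split_by_go <- function(res, id_col) {
  has_go     <- !is.na(res$GOALL)
  with_go    <- unique(res[[id_col]][has_go])
  without_go <- setdiff(unique(res[[id_col]]), with_go)
  list(with_go = with_go, without_go = without_go)
}

## Toy stand-in for select() output; these FBgn ids are placeholders only.
toy <- data.frame(
  FLYBASE = c("FBgn0000001", "FBgn0000001", "FBgn0000002", "FBgn0000003"),
  GOALL   = c("GO:0008150", "GO:0003674", NA, "GO:0005575"),
  stringsAsFactors = FALSE
)
out <- split_by_go(toy, "FLYBASE")
out$with_go      # ids with at least one GO term
out$without_go   # ids without any GO term

## Real query; this part only runs if the annotation package is installed.
if (requireNamespace("org.Dm.eg.db", quietly = TRUE)) {
  db <- org.Dm.eg.db::org.Dm.eg.db
  ## Replace my_genes with your own vector of FBgn ids.
  my_genes <- head(AnnotationDbi::keys(db, keytype = "FLYBASE"), 50)
  res <- AnnotationDbi::select(db, keys = my_genes,
                               columns = "GOALL", keytype = "FLYBASE")
  str(split_by_go(res, "FLYBASE"))
}
```

Calling the helper once per gene list gives, for each cluster passed to compareCluster, the IDs that have at least one GO annotation and those that have none.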

guidohooiveld, Jun 18 '24 18:06