clusterProfiler

enrichKEGG Bug

Open zellerivo opened this issue 1 year ago • 8 comments

Dear Prof. Guangchuang Yu,

I am an avid user of your package and I want to express my sincere appreciation for all the work and effort you put into it. I particularly like your creativity when it comes to visualisation and the ease of use of your packages. So thank you very much for that!

The enrichKEGG function does not work on my system. According to my debugging, the problem seems to involve the USER_DATA object.

Example:

data(geneList, package='DOSE')
de <- names(geneList)[1:100]
yy <- enrichKEGG(de, pvalueCutoff=0.01)
head(yy)

It throws the following error: (image)

zellerivo, Dec 06 '23 14:12

You can try pvalueCutoff=1.

Junyan1996, Dec 07 '23 11:12

You should provide more information on your R/Bioconductor installation! Are you sure it is up to date, that is, R-4.3.x and Bioconductor 3.18? The KEGG API has changed over the last year, which may explain why it no longer works for you. It does work for me with the current versions of R/Bioconductor:

> library(clusterProfiler)
> data(geneList, package='DOSE')
> de <- names(geneList)[1:100]
> yy <- enrichKEGG(de, pvalueCutoff=0.01)
Reading KEGG annotation online: "https://rest.kegg.jp/link/hsa/pathway"...
Reading KEGG annotation online: "https://rest.kegg.jp/list/pathway/hsa"...
> head(yy)
                   category           subcategory       ID
hsa04110 Cellular Processes Cell growth and death hsa04110
hsa04218 Cellular Processes Cell growth and death hsa04218
hsa04114 Cellular Processes Cell growth and death hsa04114
hsa04814 Cellular Processes         Cell motility hsa04814
hsa04657 Organismal Systems         Immune system hsa04657
                     Description GeneRatio  BgRatio       pvalue     p.adjust
hsa04110              Cell cycle     12/58 157/8644 3.667200e-10 4.547329e-08
hsa04218     Cellular senescence      7/58 156/8644 7.570813e-05 4.693904e-03
hsa04114          Oocyte meiosis      6/58 131/8644 2.292076e-04 8.823322e-03
hsa04814          Motor proteins      7/58 193/8644 2.846233e-04 8.823322e-03
hsa04657 IL-17 signaling pathway      5/58  94/8644 3.972218e-04 9.851100e-03
               qvalue
hsa04110 4.207630e-08
hsa04218 4.343256e-03
hsa04114 8.164194e-03
hsa04814 8.164194e-03
hsa04657 9.115195e-03
                                                             geneID Count
hsa04110 8318/991/9133/10403/890/983/4085/81620/7272/9212/1111/9319    12
hsa04218                          2305/4605/9133/890/983/51806/1111     7
hsa04114                               991/9133/983/4085/51806/6790     6
hsa04814                     9493/1062/81930/3832/3833/146909/10112     7
hsa04657                                   4312/6280/6279/6278/3627     5
> 
> packageVersion("clusterProfiler")
[1] ‘4.10.0’
> BiocManager::version()
[1] ‘3.18’
> R.Version()$version.string 
[1] "R version 4.3.0 (2023-04-21 ucrt)"
>
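
When reporting a problem like this, the version details shown above can be gathered in one step. A minimal sketch using base R only; `report_versions` is a hypothetical helper name, not part of clusterProfiler or BiocManager:

```r
# Hypothetical helper: return the installed version of each package,
# or NA if the package is not installed.
report_versions <- function(pkgs) {
  vapply(pkgs, function(p) {
    if (requireNamespace(p, quietly = TRUE)) {
      as.character(utils::packageVersion(p))
    } else {
      NA_character_
    }
  }, character(1))
}

print(R.version.string)
print(report_versions(c("clusterProfiler", "DOSE", "BiocManager")))
```

Pasting the output of `sessionInfo()` alongside this gives the full picture, including the platform and all attached packages.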

guidohooiveld, Dec 10 '23 16:12

Thanks for your answers. Setting a higher p-value cutoff still yields no enriched KEGG terms, whereas enrichGO works normally.


> packageVersion("clusterProfiler")
[1] ‘4.4.4’
> BiocManager::version()
[1] ‘3.15’
> R.Version()$version.string
[1] "R version 4.2.3 (2023-03-15 ucrt)"

zellerivo, Dec 11 '23 08:12

As I said, AFAIK there were some issues connecting to the KEGG API a couple of months ago. These have been addressed, so I strongly recommend updating your R/Bioconductor/clusterProfiler installations to the latest versions.

Based on the behavior you describe (GO analysis works, KEGG does not), the problem seems specific to KEGG. It can then only be in the step that retrieves the gene sets, because under the hood enrichGO and enrichKEGG converge on the same internal function.
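
One way to check that retrieval step in isolation is to query the KEGG REST URL from the console output earlier in this thread directly, without clusterProfiler. A minimal sketch using base R only; `fetch_kegg_lines` and `parse_kegg_link` are hypothetical helper names, and the two-column, tab-separated layout of the response is an assumption:

```r
# URL copied from the "Reading KEGG annotation online" message in this thread.
kegg_url <- "https://rest.kegg.jp/link/hsa/pathway"

# Hypothetical helper: read up to n lines from a URL, returning character(0)
# (with a message) if the connection fails for any reason.
fetch_kegg_lines <- function(u, n = -1L) {
  old <- options(timeout = 15)   # keep a failing connection from hanging long
  on.exit(options(old))
  tryCatch(
    suppressWarnings(readLines(u, n = n)),
    error = function(e) {
      message("Could not read ", u, ": ", conditionMessage(e))
      character(0)
    }
  )
}

# Hypothetical helper: split lines into pathway and gene columns.
# Assumes each non-empty line is "pathway<TAB>gene".
parse_kegg_link <- function(lines) {
  lines <- lines[nzchar(lines)]
  if (length(lines) == 0L) {
    return(data.frame(pathway = character(0), gene = character(0)))
  }
  parts <- strsplit(lines, "\t", fixed = TRUE)
  data.frame(
    pathway = vapply(parts, `[`, character(1), 1L),
    gene = vapply(parts, `[`, character(1), 2L),
    stringsAsFactors = FALSE
  )
}

pathway2gene <- parse_kegg_link(fetch_kegg_lines(kegg_url, n = 5L))
print(pathway2gene)
```

An empty data frame here would point at the connection itself (proxy, firewall, download method) rather than at the enrichment code.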

guidohooiveld, Dec 11 '23 09:12

I did some updating:

> packageVersion("clusterProfiler")
[1] ‘4.10.0’
> BiocManager::version()
[1] ‘3.18’
> R.Version()$version.string
[1] "R version 4.3.2 (2023-10-31 ucrt)"

However, the problem persists. From debugging, it seems that something goes wrong when the KEGG_DATA object is built: the path2gene that goes into build_anno appears to be empty, while the path2name parameter looks fine.
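
To see which of the two tables comes back empty, the list returned by `download_KEGG()` can be inspected directly. A minimal sketch: `summarise_kegg_tables` is a hypothetical helper, and which elements the returned list contains may differ between clusterProfiler versions, so check `names()` of the result as well:

```r
# Hypothetical helper: number of rows in each element of a list of tables
# (NULL or empty elements count as 0).
summarise_kegg_tables <- function(x) {
  vapply(x, function(tbl) as.integer(NROW(tbl)), integer(1))
}

if (requireNamespace("clusterProfiler", quietly = TRUE)) {
  kegg <- tryCatch(
    clusterProfiler::download_KEGG("hsa"),
    error = function(e) {
      message("download_KEGG failed: ", conditionMessage(e))
      NULL
    }
  )
  if (!is.null(kegg)) {
    print(names(kegg))
    print(summarise_kegg_tables(kegg))
  }
}
```

A row count of 0 for the pathway-to-gene table alongside a non-zero count for the pathway names would match the observation above.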

zellerivo, Dec 11 '23 15:12

This workaround fixed it for me: https://github.com/YuLab-SMU/clusterProfiler/issues/561#issuecomment-1467266614

zellerivo, Dec 11 '23 15:12

Nice to hear it is working for you now! However, the KEGG_DATA object would normally not be required, unless there are problems connecting to the online KEGG site/database.

Could you therefore, in a fresh session of R, run the code in the 3rd post, and paste the full code and output here?

guidohooiveld, Dec 12 '23 10:12

Still the same error. I don't understand how you get the message

Reading KEGG annotation online:

It should come from the function clusterProfiler:::kegg_rest. Where does the call happen? I don't see it in download_KEGG.
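
One way to answer that kind of question is to scan a package namespace for functions whose body mentions a given name. A minimal sketch using base R only; `find_callers` is a hypothetical helper, and it only detects direct references by name (not calls built from strings, e.g. via `do.call("name", ...)`):

```r
# Hypothetical helper: names of all functions in a package namespace
# whose body refers to `symbol`.
find_callers <- function(pkg, symbol) {
  ns <- asNamespace(pkg)
  is_caller <- function(nm) {
    f <- tryCatch(get(nm, envir = ns, inherits = FALSE), error = function(e) NULL)
    # all.names() walks the whole body, including nested function definitions
    is.function(f) && symbol %in% all.names(body(f))
  }
  Filter(is_caller, ls(ns, all.names = TRUE))
}

# Demonstration on a base package: which functions in stats refer to var()?
print(find_callers("stats", "var"))

# The same search for the function discussed here, if the package is installed.
if (requireNamespace("clusterProfiler", quietly = TRUE)) {
  print(find_callers("clusterProfiler", "kegg_rest"))
}
```

Repeating the search on each name it returns traces the call chain back towards `download_KEGG`.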

zellerivo, Dec 12 '23 17:12