TCGAbiolinks
GDC API problem
hey!
thanks for developing and maintaining TCGAbiolinks, it made life so much easier... till yesterday... then i ran into a GDC accessibility issue, which i couldn't resolve based on other suggestions
i am aware that the issue is likely not on the TCGAbiolinks side per se, but would be curious to get your opinion on my problem
i am running the following code as listed in the manual
query <- GDCquery(project = "TCGA-COAD",
                  data.category = "Clinical",
                  file.type = "xml",
                  barcode = c("TCGA-RU-A8FL","TCGA-AA-3972"))
it was running all fine till a few days ago
but since yesterday i am getting the following response
--------------------------------------
o GDCquery: Searching in GDC database
--------------------------------------
Genome of reference: hg38
--------------------------------------------
oo Accessing GDC. This might take a while...
--------------------------------------------
[1] "https://api.gdc.cancer.gov/files/?pretty=true&expand=cases,cases.project,center,analysis&size=531&filters=%7B%22op%22:%22and%22,%22content%22:[%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22cases.project.project_id%22,%22value%22:[%22TCGA-COAD%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_category%22,%22value%22:[%22Clinical%22]%7D%7D]%7D&format=JSON"
ooo Project: TCGA-COAD
Error: Error in getURL(url, fromJSON, timeout(600), simplifyDataFrame = TRUE): 'getURL()' failed:
URL: https://api.gdc.cancer.gov/files/?pretty=true&expand=cases,cases.project,center,analysis&size=531&filters=%7B%22op%22:%22and%22,%22content%22:[%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22cases.project.project_id%22,%22value%22:[%22TCGA-COAD%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_category%22,%22value%22:[%22Clinical%22]%7D%7D]%7D&format=JSON
error: HTTP error 500.
We will retry to access GDC!
Error in if (json$data$pagination$count == 0) { :
argument is of length zero
no response for the following query either
clin.gbm <- GDCquery_clinic("TCGA-GBM", "clinical")
parts of my sessionInfo (worked fine till now)
R version 3.5.2 (2018-12-20)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)
locale:
[1] C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] TCGAbiolinks_2.11.2
i also tried another installation, with the same lack of success
R version 3.4.0 (2017-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 7 (wheezy)
other attached packages:
[1] TCGAbiolinks_2.6.12
response:
> query <- GDCquery(project = "TCGA-COAD",
+ data.category = "Clinical",
+ file.type = "xml",
+ barcode = c("TCGA-RU-A8FL","TCGA-AA-3972"))
Error in value[[3L]](cond) :
GDC server down, try to use this package later
i don't see a problem with accessibility, as the following works
curl "https://api.gdc.cancer.gov/analysis/top_cases_counts_by_genes?gene_ids=ENSG00000155657&pretty=true"
so is it the query? is there a general recommendation on how to narrow down the source of the problem?
THANKS for any suggestions!!!!!
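One way to narrow this down, sketched here under assumptions rather than taken from the package: replay the same `/files` request outside TCGAbiolinks and vary only `size` and `expand`. If small requests come back with HTTP 200 while the full-size one returns 500, the failure is on the server side and depends on the response size, not on the R installation. The helper names `gdc_files_url` and `http_status` are hypothetical, the filter layout copies the URL printed in the log above, and `httr` is assumed to be installed.

```r
# Build a GDC /files query URL by hand (hypothetical helper, not part of
# TCGAbiolinks) so the same request can be replayed with different sizes.
gdc_files_url <- function(project, data_category, size = 10, expand = NULL) {
  in_clause <- function(field, value) {
    sprintf('{"op":"in","content":{"field":"%s","value":["%s"]}}', field, value)
  }
  filters <- sprintf(
    '{"op":"and","content":[%s,%s]}',
    in_clause("cases.project.project_id", project),
    in_clause("files.data_category", data_category)
  )
  paste0(
    "https://api.gdc.cancer.gov/files?format=JSON",
    "&size=", as.integer(size),
    if (!is.null(expand)) paste0("&expand=", expand),
    "&filters=", utils::URLencode(filters, reserved = TRUE)
  )
}

# Return the HTTP status code for a URL, or NA if the request itself fails
# (assumes the httr package is available).
http_status <- function(url) {
  tryCatch(
    httr::status_code(httr::GET(url, httr::timeout(60))),
    error = function(e) NA_integer_
  )
}

# Only runs in an interactive session, because it needs network access.
if (interactive()) {
  cat("status endpoint -> HTTP",
      http_status("https://api.gdc.cancer.gov/status"), "\n")
  for (n in c(1, 10, 100, 531)) {
    url <- gdc_files_url("TCGA-COAD", "Clinical", size = n,
                         expand = "cases,cases.project,center,analysis")
    cat("size =", n, "-> HTTP", http_status(url), "\n")
  }
}
```

Dropping the `expand` argument in the loop is a second useful comparison, since it shrinks each record without changing the filter.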
Hi! I'm also having the same problem when trying to run the TCGAbiolinks example below
# You can define a list of samples to query and download providing relative TCGA barcodes.
listSamples <- c("TCGA-E9-A1NG-11A-52R-A14M-07","TCGA-BH-A1FC-11A-32R-A13Q-07",
                 "TCGA-A7-A13G-11A-51R-A13Q-07","TCGA-BH-A0DK-11A-13R-A089-07",
                 "TCGA-E9-A1RH-11A-34R-A169-07","TCGA-BH-A0AU-01A-11R-A12P-07",
                 "TCGA-C8-A1HJ-01A-11R-A13Q-07","TCGA-A7-A13D-01A-13R-A12P-07",
                 "TCGA-A2-A0CV-01A-31R-A115-07","TCGA-AQ-A0Y5-01A-11R-A14M-07")
# Query platform Illumina HiSeq with a list of barcodes
query <- GDCquery(project = "TCGA-BRCA",
                  data.category = "Gene expression",
                  data.type = "Gene expression quantification",
                  experimental.strategy = "RNA-Seq",
                  platform = "Illumina HiSeq",
                  file.type = "results",
                  barcode = listSamples,
                  legacy = TRUE)
I'm getting the same error as you.
Sometimes this problem is solved by installing TCGAbiolinks directly from GitHub, but that's not solving it for me either.
devtools::install_github("BioinformaticsFMRP/TCGAbiolinks")
I don't know if it's because something is installed wrong or if the GDC server is actually down. My session info is
R version 3.5.2 (2018-12-20)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Mojave 10.14.2
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] BiocInstaller_1.32.1 SummarizedExperiment_1.12.0 DelayedArray_0.8.0
[4] BiocParallel_1.16.5 matrixStats_0.54.0 Biobase_2.42.0
[7] GenomicRanges_1.34.0 GenomeInfoDb_1.18.1 IRanges_2.16.0
[10] S4Vectors_0.20.1 BiocGenerics_0.28.0 DT_0.5
[13] dplyr_0.7.8 TCGAbiolinks_2.11.2
loaded via a namespace (and not attached):
[1] backports_1.1.3 circlize_0.4.5 aroma.light_3.12.0
[4] plyr_1.8.4 selectr_0.4-1 ConsensusClusterPlus_1.46.0
[7] lazyeval_0.2.1 splines_3.5.2 usethis_1.4.0
[10] ggplot2_3.1.0 sva_3.30.1 digest_0.6.18
[13] foreach_1.4.4 htmltools_0.3.6 magrittr_1.5
[16] memoise_1.1.0 cluster_2.0.7-1 doParallel_1.0.14
[19] remotes_2.0.2 limma_3.38.3 ComplexHeatmap_1.20.0
[22] Biostrings_2.50.2 readr_1.3.1 annotate_1.60.0
[25] R.utils_2.7.0 prettyunits_1.0.2 colorspace_1.4-0
[28] blob_1.1.1 rvest_0.3.2 ggrepel_0.8.0
[31] xfun_0.4 callr_3.1.1 crayon_1.3.4
[34] RCurl_1.95-4.11 jsonlite_1.6 genefilter_1.64.0
[37] bindr_0.1.1 survival_2.43-3 zoo_1.8-4
[40] iterators_1.0.10 glue_1.3.0 survminer_0.4.3
[43] gtable_0.2.0 zlibbioc_1.28.0 XVector_0.22.0
[46] GetoptLong_0.1.7 pkgbuild_1.0.2 shape_1.4.4
[49] scales_1.0.0 DESeq_1.34.1 DBI_1.0.0
[52] edgeR_3.24.3 ggthemes_4.0.1 Rcpp_1.0.0
[55] xtable_1.8-3 progress_1.2.0 cmprsk_2.2-7
[58] bit_1.1-14 matlab_1.0.2 km.ci_0.5-2
[61] htmlwidgets_1.3 httr_1.4.0 RColorBrewer_1.1-2
[64] pkgconfig_2.0.2 XML_3.98-1.16 R.methodsS3_1.7.1
[67] locfit_1.5-9.1 tidyselect_0.2.5 rlang_0.3.1
[70] AnnotationDbi_1.44.0 munsell_0.5.0 tools_3.5.2
[73] downloader_0.4 cli_1.0.1 generics_0.0.2
[76] RSQLite_2.1.1 devtools_2.0.1 broom_0.5.1
[79] stringr_1.3.1 yaml_2.2.0 fs_1.2.6
[82] processx_3.2.1 knitr_1.21 bit64_0.9-7
[85] survMisc_0.5.5 purrr_0.2.5 bindrcpp_0.2.2
[88] EDASeq_2.16.3 nlme_3.1-137 R.oo_1.22.0
[91] xml2_1.2.0 biomaRt_2.39.2 compiler_3.5.2
[94] rstudioapi_0.9.0 curl_3.3 tibble_2.0.1
[97] geneplotter_1.60.0 stringi_1.2.4 ps_1.3.0
[100] desc_1.2.0 GenomicFeatures_1.34.1 lattice_0.20-38
[103] Matrix_1.2-15 KMsurv_0.1-5 pillar_1.3.1
[106] BiocManager_1.30.4 GlobalOptions_0.1.0 data.table_1.12.0
[109] bitops_1.0-6 rtracklayer_1.42.1 R6_2.3.0
[112] latticeExtra_0.6-28 hwriter_1.3.2 ShortRead_1.40.0
[115] gridExtra_2.3 sessioninfo_1.1.1 codetools_0.2-16
[118] pkgload_1.0.2 assertthat_0.2.0 rprojroot_1.3-2
[121] rjson_0.2.20 withr_2.1.2 GenomicAlignments_1.18.1
[124] Rsamtools_1.34.0 GenomeInfoDbData_1.2.0 mgcv_1.8-26
[127] hms_0.4.2 grid_3.5.2 tidyr_0.8.2
[130] ggpubr_0.2
GDC might have a problem with the legacy server: any query with more than 150 samples returns an "internal error". I'll send an email to GDC.
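If the failure really depends on the number of samples, one possible workaround is to query in batches that stay below that limit. The following is only a sketch under that assumption: `chunk_barcodes` and `query_in_batches` are hypothetical helpers, the default batch size of 100 is an arbitrary value below the 150 mentioned above, and whether smaller batches actually avoid the server error has not been verified.

```r
# Split a vector of barcodes into batches of at most `chunk_size` elements
# (hypothetical helper, base R only).
chunk_barcodes <- function(barcodes, chunk_size = 100) {
  stopifnot(chunk_size >= 1)
  unname(split(barcodes, ceiling(seq_along(barcodes) / chunk_size)))
}

# Run one GDCquery() per batch and return the query objects as a list.
# All other GDCquery() arguments (project, data.category, ...) go in `...`.
# Assumes TCGAbiolinks is installed; nothing is downloaded here.
query_in_batches <- function(barcodes, chunk_size = 100, ...) {
  lapply(chunk_barcodes(barcodes, chunk_size), function(batch) {
    TCGAbiolinks::GDCquery(barcode = batch, ...)
  })
}
```

Each element of the returned list is an ordinary query object, so it can be passed to `GDCdownload()` on its own.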
I'm having a similar issue upon executing the following code. Any help is appreciated!
query_harmonized <- GDCquery(project = "TCGA-SKCM",
                             data.category = "Transcriptome Profiling",
                             data.type = "miRNA Expression Quantification",
                             legacy = FALSE)
Output:
--------------------------------------
o GDCquery: Searching in GDC database
--------------------------------------
Genome of reference: hg38
--------------------------------------------
oo Accessing GDC. This might take a while...
--------------------------------------------
ooo Project: TCGA-SKCM
Error: Error in getURL(url, fromJSON, timeout(600), simplifyDataFrame = TRUE): 'getURL()' failed:
URL: https://api.gdc.cancer.gov/files/?pretty=true&expand=cases.samples.portions.analytes.aliquots,cases.project,center,analysis,cases.samples&size=2320&filters=%7B%22op%22:%22and%22,%22content%22:[%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22cases.project.project_id%22,%22value%22:[%22TCGA-SKCM%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_category%22,%22value%22:[%22Transcriptome%20Profiling%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_type%22,%22value%22:[%22miRNA%20Expression%20Quantification%22]%7D%7D]%7D&format=JSON
error: HTTP error 500.
We will retry to access GDC!
Error in if (json$data$pagination$count == 0) { :
argument is of length zero
My session info
R version 3.5.2 (2018-12-20)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] scales_1.0.0 openxlsx_4.1.0 survminer_0.4.3 ggpubr_0.2 magrittr_1.5 ggplot2_3.1.0 survival_2.43-3
[8] dplyr_0.7.8 tibble_2.0.1 TCGAbiolinks_2.11.2
loaded via a namespace (and not attached):
[1] backports_1.1.3 circlize_0.4.5 aroma.light_3.10.0 plyr_1.8.4 selectr_0.4-1
[6] ConsensusClusterPlus_1.45.0 lazyeval_0.2.1 splines_3.5.2 BiocParallel_1.15.7 usethis_1.4.0
[11] GenomeInfoDb_1.17.2 sva_3.29.1 digest_0.6.18 foreach_1.4.4 memoise_1.1.0
[16] cluster_2.0.7-1 doParallel_1.0.14 limma_3.37.7 remotes_2.0.2 ComplexHeatmap_1.19.2
[21] Biostrings_2.48.0 readr_1.3.1 annotate_1.58.0 matrixStats_0.54.0 R.utils_2.7.0
[26] prettyunits_1.0.2 colorspace_1.4-0 blob_1.1.1 rvest_0.3.2 ggrepel_0.8.0
[31] xfun_0.4 callr_3.1.1 crayon_1.3.4 RCurl_1.95-4.11 jsonlite_1.6
[36] genefilter_1.63.2 bindr_0.1.1 zoo_1.8-4 iterators_1.0.10 glue_1.3.0
[41] gtable_0.2.0 zlibbioc_1.26.0 XVector_0.20.0 GetoptLong_0.1.7 DelayedArray_0.6.6
[46] pkgbuild_1.0.2 shape_1.4.4 BiocGenerics_0.27.1 DESeq_1.32.0 DBI_1.0.0
[51] edgeR_3.23.5 ggthemes_4.0.1 Rcpp_1.0.0 xtable_1.8-3 progress_1.2.0
[56] cmprsk_2.2-7 bit_1.1-14 matlab_1.0.2 km.ci_0.5-2 stats4_3.5.2
[61] httr_1.4.0 RColorBrewer_1.1-2 pkgconfig_2.0.2 XML_3.98-1.16 R.methodsS3_1.7.1
[66] locfit_1.5-9.1 tidyselect_0.2.5 rlang_0.3.1 AnnotationDbi_1.42.1 munsell_0.5.0
[71] tools_3.5.2 downloader_0.4 cli_1.0.1 generics_0.0.2 RSQLite_2.1.1
[76] devtools_2.0.1 broom_0.5.1 stringr_1.3.1 yaml_2.2.0 processx_3.2.1
[81] knitr_1.21 bit64_0.9-7 fs_1.2.6 zip_1.0.0 survMisc_0.5.5
[86] purrr_0.2.5 bindrcpp_0.2.2 EDASeq_2.15.4 nlme_3.1-137 R.oo_1.22.0
[91] xml2_1.2.0 biomaRt_2.37.6 compiler_3.5.2 rstudioapi_0.9.0 curl_3.2
[96] testthat_2.0.1 geneplotter_1.58.0 stringi_1.2.4 highr_0.7 ps_1.3.0
[101] GenomicFeatures_1.33.2 desc_1.2.0 lattice_0.20-38 Matrix_1.2-15 KMsurv_0.1-5
[106] pillar_1.3.1 GlobalOptions_0.1.0 data.table_1.12.0 bitops_1.0-6 rtracklayer_1.40.6
[111] GenomicRanges_1.33.14 R6_2.3.0 latticeExtra_0.6-28 hwriter_1.3.2 ShortRead_1.38.0
[116] gridExtra_2.3 IRanges_2.15.18 sessioninfo_1.1.1 codetools_0.2-16 assertthat_0.2.0
[121] pkgload_1.0.2 SummarizedExperiment_1.11.6 rprojroot_1.3-2 rjson_0.2.20 withr_2.1.2
[126] GenomicAlignments_1.16.0 Rsamtools_1.32.3 S4Vectors_0.19.20 GenomeInfoDbData_1.1.0 mgcv_1.8-26
[131] parallel_3.5.2 hms_0.4.2 grid_3.5.2 tidyr_0.8.2 Biobase_2.41.2
When I try to access the URL in a browser this is what I get:
{
"message": "internal server error"
}
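Since the server itself answers with the 500, retrying from the client only helps if the failure is transient. For that case, a generic retry wrapper with a growing pause between attempts might look like the sketch below. `with_retry` is a hypothetical helper, not something TCGAbiolinks provides, and the default number of tries and wait time are arbitrary.

```r
# Call fn() up to `tries` times, pausing wait * attempt seconds between
# attempts, and re-raise the last error if every attempt fails.
with_retry <- function(fn, tries = 3, wait = 5) {
  last_error <- NULL
  for (attempt in seq_len(tries)) {
    result <- tryCatch(fn(), error = function(e) e)
    if (!inherits(result, "error")) return(result)
    last_error <- result
    message(sprintf("Attempt %d/%d failed: %s",
                    attempt, tries, conditionMessage(result)))
    if (attempt < tries) Sys.sleep(wait * attempt)
  }
  stop(last_error)
}
```

It would be used by wrapping the call in a function, for example `with_retry(function() GDCquery(project = "TCGA-SKCM", data.category = "Transcriptome Profiling", data.type = "miRNA Expression Quantification"))`. If all attempts fail, the original error is raised again and the script stops there.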
same issue
queryClin <- GDCquery(project = "TCGA-LUAD",
                      data.category = "Clinical",
                      file.type = "xml")
GDCquery: Searching in GDC database
Genome of reference: hg38
oo Accessing GDC. This might take a while...
--------------------------------------------
ooo Project: TCGA-LUAD
Error: Error in getURL(url, fromJSON, timeout(600), simplifyDataFrame = TRUE): 'getURL()' failed:
URL: https://api.gdc.cancer.gov/files/?pretty=true&expand=cases,cases.project,center,analysis&size=623&filters=%7B%22op%22:%22and%22,%22content%22:[%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22cases.project.project_id%22,%22value%22:[%22TCGA-LUAD%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_category%22,%22value%22:[%22Clinical%22]%7D%7D]%7D&format=JSON
error: HTTP error 500.
We will retry to access GDC!
Error in if (json$data$pagination$count == 0) { :
@tiagochst
The issue still persists. Can you please let me know if it is still coming from GDC? Thanks
cnv.query <- GDCquery(project = "TCGA-COAD", data.category = "Copy number variation")
Error in value[[3L]](cond) :
GDC server down, try to use this package later
@rjg2186 It seems the package cannot access the GDC server from your machine. There could be several reasons for this that are not related to our package. Could you please post your sessionInfo() so we can have more information?
Also, for Copy Number data you need to specify the data.type.
library(TCGAbiolinks)
# 1) Get GISTIC
query <- GDCquery(project = "TCGA-COAD",
                  data.category = "Copy Number Variation",
                  data.type = "Gene Level Copy Number Scores",
                  access = "open")
GDCdownload(query)
gistic <- GDCprepare(query)
# 2) Get Copy Number Segment
query <- GDCquery(project = "TCGA-COAD",
                  data.category = "Copy Number Variation",
                  data.type = "Copy Number Segment")
GDCdownload(query)
cns <- GDCprepare(query)
# 3) Get Masked Copy Number Segment
query <- GDCquery(project = "TCGA-COAD",
                  data.category = "Copy Number Variation",
                  data.type = "Masked Copy Number Segment")
GDCdownload(query)
cns.masked <- GDCprepare(query)
I have the same issue on Windows. I removed the Bioconductor package and tried to install TCGAbiolinks from GitHub with devtools, but I get this error: "Error in i.p(...) : (converted from warning) installation of package ‘C:/~/AppData/Local/Temp/RtmpgDZJjm/file4bd05fa2264a/TCGAbiolinks_2.11.6.tar.gz’ had non-zero exit status".
I could not solve this problem.
`> sessionInfo()
R version 3.5.3 (2019-03-11)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 17134)
Matrix products: default
locale:
[1] LC_COLLATE=Turkish_Turkey.1254 LC_CTYPE=Turkish_Turkey.1254 LC_MONETARY=Turkish_Turkey.1254
[4] LC_NUMERIC=C LC_TIME=Turkish_Turkey.1254
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] TCGAbiolinks_2.10.5
loaded via a namespace (and not attached):
[1] backports_1.1.4 circlize_0.4.6 AnnotationHub_2.14.5
[4] aroma.light_3.12.0 plyr_1.8.4 selectr_0.4-1
[7] ConsensusClusterPlus_1.46.0 lazyeval_0.2.2 splines_3.5.3
[10] BiocParallel_1.16.6 usethis_1.5.0 GenomeInfoDb_1.18.2
[13] ggplot2_3.1.1 sva_3.30.1 digest_0.6.18
[16] foreach_1.4.4 htmltools_0.3.6 magrittr_1.5
[19] memoise_1.1.0 cluster_2.0.7-1 doParallel_1.0.14
[22] remotes_2.0.4 limma_3.38.3 ComplexHeatmap_1.20.0
[25] Biostrings_2.50.2 readr_1.3.1 annotate_1.60.1
[28] matrixStats_0.54.0 sesameData_1.0.0 R.utils_2.8.0
[31] prettyunits_1.0.2 colorspace_1.4-1 blob_1.1.1
[34] rvest_0.3.3 ggrepel_0.8.0 xfun_0.6
[37] dplyr_0.8.0.1 callr_3.2.0 crayon_1.3.4
[40] RCurl_1.95-4.12 jsonlite_1.6 genefilter_1.64.0
[43] zoo_1.8-5 survival_2.43-3 iterators_1.0.10
[46] glue_1.3.1 survminer_0.4.3 gtable_0.3.0
[49] sesame_1.0.0 zlibbioc_1.28.0 XVector_0.22.0
[52] GetoptLong_0.1.7 DelayedArray_0.8.0 pkgbuild_1.0.3
[55] wheatmap_0.1.0 shape_1.4.4 BiocGenerics_0.28.0
[58] scales_1.0.0 DESeq_1.34.1 DBI_1.0.0
[61] edgeR_3.24.3 ggthemes_4.1.1 Rcpp_1.0.1
[64] xtable_1.8-4 progress_1.2.0 cmprsk_2.2-7
[67] bit_1.1-14 matlab_1.0.2 preprocessCore_1.44.0
[70] km.ci_0.5-2 stats4_3.5.3 httr_1.4.0
[73] RColorBrewer_1.1-2 pkgconfig_2.0.2 XML_3.98-1.19
[76] R.methodsS3_1.7.1 locfit_1.5-9.1 DNAcopy_1.56.0
[79] tidyselect_0.2.5 rlang_0.3.4 later_0.8.0
[82] AnnotationDbi_1.44.0 munsell_0.5.0 tools_3.5.3
[85] cli_1.1.0 downloader_0.4 generics_0.0.2
[88] RSQLite_2.1.1 ExperimentHub_1.8.0 devtools_2.0.2
[91] broom_0.5.2 stringr_1.4.0 yaml_2.2.0
[94] fs_1.2.7 processx_3.3.0 knitr_1.22
[97] bit64_0.9-8 survMisc_0.5.5 purrr_0.3.2
[100] randomForest_4.6-14 EDASeq_2.16.3 nlme_3.1-137
[103] mime_0.6 R.oo_1.22.0 xml2_1.2.0
[106] biomaRt_2.38.0 compiler_3.5.3 rstudioapi_0.10
[109] curl_3.3 interactiveDisplayBase_1.20.0 tibble_2.1.1
[112] geneplotter_1.60.0 stringi_1.4.3 ps_1.3.0
[115] desc_1.2.0 GenomicFeatures_1.34.8 lattice_0.20-38
[118] Matrix_1.2-15 KMsurv_0.1-5 pillar_1.3.1
[121] BiocManager_1.30.4 GlobalOptions_0.1.0 data.table_1.12.2
[124] bitops_1.0-6 httpuv_1.5.1 rtracklayer_1.42.2
[127] GenomicRanges_1.34.0 R6_2.4.0 latticeExtra_0.6-28
[130] hwriter_1.3.2 promises_1.0.1 ShortRead_1.40.0
[133] gridExtra_2.3 IRanges_2.16.0 sessioninfo_1.1.1
[136] codetools_0.2-16 pkgload_1.0.2 assertthat_0.2.1
[139] SummarizedExperiment_1.12.0 rprojroot_1.3-2 rjson_0.2.20
[142] withr_2.1.2 GenomicAlignments_1.18.1 Rsamtools_1.34.1
[145] S4Vectors_0.20.1 GenomeInfoDbData_1.2.0 mgcv_1.8-27
[148] parallel_3.5.3 hms_0.4.2 grid_3.5.3
[151] tidyr_0.8.3 ggpubr_0.2 Biobase_2.42.0
[154] shiny_1.3.2 `
Any news regarding this issue?