TCGAbiolinks
'Result would be too long a vector' -- running GE KM analysis on BRCA example in manual
Hi,
I am running the 'Preprocessing of Gene Expression data (IlluminaHiSeq_RNASeqV2)' and 'TCGAanalyze_SurvivalKM: Correlating gene expression and Survival Analysis' R-commands as-is from the Bioconductor page for TCGAbiolinks.
However, I run into the following error when running this command (as-is, from the manual) in RStudio.
for (i in 1:round(nrow(dataBRCAcomplete)/100)) {
  message(paste(i, "of ", round(nrow(dataBRCAcomplete)/100)))
  tokenStart <- tokenStop
  tokenStop <- 100*i
  tabSurvKM <- TCGAanalyze_SurvivalKM(clinical_patient_Cancer,
                                      dataBRCAcomplete,
                                      Genelist = rownames(dataBRCAcomplete)[tokenStart:tokenStop],
                                      Survresult = F,
                                      ThreshTop = 0.67,
                                      ThreshDown = 0.33)
  tabSurvKMcomplete <- rbind(tabSurvKMcomplete, tabSurvKM)
}
ERROR:
Error in 1:lastelementTOP : result would be too long a vector
In addition: Warning message:
In max(which(mRNAselected_values_ordered > mRNAselected_values_ordered_top)) :
no non-missing arguments to max; returning -Inf
I wish to run this for ~500 patients across the transcriptome.
Thank you for your guidance! F
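A note on reading the two messages: they look like one failure rather than two. Below is a minimal base-R sketch, assuming that the result of the max(which(...)) call named in the warning is what ends up in lastelementTOP (an inference from the messages, not checked against the package source). The values and the cutoff are made up, and last_element_top is a stand-in name.

```r
# Base R only. Names mirror the messages above, not the package source.
values <- c(5.1, 5.1, 5.1)   # hypothetical expression values for one gene
top_cutoff <- 5.1            # hypothetical cutoff that no value exceeds

# No value is above the cutoff, so which() returns integer(0).
hits <- which(values > top_cutoff)

# max() of an empty vector returns -Inf (normally with the warning quoted above).
last_element_top <- suppressWarnings(max(hits))

# Building a sequence from 1 to -Inf is what raises the error.
res <- tryCatch(1:last_element_top, error = function(e) conditionMessage(e))
print(res)
```

If that reading is right, any gene for which no sample lands above the top cutoff would trigger the error, whatever the size of the data set.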
Hi, has anyone figured out why the following error happens when trying the BRCA example in the manual?
[SNIP] 754.2753.2752.2751.2750.2749.Error in 1:lastelementTOP : result would be too long a vector
In addition: Warning message:
In max(which(mRNAselected_values_ordered > mRNAselected_values_ordered_top)) :
no non-missing arguments to max; returning -Inf
https://bioconductor.org/packages/devel/bioc/vignettes/TCGAbiolinks/inst/doc/casestudy.html
group1 <- TCGAquery_SampleTypes(colnames(dataFilt), typesample = c("NT"))
group2 <- TCGAquery_SampleTypes(colnames(dataFilt), typesample = c("TP"))
dataSurv <- TCGAanalyze_SurvivalKM(clinical_patient = dataClin[1:100,],
dataGE = dataFilt,
Genelist = rownames(dataDEGs),
Survresult = FALSE,
ThreshTop = 0.67,
ThreshDown = 0.33,
p.cut = 0.05, group1, group2)
All the steps work until this part.
> sessionInfo()
R version 3.5.3 (2019-03-11)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Mojave 10.14.3
other attached packages:
[1] dnet_1.1.4 supraHex_1.20.0 hexbin_1.27.2 igraph_1.2.4.1 SummarizedExperiment_1.12.0 DelayedArray_0.8.0
[7] BiocParallel_1.16.6 matrixStats_0.54.0 Biobase_2.42.0 GenomicRanges_1.34.0 GenomeInfoDb_1.18.2 IRanges_2.16.0
[13] S4Vectors_0.20.1 BiocGenerics_0.28.0 DT_0.5 dplyr_0.8.0.1 TCGAbiolinks_2.10.5
Thanks
Hi,
Still looking for answer.
Thanks
I ran into the same issue as you when running the TCGA-BRCA sample.
I'm not alone !!!! :)
Same error for me, and it's more than 2 years later.
Hi! I got the same issue. I explored the function a bit more, and the problem is the p.cut argument. The Genelist argument takes all your rownames (i.e. all the genes), but p.cut tells the function to keep only the genes with p < 0.05. That is where the conflict is: I don't know why, but the link between p.cut and Genelist seems to be broken.
I solved the issue by using p.cut = 1 every time.
If you keep rownames(dataDEGs) as your Genelist, you will end up with a big table of all genes that you can later subset using
tabSurvKMcomplete <- tabSurvKMcomplete[tabSurvKMcomplete$pvalue < 0.01,]
If not, just use your own gene list; the argument takes a character vector.
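One thing to watch with the subsetting step: if any p-values in the table are NA, plain logical indexing keeps them as all-NA rows. Here is a small sketch on a mock table, assuming only that the result has a pvalue column as in the line above. The gene names, the values and the tabSurvKMsig name are hypothetical.

```r
# Mock table standing in for the output obtained with p.cut = 1.
# Only the `pvalue` column name comes from this thread.
tabSurvKMcomplete <- data.frame(
  pvalue = c(0.002, 0.30, NA, 0.009),
  row.names = c("GENE_A", "GENE_B", "GENE_C", "GENE_D")
)

# which() drops NA comparisons, so genes without a p-value
# are excluded instead of turning into NA rows.
keep <- which(tabSurvKMcomplete$pvalue < 0.01)
tabSurvKMsig <- tabSurvKMcomplete[keep, , drop = FALSE]
print(tabSurvKMsig)
```

Using drop = FALSE keeps the result a data frame even when only one column or one row is left.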
I got the same error.
`> sessionInfo()
R version 4.2.1 (2022-06-23 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19043)
Matrix products: default
locale:
[1] LC_COLLATE=Chinese (Simplified)_China.utf8 LC_CTYPE=Chinese (Simplified)_China.utf8
[3] LC_MONETARY=Chinese (Simplified)_China.utf8 LC_NUMERIC=C
[5] LC_TIME=Chinese (Simplified)_China.utf8
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] TCGAbiolinks_2.25.2 SummarizedExperiment_1.27.1 Biobase_2.57.1
[4] GenomicRanges_1.49.0 GenomeInfoDb_1.33.3 IRanges_2.31.0
[7] S4Vectors_0.35.1 BiocGenerics_0.43.0 MatrixGenerics_1.9.1
[10] matrixStats_0.62.0
loaded via a namespace (and not attached):
[1] bitops_1.0-7 fs_1.5.2 usethis_2.1.6
[4] devtools_2.4.4 bit64_4.0.5 filelock_1.0.2
[7] progress_1.2.2 httr_1.4.3 rprojroot_2.0.3
[10] tools_4.2.1 profvis_0.3.7 utf8_1.2.2
[13] R6_2.5.1 DBI_1.1.3 colorspace_2.0-3
[16] urlchecker_1.0.1 withr_2.5.0 tidyselect_1.1.2
[19] prettyunits_1.1.1 processx_3.7.0 bit_4.0.4
[22] curl_4.3.2 compiler_4.2.1 rvest_1.0.2
[25] cli_3.3.0 xml2_1.3.3 DelayedArray_0.23.0
[28] scales_1.2.0 readr_2.1.2 callr_3.7.1
[31] rappdirs_0.3.3 stringr_1.4.0 digest_0.6.29
[34] XVector_0.37.0 pkgconfig_2.0.3 htmltools_0.5.2
[37] sessioninfo_1.2.2 dbplyr_2.2.1 fastmap_1.1.0
[40] htmlwidgets_1.5.4 rlang_1.0.4 rstudioapi_0.13
[43] RSQLite_2.2.14 shiny_1.7.1 generics_0.1.3
[46] jsonlite_1.8.0 dplyr_1.0.9 RCurl_1.98-1.7
[49] magrittr_2.0.3 GenomeInfoDbData_1.2.8 Matrix_1.4-1
[52] Rcpp_1.0.9 munsell_0.5.0 fansi_1.0.3
[55] lifecycle_1.0.1 stringi_1.7.6 zlibbioc_1.43.0
[58] plyr_1.8.7 BiocFileCache_2.5.0 pkgbuild_1.3.1
[61] grid_4.2.1 blob_1.2.3 promises_1.2.0.1
[64] crayon_1.5.1 miniUI_0.1.1.1 lattice_0.20-45
[67] splines_4.2.1 Biostrings_2.65.1 hms_1.1.1
[70] KEGGREST_1.37.3 knitr_1.39 ps_1.7.1
[73] pillar_1.7.0 TCGAbiolinksGUI.data_1.17.0 biomaRt_2.53.2
[76] pkgload_1.3.0 XML_3.99-0.10 glue_1.6.2
[79] downloader_0.4 data.table_1.14.2 remotes_2.4.2
[82] vctrs_0.4.1 png_0.1-7 tzdb_0.3.0
[85] httpuv_1.6.5 tidyr_1.2.0 gtable_0.3.0
[88] purrr_0.3.4 assertthat_0.2.1 cachem_1.0.6
[91] ggplot2_3.3.6 xfun_0.31 mime_0.12
[94] xtable_1.8-4 later_1.3.0 survival_3.3-1
[97] tibble_3.1.7 AnnotationDbi_1.59.1 memoise_2.0.1
[100] ellipsis_0.3.2
`
In my case, it is because some genes have infinite values in all columns. It works after removing them.
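A minimal base-R sketch of that kind of clean-up, in case it helps. The matrix is made up (genes in rows, samples in columns), and expr and expr_clean are placeholder names. In practice the filter would be applied to whatever matrix is passed as dataGE.

```r
# Hypothetical genes x samples matrix standing in for the expression data.
expr <- matrix(
  c(1.2, 3.4, 5.6,
    Inf, Inf, Inf,
    2.0, NA,  4.0,
    7.7, 8.8, 9.9),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("GENE_A", "GENE_B", "GENE_C", "GENE_D"),
                  c("S1", "S2", "S3"))
)

# Keep genes with at least one finite value; this drops only the rows
# that are Inf/-Inf/NA/NaN in every column.
has_finite <- rowSums(is.finite(expr)) > 0
expr_clean <- expr[has_finite, , drop = FALSE]
print(rownames(expr_clean))
```

To be stricter and drop any gene with even one non-finite value, the condition could be changed to rowSums(!is.finite(expr)) == 0.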