TCGAbiolinks 'Result would be too long a vector' -- running GE KM analysis on BRCA example in manual

Open freuv opened this issue 6 years ago • 8 comments

Hi,

I am running the 'Preprocessing of Gene Expression data (IlluminaHiSeq_RNASeqV2)' and 'TCGAanalyze_SurvivalKM: Correlating gene expression and Survival Analysis' R-commands as-is from the Bioconductor page for TCGAbiolinks.

However, I run into the following error when running this command (as-is, from the manual) in RStudio.

for (i in 1:round(nrow(dataBRCAcomplete) / 100)) {
    message(paste(i, "of ", round(nrow(dataBRCAcomplete) / 100)))
    tokenStart <- tokenStop
    tokenStop <- 100 * i
    tabSurvKM <- TCGAanalyze_SurvivalKM(clinical_patient_Cancer,
                                        dataBRCAcomplete,
                                        Genelist = rownames(dataBRCAcomplete)[tokenStart:tokenStop],
                                        Survresult = F,
                                        ThreshTop = 0.67,
                                        ThreshDown = 0.33)

    tabSurvKMcomplete <- rbind(tabSurvKMcomplete, tabSurvKM)
}

ERROR:
Error in 1:lastelementTOP : result would be too long a vector
In addition: Warning message:
In max(which(mRNAselected_values_ordered > mRNAselected_values_ordered_top)) :
  no non-missing arguments to max; returning -Inf
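
For reference, the two messages above can be reproduced in base R without TCGAbiolinks. The sketch below is only a guess at the mechanism, based on the variable names in the error text: it assumes the function computes a quantile threshold for one gene, looks for samples strictly above it, and then builds an index range up to the last match. The names `values`, `top_cut`, `above` and `last_top` are made up for illustration and are not taken from the package source.

```r
# Toy expression values for one gene: every sample has the same value,
# so no sample can be strictly above the quantile threshold.
values <- c(0, 0, 0, 0, 0)

# Threshold analogous to ThreshTop = 0.67 (assumed, not the package code).
top_cut <- as.numeric(quantile(values, 0.67))

# No element is greater than the threshold, so which() returns integer(0).
above <- which(values > top_cut)

# max() of an empty vector warns and returns -Inf; the warning is silenced here.
last_top <- suppressWarnings(max(above))

# Building 1:(-Inf) is what raises the error; capture the message as text.
err <- tryCatch(1:last_top, error = function(e) conditionMessage(e))
print(err)
```

If this is what happens inside the function, any gene whose values give no sample above the top threshold (constant, all-missing or otherwise degenerate rows) would trigger the same pair of messages.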

I wish to run this for ~500 patients across the transcriptome.

Thank you for your guidance! F

Jan 24 '18 17:01 freuv

Hi, has anyone figured out why the following error:

[SNIP] 754.2753.2752.2751.2750.2749.Error in 1:lastelementTOP : result would be too long a vector
In addition: Warning message:
In max(which(mRNAselected_values_ordered > mRNAselected_values_ordered_top)) :
  no non-missing arguments to max; returning -Inf

would happen when running the BRCA example in the manual?

https://bioconductor.org/packages/devel/bioc/vignettes/TCGAbiolinks/inst/doc/casestudy.html

group1 <- TCGAquery_SampleTypes(colnames(dataFilt), typesample = c("NT"))
group2 <- TCGAquery_SampleTypes(colnames(dataFilt), typesample = c("TP"))

dataSurv <- TCGAanalyze_SurvivalKM(clinical_patient = dataClin[1:100,],
                                   dataGE = dataFilt,
                                   Genelist = rownames(dataDEGs),
                                   Survresult = FALSE,
                                   ThreshTop = 0.67,
                                   ThreshDown = 0.33,
                                   p.cut = 0.05, group1, group2)

All the steps work until this part.

> sessionInfo()
R version 3.5.3 (2019-03-11)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Mojave 10.14.3

other attached packages:
 [1] dnet_1.1.4                  supraHex_1.20.0             hexbin_1.27.2               igraph_1.2.4.1              SummarizedExperiment_1.12.0 DelayedArray_0.8.0         
 [7] BiocParallel_1.16.6         matrixStats_0.54.0          Biobase_2.42.0              GenomicRanges_1.34.0        GenomeInfoDb_1.18.2         IRanges_2.16.0             
[13] S4Vectors_0.20.1            BiocGenerics_0.28.0         DT_0.5                      dplyr_0.8.0.1               TCGAbiolinks_2.10.5

Thanks

May 07 '19 21:05 BenoitFiset

Hi,

Still looking for an answer.

Thanks

May 21 '19 15:05 BenoitFiset

I hit the same issue when running the TCGA-BRCA example.

Jul 01 '19 12:07 cailiangliang765

I'm not alone !!!! :)

Jul 02 '19 15:07 BenoitFiset

Same error for me, and it's >2 years later.

Aug 10 '20 18:08 mherberg

Hi! I got the same issue. I explored the function a bit more, and the problem is the p.cut argument. The Genelist argument takes all your rownames (i.e. all the genes), but p.cut tells the function to keep only the genes with p < 0.05. That is where the conflict lies: I don't know why, but the link between p.cut and Genelist seems to be broken.

I solved the issue by using p.cut = 1 every time.

If you keep rownames(dataDEGs) for your Genelist, you will end up with a big table of all genes that you can later subset using

tabSurvKMcomplete <- tabSurvKMcomplete[tabSurvKMcomplete$pvalue < 0.01,]

If not, just use your own gene list; the argument takes a character vector.
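
A small sketch of that post-hoc filtering step, written so that missing p-values do not turn into all-NA rows (which is what a plain logical subset does when the condition is NA). The data frame here is a made-up stand-in for the table returned with p.cut = 1; the `pvalue` column name is taken from the line above, and the gene names, values and `p_threshold` are purely illustrative.

```r
# Hypothetical stand-in for the table returned with p.cut = 1.
tabSurvKMcomplete <- data.frame(
  pvalue = c(0.003, 0.20, NA, 0.04),
  row.names = c("geneA", "geneB", "geneC", "geneD")
)

# Illustrative cut-off; pick whatever threshold the analysis calls for.
p_threshold <- 0.05

# Keep rows with a non-missing p-value below the cut-off.
keep <- !is.na(tabSurvKMcomplete$pvalue) &
  tabSurvKMcomplete$pvalue < p_threshold
tabSurvKMsig <- tabSurvKMcomplete[keep, , drop = FALSE]

# Sort the surviving genes by increasing p-value.
tabSurvKMsig <- tabSurvKMsig[order(tabSurvKMsig$pvalue), , drop = FALSE]
print(tabSurvKMsig)
```

The `drop = FALSE` is there so the result stays a data frame even when the table has a single column.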

Mar 02 '21 11:03 SebRW

I got the same error.

`> sessionInfo()
R version 4.2.1 (2022-06-23 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19043)

Matrix products: default

locale: [1] LC_COLLATE=Chinese (Simplified)_China.utf8 LC_CTYPE=Chinese (Simplified)_China.utf8
[3] LC_MONETARY=Chinese (Simplified)_China.utf8 LC_NUMERIC=C
[5] LC_TIME=Chinese (Simplified)_China.utf8

attached base packages: [1] stats4 stats graphics grDevices utils datasets methods base

other attached packages: [1] TCGAbiolinks_2.25.2 SummarizedExperiment_1.27.1 Biobase_2.57.1
[4] GenomicRanges_1.49.0 GenomeInfoDb_1.33.3 IRanges_2.31.0
[7] S4Vectors_0.35.1 BiocGenerics_0.43.0 MatrixGenerics_1.9.1
[10] matrixStats_0.62.0

loaded via a namespace (and not attached): [1] bitops_1.0-7 fs_1.5.2 usethis_2.1.6
[4] devtools_2.4.4 bit64_4.0.5 filelock_1.0.2
[7] progress_1.2.2 httr_1.4.3 rprojroot_2.0.3
[10] tools_4.2.1 profvis_0.3.7 utf8_1.2.2
[13] R6_2.5.1 DBI_1.1.3 colorspace_2.0-3
[16] urlchecker_1.0.1 withr_2.5.0 tidyselect_1.1.2
[19] prettyunits_1.1.1 processx_3.7.0 bit_4.0.4
[22] curl_4.3.2 compiler_4.2.1 rvest_1.0.2
[25] cli_3.3.0 xml2_1.3.3 DelayedArray_0.23.0
[28] scales_1.2.0 readr_2.1.2 callr_3.7.1
[31] rappdirs_0.3.3 stringr_1.4.0 digest_0.6.29
[34] XVector_0.37.0 pkgconfig_2.0.3 htmltools_0.5.2
[37] sessioninfo_1.2.2 dbplyr_2.2.1 fastmap_1.1.0
[40] htmlwidgets_1.5.4 rlang_1.0.4 rstudioapi_0.13
[43] RSQLite_2.2.14 shiny_1.7.1 generics_0.1.3
[46] jsonlite_1.8.0 dplyr_1.0.9 RCurl_1.98-1.7
[49] magrittr_2.0.3 GenomeInfoDbData_1.2.8 Matrix_1.4-1
[52] Rcpp_1.0.9 munsell_0.5.0 fansi_1.0.3
[55] lifecycle_1.0.1 stringi_1.7.6 zlibbioc_1.43.0
[58] plyr_1.8.7 BiocFileCache_2.5.0 pkgbuild_1.3.1
[61] grid_4.2.1 blob_1.2.3 promises_1.2.0.1
[64] crayon_1.5.1 miniUI_0.1.1.1 lattice_0.20-45
[67] splines_4.2.1 Biostrings_2.65.1 hms_1.1.1
[70] KEGGREST_1.37.3 knitr_1.39 ps_1.7.1
[73] pillar_1.7.0 TCGAbiolinksGUI.data_1.17.0 biomaRt_2.53.2
[76] pkgload_1.3.0 XML_3.99-0.10 glue_1.6.2
[79] downloader_0.4 data.table_1.14.2 remotes_2.4.2
[82] vctrs_0.4.1 png_0.1-7 tzdb_0.3.0
[85] httpuv_1.6.5 tidyr_1.2.0 gtable_0.3.0
[88] purrr_0.3.4 assertthat_0.2.1 cachem_1.0.6
[91] ggplot2_3.3.6 xfun_0.31 mime_0.12
[94] xtable_1.8-4 later_1.3.0 survival_3.3-1
[97] tibble_3.1.7 AnnotationDbi_1.59.1 memoise_2.0.1
[100] ellipsis_0.3.2

`

Jul 25 '22 12:07 Odd-i

In my case, it is because some genes have infinite values in all columns. It works after removing them.
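
A minimal sketch of that kind of clean-up in base R, assuming the expression data is a numeric matrix with genes in rows and samples in columns. The matrix `expr` and its gene and sample names are invented for the example; in the vignette workflow the equivalent object would be whatever is passed as dataGE.

```r
# Hypothetical stand-in for the expression matrix (genes x samples).
expr <- matrix(
  c(1,   2,   3,
    Inf, Inf, Inf,
    4,   NA,  6,
    7,   8,   9),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC", "geneD"),
                  c("s1", "s2", "s3"))
)

# Keep genes that have at least one finite value; this drops rows that are
# Inf/-Inf/NA/NaN in every column. For a stricter filter that drops any gene
# with a non-finite value, use rowSums(!is.finite(expr)) == 0 instead.
keep <- rowSums(is.finite(expr)) > 0
expr_clean <- expr[keep, , drop = FALSE]

print(rownames(expr_clean))
```

The filtered matrix and a Genelist restricted to rownames(expr_clean) would then go into the survival call in place of the originals.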

Nov 28 '23 22:11 Bellanzixuan
