TCGAbiolinks
CPTAC-3 GDCprepare error. missing sample.submitter_id
I ran:
library(TCGAbiolinks)
query_cnv <- GDCquery(
project = "CPTAC-3",
data.category = "Copy Number Variation",
data.type = "Gene Level Copy Number"
)
GDCdownload(query_cnv)
cnvdf <- GDCprepare(query = query_cnv)
and got the error:
Error in ans[npos] <- rep(no, length.out = len)[npos] :
replacement has length zero
In addition: Warning message:
In rep(no, length.out = len) : 'x' is NULL so the result will be NULL
Debugging GDCprepare, I found that the error comes from this line:
cases <- ifelse(grepl("TCGA|TARGET|CGCI-HTMCP-CC", query$results[[1]]$project %>%
unlist()), query$results[[1]]$cases, query$results[[1]]$sample.submitter_id)
because there is no sample.submitter_id on the query object:
query_cnv$results[[1]] %>% names
[1] "id" "data_format"
[3] "cases" "access"
[5] "file_name" "submitter_id"
[7] "data_category" "type"
[9] "platform" "file_size"
[11] "created_datetime" "md5sum"
[13] "updated_datetime" "file_id"
[15] "data_type" "state"
[17] "experimental_strategy" "version"
[19] "data_release" "project"
[21] "analysis_id" "analysis_state"
[23] "analysis_submitter_id" "analysis_workflow_link"
[25] "analysis_workflow_type" "analysis_workflow_version"
[27] "sample_type"
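The failure can be reproduced without GDC access. The sketch below is only an illustration in base R: the `results` data frame and its case IDs are made-up stand-ins for `query$results[[1]]`, and `pick_cases()` is a hypothetical helper, not the package's actual fix. It shows that `ifelse()` fails when its `no` argument is NULL (which is what `$sample.submitter_id` returns when the column is absent), and that falling back to `cases` avoids the error.

```r
# Toy stand-in for query$results[[1]] (hypothetical data): there is
# deliberately no sample.submitter_id column, as in the CPTAC-3 query.
results <- data.frame(
  project = c("CPTAC-3", "CPTAC-3"),
  cases = c("CASE-A", "CASE-B"),
  stringsAsFactors = FALSE
)

is_tcga_like <- grepl("TCGA|TARGET|CGCI-HTMCP-CC", results$project)

# `$` on a missing column returns NULL, so the `no` branch of ifelse() is NULL
# and the assignment inside ifelse() fails.
err <- tryCatch(
  suppressWarnings(
    ifelse(is_tcga_like, results$cases, results$sample.submitter_id)
  ),
  error = function(e) conditionMessage(e)
)
print(err)

# Hypothetical guard: use `cases` when sample.submitter_id is absent.
pick_cases <- function(results) {
  fallback <- results$sample.submitter_id
  if (is.null(fallback)) fallback <- results$cases
  ifelse(
    grepl("TCGA|TARGET|CGCI-HTMCP-CC", results$project),
    results$cases,
    fallback
  )
}
print(pick_cases(results))
```

Whether `cases` is the right identifier for non-TCGA projects is an assumption here; the point is only that the missing column needs to be checked before calling `ifelse()`.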
Same error here with CPTAC-2 when I run:
snp_Query_Data <- GDCquery(
project = "CPTAC-2",
data.category = "Simple Nucleotide Variation",
data.type = "Masked Somatic Mutation",
access = "open"
)
GDCdownload(query=snp_Query_Data, method = "api", directory = DataDir, files.per.chunk = 50)
snp_data_CPTAC2 <- GDCprepare(query = snp_Query_Data, directory = DataDir, save = TRUE, save.filename = "CPTAC2_SNP_data.rda")
I got the same error. Does anyone know how to fix it?
Hi,
Thanks a lot for this convenient package.
I'm experiencing the same error with project BEATAML1.0-COHORT, even after reinstalling TCGAbiolinks with the latest updates:
devtools::install_github(repo = "BioinformaticsFMRP/TCGAbiolinks", ref = "master")
library(TCGAbiolinks)
query <- GDCquery(
project = "BEATAML1.0-COHORT",
data.category = "Simple Nucleotide Variation",
access = "open",
data.type = "Masked Somatic Mutation",
workflow.type = "Aliquot Ensemble Somatic Variant Merging and Masking"
)
GDCdownload(query)
maf <- GDCprepare(query)
Error in ans[npos] <- rep(no, length.out = len)[npos] :
replacement has length zero
In addition: Warning message:
In rep(no, length.out = len) : 'x' is NULL so the result will be NULL
Thank you for the bug report.
Could you try this version: devtools::install_github(repo = "BioinformaticsFMRP/TCGAbiolinks", ref = "devel")
Best regards, Tiago Chedraoui Silva
Thank you for the prompt response. It throws the same error.
Could you please check which version is loaded? It is working on my side: https://rpubs.com/tiagochst/BEATAML_COHORT_MAF
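One way to check this, sketched in base R only: compare the version installed on disk with the version loaded in the current session. The two can differ if the package was reinstalled without restarting R. The variable names are illustrative.

```r
pkg <- "TCGAbiolinks"

# Version installed in the library on disk (NA if the package is not installed).
installed <- tryCatch(
  as.character(utils::packageVersion(pkg)),
  error = function(e) NA_character_
)

# Version of the namespace loaded in this session (NA if not loaded).
loaded <- if (isNamespaceLoaded(pkg)) {
  as.character(getNamespaceVersion(pkg))
} else {
  NA_character_
}

cat("installed on disk:", installed, "\n")
cat("loaded in session:", loaded, "\n")
```

If the two values differ, restarting the R session and calling library(TCGAbiolinks) again should pick up the newly installed version.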
After double-checking, it works, thank you. However, another error is raised when I try to use the MAF files to associate mutation status with expression data:
project = "BEATAML1.0-COHORT"
query_exp_project <- GDCquery(
project = project,
data.category = "Transcriptome Profiling",
data.type = "Gene Expression Quantification",
workflow.type = "STAR - Counts"
)
GDCdownload(
query = query_exp_project,
directory = path_main
)
exp_project <- GDCprepare(
query_exp_project,
save = T,
directory = path_main,
save.filename = paste(destdir, paste0(project, "_gex.RData"), sep = "/"),
add.gistic2.mut = "SRSF2" # add info on SRSF2 mutational status
)
Starting to add information to samples
=> Add clinical information to samples
Error in dplyr::bind_cols():
! Can't recycle ..1 (size 0) to match ..2 (size 2).
Run rlang::last_trace() to see where the error occurred.
I understand this is unrelated to the initial bug. However, it seems specific to the BeatAML project, because it does not happen with TCGA-LAML data.