TCGAbiolinks

GDCquery with no access to Gene Level Copy Number Scores

Open luisgls opened this issue 2 years ago • 7 comments

Hi!

3 weeks ago I downloaded some copy number data using the following code:

STAD <- GDCquery(project = "TCGA-STAD", data.category = "Copy number variation", data.type = "Gene Level Copy Number Scores")

Now I'm trying to repeat the analysis, and it suddenly complains about the data category. Any thoughts?

--------------------------------------
o GDCquery: Searching in GDC database
--------------------------------------
Genome of reference: hg38
Error in checkDataCategoriesInput(project, data.category, legacy) : 
  Please set a valid data.category argument from the column data_category above. We could not validade the data.category for project TCGA-STAD

luisgls avatar Apr 19 '22 15:04 luisgls

Hi,

Yes, GDC removed the Gene Level Copy Number Scores data from the website. Only the data types shown in the screenshot are available now.

[Screenshot of the currently available data types, not preserved]

You can find more info in the GDC documentation:
https://docs.gdc.cancer.gov/Data/Bioinformatics_Pipelines/CNV_Pipeline/
https://docs.gdc.cancer.gov/Data/Bioinformatics_Pipelines/DNA_Seq_Variant_Calling_Pipeline/#whole-genome-sequencing-variant-calling

tiagochst avatar Apr 19 '22 15:04 tiagochst

Thanks. On the same issue, when I run

query4<-GDCquery(project = "TCGA-UCEC", data.category = "Copy Number Variation")
getResults(query4)

I still see "Gene Level Copy Number" among the data types, but when I run GDCquery with that data type, it does not work either.

query5 <-GDCquery(project = "TCGA-STAD",data.category = "Copy Number Variation",data.type ="Gene Level Copy Number")

Has this data type also been removed? According to the GDC documentation, gene-level ASCAT values are available, but again I can't access them through GDCquery.

Error in checkDataTypeInput(legacy = legacy, data.type = data.type) : 
  Please set a data.type argument from the column harmonized.data.type above

luisgls avatar Apr 19 '22 16:04 luisgls
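
Editor's note: for readers hitting the same error, one way to see which data.type values a project currently exposes is to query with only the category and tabulate the results. The following is a minimal sketch, assuming the data frame returned by getResults() carries a data_type column (worth confirming with colnames() on your installed version). list_data_types is a hypothetical helper, not part of TCGAbiolinks, and the toy table only stands in for real query results.

```r
# Hypothetical helper: tabulate the data types present in a results table.
# Assumes a `data_type` column, as we expect from getResults().
list_data_types <- function(results) {
  stopifnot(is.data.frame(results), "data_type" %in% colnames(results))
  sort(table(results$data_type), decreasing = TRUE)
}

# Toy stand-in for getResults(query) so the sketch runs offline.
toy_results <- data.frame(
  data_type = c("Gene Level Copy Number",
                "Masked Copy Number Segment",
                "Gene Level Copy Number"),
  stringsAsFactors = FALSE
)
print(list_data_types(toy_results))

# With network access, the same helper can be applied to a real query:
# query <- TCGAbiolinks::GDCquery(project = "TCGA-UCEC",
#                                 data.category = "Copy Number Variation")
# list_data_types(TCGAbiolinks::getResults(query))
```

If a data type appears in this table but GDCquery still rejects it, that points to an outdated package rather than missing data.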

Can you check that you have the latest version from GitHub?

You can install with the following commands:

BiocManager::install("BioinformaticsFMRP/TCGAbiolinksGUI.data")
BiocManager::install("BioinformaticsFMRP/TCGAbiolinks")

Restart R and run:

STAD <- GDCquery(
    project = "TCGA-STAD", 
    data.category = "Copy Number Variation", 
    data.type = "Gene Level Copy Number"
)

GDCdownload(STAD, files.per.chunk = 50)
gene.level.copy.number <- GDCprepare(STAD)

tiagochst avatar Apr 19 '22 16:04 tiagochst

Hi, I am also facing the same issue:

query <- GDCquery(
    project = "TCGA-GBM",
    data.category = "Copy Number Variation",
    data.type = "Gene Level Copy Number",
    access = "open",
    legacy = FALSE
)
GDCdownload(query)
cnv_data <- GDCprepare(query)

> cnv_data
class: RangedSummarizedExperiment 
dim: 60623 542 
metadata(1): data_release
assays(3): copy_number min_copy_number max_copy_number
rownames(60623): ENSG00000223972.5 ENSG00000227232.5 ...
  ENSG00000182484.15_PAR_Y ENSG00000227159.8_PAR_Y
rowData names(2): gene_id gene_name
colnames(542): TCGA-12-0615-01A-01D-0310-01,TCGA-12-0615-10A-01D-0310-01
  TCGA-14-1456-01B-01D-0784-01,TCGA-14-1456-10A-01D-0784-01 ...
  TCGA-06-0133-01A-02D-0214-01,TCGA-06-0133-10A-01D-0214-01
  TCGA-06-0140-01A-01D-0214-01,TCGA-06-0140-10A-01D-0214-01
colData names(108): barcode patient ...
  paper_Telomere.length.estimate.in.blood.normal..Kb.
  paper_Telomere.length.estimate.in.tumor..Kb.

How do I extract the copy_number information from this? I also cannot see the cytoband information, which was available before and is useful.

komalsrathi avatar Apr 28 '22 18:04 komalsrathi

You should be able to access the information with the code below:

library(SummarizedExperiment)
info <- rowRanges(cnv_data)
copy_number <- SummarizedExperiment::assay(cnv_data,"copy_number")
min_copy_number  <- SummarizedExperiment::assay(cnv_data,"min_copy_number")
max_copy_number  <- SummarizedExperiment::assay(cnv_data,"max_copy_number")

tiagochst avatar Apr 28 '22 21:04 tiagochst
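
Editor's note: the extraction described above can be tried offline on a toy object. This is a rough sketch, not the maintainer's code: the gene names, coordinates, sample names, and copy-number values are all made up, and the toy object only mimics the structure of cnv_data (one assay plus gene coordinates in rowRanges).

```r
library(SummarizedExperiment)
library(GenomicRanges)

# Made-up gene coordinates standing in for rowRanges(cnv_data).
genes <- GRanges(
  seqnames = c("chr1", "chr1"),
  ranges = IRanges(start = c(1000, 5000), end = c(2000, 9000)),
  gene_name = c("GENE_A", "GENE_B")
)
names(genes) <- c("gene1", "gene2")

# Made-up copy-number matrix: genes in rows, samples in columns.
cn <- matrix(
  c(2, 3, 1, 2),
  nrow = 2,
  dimnames = list(c("gene1", "gene2"), c("sampleA", "sampleB"))
)

toy <- SummarizedExperiment(assays = list(copy_number = cn), rowRanges = genes)

# Pull the assay matrix and the coordinates, then combine them by row.
cn_mat <- assay(toy, "copy_number")
coords <- as.data.frame(rowRanges(toy))
gene_table <- cbind(coords[, c("seqnames", "start", "end", "gene_name")], cn_mat)
print(gene_table)
```

The same three steps (assay, rowRanges, cbind) should carry over to the real cnv_data object, since the assay rows and rowRanges entries are kept in the same order.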

You can also set summarizedExperiment = FALSE in GDCprepare.

tiagochst avatar Apr 28 '22 21:04 tiagochst

Thank you @tiagochst. Any idea how to access the cytoband info, or should I get that from a separate resource like biomart?

komalsrathi avatar Apr 28 '22 21:04 komalsrathi