TCGAbiolinks
GDCquery with no access to Gene Level Copy Number Scores
Hi!
3 weeks ago I downloaded some copy number data using the following code:
STAD<-GDCquery(project = "TCGA-STAD", data.category = "Copy number variation", data.type = "Gene Level Copy Number Scores")
Now I'm trying to repeat the analysis, and it suddenly complains about the data category. Any thoughts?
--------------------------------------
o GDCquery: Searching in GDC database
--------------------------------------
Genome of reference: hg38
Error in checkDataCategoriesInput(project, data.category, legacy) :
Please set a valid data.category argument from the column data_category above. We could not validade the data.category for project TCGA-STAD
Hi,
Yes, GDC removed the Gene Level Copy Number Scores data from the portal. Only the following data types are available now:
![Screen Shot 2022-04-19 at 11 52 44 AM](https://user-images.githubusercontent.com/145529/164045011-2eb56286-5e6e-4798-b690-a89637a8545a.png)
You can find more info in GDC documentation: https://docs.gdc.cancer.gov/Data/Bioinformatics_Pipelines/CNV_Pipeline/ https://docs.gdc.cancer.gov/Data/Bioinformatics_Pipelines/DNA_Seq_Variant_Calling_Pipeline/#whole-genome-sequencing-variant-calling
Thanks. On the same issue, when I run
query4<-GDCquery(project = "TCGA-UCEC", data.category = "Copy Number Variation")
getResults(query4)
I still see "Gene Level Copy Number" among the data types, but when I run GDCquery with that data type, it does not work either:
query5 <-GDCquery(project = "TCGA-STAD",data.category = "Copy Number Variation",data.type ="Gene Level Copy Number")
Has this data type also been removed? The GDC documentation lists the gene-level ASCAT values, but again, I can't access them through GDCquery.
Error in checkDataTypeInput(legacy = legacy, data.type = data.type) :
Please set a data.type argument from the column harmonized.data.type above
Can you check that you have the latest version from GitHub? You can install it with the following commands:
BiocManager::install("BioinformaticsFMRP/TCGAbiolinksGUI.data")
BiocManager::install("BioinformaticsFMRP/TCGAbiolinks")
Restart R and run:
library(TCGAbiolinks)
STAD <- GDCquery(
  project = "TCGA-STAD",
  data.category = "Copy Number Variation",
  data.type = "Gene Level Copy Number"
)
GDCdownload(STAD, files.per.chunk = 50)
gene.level.copy.number <- GDCprepare(STAD)
Hi, I am also facing the same issue:
query <- GDCquery(project = "TCGA-GBM",
                  data.category = "Copy Number Variation",
                  data.type = "Gene Level Copy Number",
                  access = "open",
                  legacy = FALSE)
GDCdownload(query)
cnv_data <- GDCprepare(query)
> cnv_data
class: RangedSummarizedExperiment
dim: 60623 542
metadata(1): data_release
assays(3): copy_number min_copy_number max_copy_number
rownames(60623): ENSG00000223972.5 ENSG00000227232.5 ...
ENSG00000182484.15_PAR_Y ENSG00000227159.8_PAR_Y
rowData names(2): gene_id gene_name
colnames(542): TCGA-12-0615-01A-01D-0310-01,TCGA-12-0615-10A-01D-0310-01
TCGA-14-1456-01B-01D-0784-01,TCGA-14-1456-10A-01D-0784-01 ...
TCGA-06-0133-01A-02D-0214-01,TCGA-06-0133-10A-01D-0214-01
TCGA-06-0140-01A-01D-0214-01,TCGA-06-0140-10A-01D-0214-01
colData names(108): barcode patient ...
paper_Telomere.length.estimate.in.blood.normal..Kb.
paper_Telomere.length.estimate.in.tumor..Kb.
How do I extract the copy_number information from this? I also cannot see the cytoband information, which was available before and is useful.
You should be able to access the information with the code below:
library(SummarizedExperiment)
# Gene coordinates plus the gene_id / gene_name annotation, as a GRanges object
info <- rowRanges(cnv_data)
# Each assay is a genes x samples matrix
copy_number <- SummarizedExperiment::assay(cnv_data, "copy_number")
min_copy_number <- SummarizedExperiment::assay(cnv_data, "min_copy_number")
max_copy_number <- SummarizedExperiment::assay(cnv_data, "max_copy_number")
You can also set summarizedExperiment = FALSE in GDCprepare() if you do not want a SummarizedExperiment object.
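Once extracted with assay(), copy_number is a genes x samples matrix, and the column names shown in the printed object pair two barcodes with a comma. A minimal base-R sketch, using a hypothetical toy matrix in place of the real assay and assuming an ordinary (non-delayed) matrix, that reshapes it into one row per gene and sample and keeps the first barcode of each pair (assumed here to be the tumor aliquot, as in the output above where the 01A barcode comes first):

```r
# Hypothetical toy stand-in for assay(cnv_data, "copy_number"):
# rows are Ensembl gene IDs, columns are "barcode1,barcode2" pairs.
copy_number <- matrix(
  c(2, 3, 1, 2),
  nrow = 2,
  dimnames = list(
    c("ENSG00000223972.5", "ENSG00000227232.5"),
    c("TCGA-12-0615-01A-01D-0310-01,TCGA-12-0615-10A-01D-0310-01",
      "TCGA-06-0133-01A-02D-0214-01,TCGA-06-0133-10A-01D-0214-01")
  )
)

# as.vector() reads the matrix column by column, so gene IDs repeat
# within each sample and each sample name repeats once per gene.
cn_long <- data.frame(
  gene_id     = rep(rownames(copy_number), times = ncol(copy_number)),
  sample      = rep(colnames(copy_number), each = nrow(copy_number)),
  copy_number = as.vector(copy_number),
  stringsAsFactors = FALSE
)

# Keep the first barcode of each comma-separated pair
# (assumption: the first one is the tumor aliquot).
cn_long$tumor_barcode <- vapply(strsplit(cn_long$sample, ","), `[`, character(1), 1)
print(cn_long)
```

With the real object, gene names can be attached the same way from rowData(cnv_data)$gene_name, since its rows are in the same order as the assay rows.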
Thank you @tiagochst. Any idea how to access the cytoband information, or should I get it from a separate resource like biomaRt?
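If cytoband annotation is pulled from a separate resource such as biomaRt, note that the row names printed above carry Ensembl version suffixes, and some carry a _PAR_Y tag (in GENCODE, Y-chromosome copies of pseudoautosomal genes reuse the X-copy ID with that tag appended). A minimal base-R sketch, assuming the lookup resource is keyed on unversioned Ensembl gene IDs, that prepares the IDs and flags the PAR_Y rows so they do not silently duplicate matches:

```r
# Row names as printed by GDCprepare (copied from the output above).
gene_ids <- c("ENSG00000223972.5", "ENSG00000227232.5",
              "ENSG00000182484.15_PAR_Y", "ENSG00000227159.8_PAR_Y")

# Drop the ".<version>" suffix and everything after it (including "_PAR_Y"),
# leaving stable Ensembl gene IDs.
stable_ids <- sub("\\..*$", "", gene_ids)

# PAR_Y rows collapse onto the same stable ID as their X-chromosome copy,
# so flag them before joining to an annotation table.
is_par_y <- grepl("_PAR_Y$", gene_ids)

print(data.frame(gene_ids, stable_ids, is_par_y))
```

The stable_ids vector could then be passed as values to a lookup such as biomaRt::getBM() filtered on ensembl_gene_id; which attribute holds the band depends on the mart used, so check its attribute list first.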