TCGAbiolinks

Splice Junction files retrieval

Open elenichri opened this issue 3 years ago • 1 comments

Hello, I wanted to download Splice Junction files for the Primary Tumors of the TCGA-GBM project. I am running R 4.0.0. I used the command

query <- GDCquery(project = "TCGA-GBM", data.category = "Transcriptome Profiling", data.type = "Splice Junction Quantification", legacy = FALSE, sample.type = "Primary Tumor")

and the result was

GDCquery: Searching in GDC database
Genome of reference: hg38
Accessing GDC. This might take a while...
Project: TCGA-GBM
Sorry! There is no result for your query. Please check in GDC the data available or if there is no error in your query.

So I guess there are no Splice Junction files for the Harmonized cohort. Therefore I tried the Legacy cohort. Since there is no Splice Junction value among the accepted arguments there, I used Exon Junction Quantification instead and got all the respective .txt files. I am pasting the top of one of the resulting files:

junction                   raw_counts
chr1:12227:+,chr1:12595:+  0
chr1:12227:+,chr1:12613:+  0
chr1:12227:+,chr1:12646:+  0
chr1:12697:+,chr1:13221:+  0
chr1:12721:+,chr1:13221:+  0
chr1:12721:+,chr1:13403:+  0
chr1:14829:-,chr1:14970:-  691
chr1:14829:-,chr1:15796:-  2
chr1:15038:-,chr1:15796:-  339
chr1:15942:-,chr1:16607:-  0
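For anyone working with these legacy files, here is a minimal base R sketch of how the `junction` column can be split into separate fields. `parse_junctions()` is a hypothetical helper written for illustration, not a TCGAbiolinks function, and the column names it produces are my own choice. It assumes every junction ID follows the `chrom:pos:strand,chrom:pos:strand` pattern shown above.

```r
# Hypothetical helper (not part of TCGAbiolinks): split junction IDs such as
# "chr1:12227:+,chr1:12595:+" into chromosome, two positions, and strand.
parse_junctions <- function(junction, raw_counts) {
  # Split each ID into its two sides, then split each side on ":".
  sides <- strsplit(junction, ",", fixed = TRUE)
  left <- strsplit(vapply(sides, `[`, character(1), 1), ":", fixed = TRUE)
  right <- strsplit(vapply(sides, `[`, character(1), 2), ":", fixed = TRUE)
  data.frame(
    chrom = vapply(left, `[`, character(1), 1),
    left_pos = as.integer(vapply(left, `[`, character(1), 2)),
    right_pos = as.integer(vapply(right, `[`, character(1), 2)),
    strand = vapply(left, `[`, character(1), 3),
    raw_counts = raw_counts,
    stringsAsFactors = FALSE
  )
}

# Two rows copied from the file excerpt above.
example <- parse_junctions(
  c("chr1:12227:+,chr1:12595:+", "chr1:14829:-,chr1:14970:-"),
  c(0, 691)
)
print(example)
```

If the downloaded .txt files are tab-delimited with the `junction` and `raw_counts` header shown above (an assumption worth checking on your own files), the two columns returned by `read.delim()` can be passed to this helper directly.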

This is not a typical .SJ.out.tab format, as described on the STAR Fusion manual:

The columns of the SJ.out.tab file have the following meaning:
column 1: chromosome
column 2: first base of the intron (1-based)
column 3: last base of the intron (1-based)
column 4: strand (0: undefined, 1: +, 2: -)
column 5: intron motif: 0: non-canonical; 1: GT/AG, 2: CT/AC, 3: GC/AG, 4: CT/GC, 5: AT/AC, 6: GT/AT
column 6: 0: unannotated, 1: annotated (only if splice junctions database is used)
column 7: number of uniquely mapping reads crossing the junction
column 8: number of multi-mapping reads crossing the junction
column 9: maximum spliced alignment overhang
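To make the difference between the two layouts concrete, here is a rough base R sketch that rearranges legacy junction rows into an SJ.out.tab-like table. `to_sj_like()` is a hypothetical helper, not part of TCGAbiolinks or STAR, and it rests on several assumptions: that the two positions in a legacy junction ID are the flanking exon bases (so the intron runs from the lower position + 1 to the higher position - 1), that `raw_counts` can stand in for column 7 even though the legacy file does not separate unique from multi-mapping reads, and that columns 5, 6, 8 and 9 cannot be recovered and are left as `NA`. Legacy TCGA data is also aligned to hg19, so the coordinates are not directly comparable with hg38 output.

```r
# Hypothetical helper (not part of TCGAbiolinks or STAR): rearrange legacy
# junction IDs into a table shaped like STAR's SJ.out.tab.
to_sj_like <- function(junction, raw_counts) {
  # Each ID splits into: chrom, pos, strand, chrom, pos, strand.
  parts <- do.call(rbind, strsplit(junction, "[:,]"))
  pos_a <- as.integer(parts[, 2])
  pos_b <- as.integer(parts[, 5])
  strand <- parts[, 3]
  data.frame(
    chrom = parts[, 1],
    # ASSUMPTION: positions are exon boundaries flanking the intron.
    intron_start = pmin(pos_a, pos_b) + 1L,
    intron_end = pmax(pos_a, pos_b) - 1L,
    # STAR strand codes: 0 undefined, 1 "+", 2 "-".
    strand = ifelse(strand == "+", 1L, ifelse(strand == "-", 2L, 0L)),
    motif = NA_integer_,         # not derivable from the legacy file
    annotated = NA_integer_,     # not derivable from the legacy file
    reads = raw_counts,          # ASSUMPTION: stands in for column 7
    multi_reads = NA_integer_,   # not reported in the legacy file
    max_overhang = NA_integer_,  # not reported in the legacy file
    stringsAsFactors = FALSE
  )
}

sj <- to_sj_like(
  c("chr1:12227:+,chr1:12613:+", "chr1:14829:-,chr1:14970:-"),
  c(0, 691)
)
print(sj)
```

This only reshapes what the legacy file already contains; it does not reproduce a real STAR run, so downstream tools that rely on motif or overhang values would still need actual SJ.out.tab files.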

I am wondering if the Exon Junction Quantification files are really the ones I need. If not, how can I get Splice Junction Files for the TCGA-GBM project?

Thank you very much!

elenichri, Jul 08 '21 08:07

Hi,

I believe these are the files you want. However, Splice Junction Quantification is controlled-access data, and it is not available for TCGA data.


tiagochst, Jul 16 '21 18:07