datahub
Improving PanCancer Atlas data
- [x] survival/followup data - https://pubmed.ncbi.nlm.nih.gov/29625055/
- [ ] clinical data #294
- [x] treatment timeline data
- [ ] methylation
- [x] rppa
- [ ] structural variants
- [ ] miRNA #1220
- [ ] immunogenomics
- [x] microbiome: #1095
- [x] cptac #1235
- [ ] ATACseq: #1158
- [x] rnaseq for normal samples #1221
- [x] msi #1043
- [x] tmb #426
- [x] re-curate arm-level CNA #1308
- [ ] ancestry data #1426
@ritikakundra please prioritize this one when you have a chance.
-
miRNA: Original publication: https://gdc.cancer.gov/node/977
"The expression data for mRNA and miRNA were batch-corrected to adjust for platform differences between the GAII and HiSeq Illumina sequencers."
"The data matrix contained abundance profiles for 10,170 tumor samples."
File download link: https://api.gdc.cancer.gov/data/1c6174d9-8ffb-466e-b5ee-07b204c15cf8 (sample ID vs. genes)
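For anyone scripting the download, here is a minimal sketch that builds the GDC data URL from a file UUID and streams the file to disk. The helper names (`gdc_data_url`, `download_gdc_file`) and the output filename are made up for illustration; only the endpoint and the UUID come from the link above. It assumes the file is open-access (no token needed).

```python
import shutil
import urllib.request

# Endpoint taken from the download link above; the UUID is appended to it.
GDC_DATA_ENDPOINT = "https://api.gdc.cancer.gov/data"

# UUID of the miRNA matrix linked above.
MIRNA_UUID = "1c6174d9-8ffb-466e-b5ee-07b204c15cf8"


def gdc_data_url(file_uuid: str) -> str:
    """Return the download URL for a GDC file UUID (hypothetical helper)."""
    return f"{GDC_DATA_ENDPOINT}/{file_uuid.strip()}"


def download_gdc_file(file_uuid: str, dest_path: str) -> str:
    """Stream the file to dest_path without loading it all into memory.

    Assumes an open-access file; controlled-access files would need a token.
    """
    with urllib.request.urlopen(gdc_data_url(file_uuid)) as response, \
            open(dest_path, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest_path


# Example call (performs a network request; the filename is arbitrary):
# download_gdc_file(MIRNA_UUID, "pancan_mirna_matrix.download")
```

The same helper should work for the RPPA and methylation links in this thread, since they use the same endpoint with a different UUID.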
-
RPPA: https://gdc.cancer.gov/node/977
"Protein expression data were available for 7,858 samples from 32 of the 33 tumor types (LAML data were never generated) across 216 proteins and phosphoproteins. The data were generated using the reverse phase protein array (RPPA) platform."
File download link: https://api.gdc.cancer.gov/data/fcbb373e-28d4-4818-92f3-601ede3da5e1 (sample ID, tumor type vs. genes)
-
methylation: https://gdc.cancer.gov/node/977
Data from HumanMethylation27 (HM27) and HumanMethylation450 (HM450) were merged to generate a dataset of 22,601 probes shared between the two platforms. To minimize systematic platform-specific effects, the HM27 data were normalized against the HM450 data using a probe-by-probe proportional rescaling method.
Note: we're using the merged version.
File download link: https://api.gdc.cancer.gov/data/d82e2c44-89eb-43d9-b6d3-712732bf6a53 (sample ID vs probe)
-
immunogenomics: https://gdc.cancer.gov/about-data/publications/panimmune
-
fusion (to be curated into SV format):
Source 1: paper for the analysis: https://www.sciencedirect.com/science/article/pii/S2211124718303954?via%3Dihub
File download link: https://api.gdc.cancer.gov/data/06a124df-fa5b-4f2d-8bfa-0e73b685f222
File format: sample ID (long barcode) vs. fusion vs. breakpoint (chromosome and coordinates)
Source 2 (keep in a separate file): https://www.tumorfusions.org/
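Since the fusion file keys rows by the long barcode, curation will probably need to shorten those to sample-level IDs. A minimal sketch, assuming the barcodes follow the standard TCGA layout (project, tissue source site, participant, sample type plus vial, and so on) and that the sample ID is the prefix through the two-digit sample type. The function name `to_sample_id` is hypothetical, and the actual column holding the barcode in the file is not checked here.

```python
import re

# Matches the sample-level prefix of a TCGA barcode:
# TCGA-<tissue source site>-<participant>-<two-digit sample type>
_SAMPLE_PREFIX = re.compile(r"^(TCGA-[A-Z0-9]{2}-[A-Z0-9]{4}-\d{2})")


def to_sample_id(barcode: str) -> str:
    """Shorten a long TCGA barcode to its sample-level ID (hypothetical helper)."""
    match = _SAMPLE_PREFIX.match(barcode.strip().upper())
    if match is None:
        # Fail loudly instead of guessing, so bad rows surface during curation.
        raise ValueError(f"not a recognizable TCGA barcode: {barcode!r}")
    return match.group(1)


# Example: an aliquot-level barcode collapses to the sample-level ID.
print(to_sample_id("TCGA-02-0001-01C-01D-0182-01"))
```

Raising on unrecognized barcodes is a deliberate choice here: silently passing them through would hide rows that do not map to any sample.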
adding microbiome data https://github.com/cBioPortal/datahub/issues/1095 to the list above.
@jjgao @ritikakundra An old issue that might still be relevant: https://github.com/cBioPortal/datahub/issues/294
discussion related to methylation: https://github.com/cBioPortal/datahub/issues/1210
added msi and tmb to the list
added the cptac data to the list.
added #1308 - re-curate arm-level CNA.
added https://github.com/cBioPortal/datahub/issues/1426 - ancestry data.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
This can be followed up in #1796