Improving PanCancer Atlas data

Open jjgao opened this issue 6 years ago • 10 comments

  • [x] survival/followup data - https://pubmed.ncbi.nlm.nih.gov/29625055/
  • [ ] clinical data #294
  • [x] treatment timeline data
  • [ ] methylation
  • [x] rppa
  • [ ] structural variants
  • [ ] miRNA #1220
  • [ ] immunogenomics
  • [x] microbiome: #1095
  • [x] cptac #1235
  • [ ] ATACseq: #1158
  • [x] rnaseq for normal samples #1221
  • [x] msi #1043
  • [x] tmb #426
  • [x] re-curate arm-level CNA #1308
  • [ ] ancestry data #1426

jjgao avatar Jun 16 '18 23:06 jjgao

@ritikakundra please prioritize this one when you have a chance.

jjgao avatar Jan 17 '19 21:01 jjgao

  • miRNA: Original publication: https://gdc.cancer.gov/node/977
    "The expression data for mRNA and miRNA were batch-corrected to adjust for platform differences between the GAII and HiSeq Illumina sequencers."
    "The data matrix contained abundance profiles for 10,170 tumor samples."
    File download link: https://api.gdc.cancer.gov/data/1c6174d9-8ffb-466e-b5ee-07b204c15cf8 (sample ID vs. genes)

  • RPPA: https://gdc.cancer.gov/node/977
    "Protein expression data were available for 7,858 samples from 32 of the 33 tumor types (LAML data were never generated) across 216 proteins and phosphoproteins. The data were generated using the reverse phase protein array (RPPA) platform."
    File download link: https://api.gdc.cancer.gov/data/fcbb373e-28d4-4818-92f3-601ede3da5e1 (sample ID, tumor type vs. genes)

  • methylation: https://gdc.cancer.gov/node/977
    HumanMethylation27 (HM27) and HumanMethylation450 (HM450) data were merged to generate a dataset of 22,601 probes shared between the two platforms. To minimize systematic platform-specific effects, the HM27 data were normalized against the HM450 data using a probe-by-probe proportional rescaling method.
    *We're using the merged version.
    File download link: https://api.gdc.cancer.gov/data/d82e2c44-89eb-43d9-b6d3-712732bf6a53 (sample ID vs. probe)

  • immunogenomics: https://gdc.cancer.gov/about-data/publications/panimmune

  • fusion: *to be curated into SV format
    Source 1:
    Paper for the analysis: https://www.sciencedirect.com/science/article/pii/S2211124718303954?via%3Dihub
    File download link: https://api.gdc.cancer.gov/data/06a124df-fa5b-4f2d-8bfa-0e73b685f222
    File format: sample_ID (long barcode) vs. fusion vs. breakpoint (chromosome and coordinates)
    Source 2 (keep in separate file): https://www.tumorfusions.org/
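
For anyone scripting against the files listed above, here is a minimal Python sketch. It only collects the download links from this comment into one place and shows one possible way to shorten the long barcodes in the fusion file. The names `PANCAN_FILES`, `download_url`, and `to_sample_id` are hypothetical, and trimming barcodes to the 15-character `TCGA-XX-XXXX-NN` form is an assumption about what the curated files should use, not something decided in this thread.

```python
import re

# Base of the download links quoted in the list above.
GDC_DATA_ENDPOINT = "https://api.gdc.cancer.gov/data/"

# File UUIDs copied from the links in this comment (hypothetical grouping).
PANCAN_FILES = {
    "mirna": "1c6174d9-8ffb-466e-b5ee-07b204c15cf8",
    "rppa": "fcbb373e-28d4-4818-92f3-601ede3da5e1",
    "methylation": "d82e2c44-89eb-43d9-b6d3-712732bf6a53",
    "fusion": "06a124df-fa5b-4f2d-8bfa-0e73b685f222",
}

# Matches the leading project-TSS-participant-sample part of a TCGA barcode.
_SAMPLE_PREFIX = re.compile(r"^TCGA-[A-Z0-9]{2}-[A-Z0-9]{4}-\d{2}")


def download_url(data_type: str) -> str:
    """Return the download link for one of the data types listed above."""
    return GDC_DATA_ENDPOINT + PANCAN_FILES[data_type]


def to_sample_id(barcode: str) -> str:
    """Trim a long TCGA barcode to its 15-character sample-level prefix.

    Assumption: the curated files want IDs of the form TCGA-XX-XXXX-NN.
    """
    match = _SAMPLE_PREFIX.match(barcode)
    if match is None:
        raise ValueError(f"not a TCGA sample barcode: {barcode!r}")
    return match.group(0)
```

A link built this way could then be fetched with something like `urllib.request.urlretrieve(download_url("fusion"), "fusion.tsv")`; the output file name and extension here are placeholders, since the comment does not state the file type.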

yichaoS avatar Jun 07 '19 17:06 yichaoS

adding microbiome data https://github.com/cBioPortal/datahub/issues/1095 to the list above.

jjgao avatar May 04 '20 22:05 jjgao

@jjgao @ritikakundra An old issue that might still be relevant: https://github.com/cBioPortal/datahub/issues/294

yichaoS avatar Jun 04 '20 18:06 yichaoS

discussion related to methylation: https://github.com/cBioPortal/datahub/issues/1210

jjgao avatar Aug 13 '20 18:08 jjgao

added msi and tmb to the list

jjgao avatar Aug 13 '20 18:08 jjgao

added the cptac data to the list.

jjgao avatar Aug 25 '20 19:08 jjgao

added #1308 - re-curate arm-level CNA.

jjgao avatar Oct 22 '20 17:10 jjgao

added https://github.com/cBioPortal/datahub/issues/1426 - ancestry data.

jjgao avatar Jun 08 '21 19:06 jjgao

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Jan 18 '22 23:01 stale[bot]

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Oct 30 '22 02:10 stale[bot]

This can be followed up in #1796

sbabyanusha avatar Jan 31 '24 19:01 sbabyanusha