cancer-data
exploring the data
An issue was raised in today's meeting regarding visualizations of the clinical data. Visualizations of the other datasets are also under consideration, but more immediately we need visualization schemes for the clinical data to support data assessment and covariate selection.
@Inquisitive-Geek is interested in resolving this issue
Approach: seaborn/matplotlib in jupyter notebook
Potential visualizations:
Clinical:
- Prevalence of tumor sites amongst samples
- 'Time to event' distribution
- Other variables of interest?
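For the two clinical plots listed above, a minimal sketch could look like the following. It assumes the clinical matrix is loaded into a pandas DataFrame with one row per sample. The column names `tumor_site`, `time_to_event` and `event` are hypothetical placeholders and should be replaced with the real clinical matrix headers. The plotting libraries are imported inside the plotting function so the summary helper can be used without a plotting backend.

```python
import pandas as pd


def site_prevalence(clinical: pd.DataFrame, site_col: str = "tumor_site") -> pd.DataFrame:
    """Count samples per tumor site and the fraction of all samples each represents."""
    counts = clinical[site_col].value_counts(dropna=False)  # most common site first
    return pd.DataFrame({"n_samples": counts, "fraction": counts / counts.sum()})


def plot_clinical_overview(clinical: pd.DataFrame,
                           site_col: str = "tumor_site",     # hypothetical column name
                           time_col: str = "time_to_event",  # hypothetical column name
                           event_col: str = "event"):        # hypothetical column name
    # Imported here so site_prevalence() works without a plotting stack installed.
    import matplotlib.pyplot as plt
    import seaborn as sns

    fig, (ax_site, ax_time) = plt.subplots(1, 2, figsize=(12, 4))

    # Horizontal bar chart of tumor-site prevalence, most common site on top.
    order = clinical[site_col].value_counts().index
    sns.countplot(data=clinical, y=site_col, order=order, ax=ax_site)
    ax_site.set_title("Tumor sites amongst samples")

    # Histogram of time to event, stacked by event status (e.g. event vs censored).
    sns.histplot(data=clinical, x=time_col, hue=event_col, multiple="stack", ax=ax_time)
    ax_time.set_title("Time-to-event distribution")

    fig.tight_layout()
    return fig
```

In a notebook, `site_prevalence(clinical)` gives the table behind the bar chart, which is handy when deciding whether a rare tumor site is worth keeping as a covariate level.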
Sequencing (HiSeqV2):
- Examine for batch effects? (potentially linked to contributing variables in the clinical matrix)
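One common way to look for batch effects is to project the samples onto the top principal components of the expression matrix and colour them by a candidate clinical variable; clustering by that variable hints at a batch or confounding effect. The sketch below assumes the HiSeqV2 matrix is loaded as a genes x samples DataFrame (sample IDs as columns) and that the clinical DataFrame is indexed by the same sample IDs. PCA is computed directly with NumPy's SVD here as a substitute for a dedicated library, and `covariate` is whatever clinical column is being tested.

```python
import numpy as np
import pandas as pd


def expression_pcs(expr: pd.DataFrame, n_components: int = 2) -> pd.DataFrame:
    """Project samples onto the top principal components of a genes x samples matrix."""
    # Transpose so samples are rows and genes are features.
    x = expr.T.to_numpy(dtype=float)
    # Drop genes with zero variance across samples; they carry no PCA signal.
    x = x[:, x.std(axis=0) > 0]
    # Centre each gene; the SVD of the centred matrix gives the principal components.
    x = x - x.mean(axis=0)
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    cols = [f"PC{i + 1}" for i in range(n_components)]
    return pd.DataFrame(scores, index=expr.columns, columns=cols)


def plot_batch_check(expr: pd.DataFrame, clinical: pd.DataFrame, covariate: str):
    # Imported here so expression_pcs() works without a plotting stack installed.
    import seaborn as sns

    # Attach the clinical covariate to each sample by matching sample IDs.
    pcs = expression_pcs(expr).join(clinical[covariate])
    ax = sns.scatterplot(data=pcs, x="PC1", y="PC2", hue=covariate)
    ax.set_title(f"Expression PCA coloured by {covariate}")
    return ax
```

Calling `plot_batch_check` once per candidate variable (e.g. sequencing plate or tumor site, if present in the clinical matrix) gives a quick visual screen before any formal batch correction.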
Mutation:
- Prevalence of mutation types
- Number of mutations per sample ID
- Most and least mutated genes
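The three mutation items above all reduce to simple counts over a long-format mutation table. A minimal sketch, assuming one row per mutation call with hypothetical columns `sample`, `gene` and `effect` (the mutation type), might be:

```python
import pandas as pd


def mutation_summaries(mut: pd.DataFrame, sample_col: str = "sample",
                       gene_col: str = "gene", type_col: str = "effect",
                       top_n: int = 10) -> dict:
    """Summarise a long-format mutation table (one row per mutation call)."""
    # Prevalence of each mutation type across all calls.
    type_counts = mut[type_col].value_counts()
    # Number of mutation calls per sample ID.
    per_sample = mut.groupby(sample_col).size().sort_values(ascending=False)
    # Distinct mutated samples per gene, so two hits in one sample count once.
    per_gene = mut.groupby(gene_col)[sample_col].nunique().sort_values(ascending=False)
    return {
        "type_counts": type_counts,
        "mutations_per_sample": per_sample,
        "most_mutated_genes": per_gene.head(top_n),
        "least_mutated_genes": per_gene.tail(top_n),
    }


def plot_mutation_overview(mut: pd.DataFrame, **kwargs):
    # Imported here so mutation_summaries() works without a plotting stack installed.
    import matplotlib.pyplot as plt
    import seaborn as sns

    s = mutation_summaries(mut, **kwargs)
    fig, axes = plt.subplots(1, 3, figsize=(15, 4))
    sns.barplot(x=s["type_counts"].values, y=s["type_counts"].index.astype(str), ax=axes[0])
    axes[0].set_title("Mutation types")
    sns.histplot(s["mutations_per_sample"], ax=axes[1])
    axes[1].set_title("Mutations per sample")
    top = s["most_mutated_genes"]
    sns.barplot(x=top.values, y=top.index.astype(str), ax=axes[2])
    axes[2].set_title("Most frequently mutated genes")
    fig.tight_layout()
    return fig
```

Two caveats with this approach: samples with no mutation calls do not appear in `mutations_per_sample`, and "least mutated genes" only covers genes that appear in the table at least once, so genes with zero mutations would need to be added from a separate gene list.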
Feel free to add your own suggestions below!
My only worry with seaborn is that it is very memory-heavy. However, at the scale of this data, I suppose it will be OK.