rnaseq
Enrich DESeq QC output
Description of feature
I'm putting together a draft PR, and @drpatelh pointed out that it's probably best to get feedback from the community before finalizing it. First, a bit of background: rather than re-running tximport myself, I now generally just import star_salmon/deseq2_qc/deseq2.dds.RData for simplicity (and to avoid having to remember which scaling method to use). However, constructing my final DESeq dataset from it generally requires extracting the colData and counts, joining the colData with my full metadata, and reconstructing the dataset. I have already written a draft that should give a full impression of how I would like to implement this, but it will likely need a bit of touch-up.
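For anyone unfamiliar with the workaround described above, here is a rough sketch of what it can look like. It is not the code from the draft PR. The helper names `join_metadata` and `rebuild_dds` are hypothetical, the metadata table is assumed to have a `sample` column matching the sample names in the pipeline's object, and `DESeqDataSetFromMatrix()` is just one plausible way to do the reconstruction step.

```r
# Join extra sample metadata onto the pipeline's colData (base R only).
# `key` names the metadata column holding sample names (assumed: "sample").
# Rows are returned in the same order as `col_data`.
join_metadata <- function(col_data, metadata, key = "sample") {
  col_data <- as.data.frame(col_data)
  col_data[[key]] <- rownames(col_data)
  missing <- setdiff(col_data[[key]], metadata[[key]])
  if (length(missing) > 0) {
    stop("samples missing from metadata: ", paste(missing, collapse = ", "))
  }
  # Skip metadata columns already present in colData to avoid .x/.y suffixes.
  keep <- c(key, setdiff(colnames(metadata), colnames(col_data)))
  merged <- merge(col_data, metadata[, keep, drop = FALSE], by = key, sort = FALSE)
  rownames(merged) <- merged[[key]]
  merged[rownames(col_data), , drop = FALSE]
}

# Rebuild a DESeqDataSet from the pipeline's .RData file plus full metadata.
# Requires DESeq2 to be installed; `design` is supplied by the analyst.
rebuild_dds <- function(rdata_path, metadata, design) {
  env <- new.env()
  obj_name <- load(rdata_path, envir = env)  # object name in the file is not assumed
  dds <- get(obj_name[1], envir = env)
  col_data <- BiocGenerics::as.data.frame(SummarizedExperiment::colData(dds))
  DESeq2::DESeqDataSetFromMatrix(
    countData = DESeq2::counts(dds),
    colData   = join_metadata(col_data, metadata),
    design    = design
  )
}
```

A call might then look like `rebuild_dds("star_salmon/deseq2_qc/deseq2.dds.RData", my_metadata, ~ condition)`, where `my_metadata` and `condition` are placeholders for your own table and design variable.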
I would like to:
- Have all metadata supplied by --input land in the final dds object for ease of use downstream. At present the samplesheet tolerates extra columns, but ignores them. This would make the transition to analysis seamless, and may enable better integration with [differentialabundance](https://github.com/nf-core/differentialabundance). This then also enables more meaningful QC:
  - We can now color the PCA by a specific factor for a nice first-pass confirmation that the experiment worked. The column of interest would have to be passed by an additional flag (deseq2_group_col). As I have written it, the coloring falls back to sample if no flag is given.
  - Add a color annotation using this same column to the distance matrix heatmap for the same purpose.
- (Unrelated to the above, just a resource saver): Use vst() rather than varianceStabilizingTransformation() to save a bit of overhead without any real loss. vst() estimates the dispersion trend from a subset of genes (1,000 by default) rather than from all genes, then applies the transformation to every gene. This is generally recommended over varianceStabilizingTransformation(); see this [biostars post](https://www.biostars.org/p/459013/) and the [DESeq2 vignette](http://bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#data-transformations-and-visualization).
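To make the proposed QC behaviour concrete, here is a minimal sketch. It is not the code from the draft PR. The helper names `resolve_group_col` and `qc_pca` are hypothetical, and the sketch assumes the dds object's colData has a `sample` column to fall back on. `vst()` and `plotPCA()` are the standard DESeq2 functions.

```r
# Choose the colData column used to color QC plots.
# Falls back to `sample` when no column is requested.
resolve_group_col <- function(col_data, group_col = NULL, fallback = "sample") {
  if (is.null(group_col) || !nzchar(group_col)) {
    return(fallback)
  }
  if (!group_col %in% colnames(col_data)) {
    stop("column '", group_col, "' not found in sample metadata")
  }
  group_col
}

# PCA colored by the chosen column. Requires DESeq2 to be installed.
qc_pca <- function(dds, group_col = NULL) {
  intgroup <- resolve_group_col(SummarizedExperiment::colData(dds), group_col)
  # vst() fits the dispersion trend on a gene subset (nsub = 1000 by default),
  # so it errors on datasets with fewer rows than nsub.
  vsd <- DESeq2::vst(dds, blind = TRUE)
  DESeq2::plotPCA(vsd, intgroup = intgroup)
}
```

Failing loudly on an unknown column name, rather than silently falling back to `sample`, is a design choice in this sketch. A typo in the flag would otherwise produce a plot that looks fine but is colored by the wrong variable.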
To be clear, I'm fairly new to Nextflow, so I will need a bit of help cleaning up the PR before submission. Thank you for your help and feedback!
Current state: https://github.com/nf-core/rnaseq/compare/master...RoganGrant:rnaseq:master