archive-scater
Refactor SCESet to use SummarizedExperiment rather than ExpressionSet
Recommended after speaking with Martin Morgan.
wow, that's big...
Probably something to hold off on until BioC 3.6, I think. In many respects, it's a pretty straightforward conversion, based on my knowledge of both classes. But we've already made a whole heap of interface changes, and putting this monster in the same release would annoy downstream users/developers.
There are also a bunch of other issues that should be addressed in this conversion:
- We should consolidate `fpkmData`, `tpmData` and `cpmData` into a single entry, probably called `cpmData`. They are effectively the same thing: normalized, unlogged expression values.
- We need to kill `.exprs_hunter`. Implicit choice of expression values is opaque; functions (or people who call them) should define the desired expression type explicitly.
- Does anyone actually need `is_exprs`? After QC, there is no compelling reason to keep such a large matrix around, especially when it can be quickly regenerated.
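To make the point about explicit expression types concrete, here is a minimal base-R sketch. It is not scater code: `get_values` and the plain named list `assays` are hypothetical stand-ins for an accessor and the object's stored matrices. The idea is only that the caller must name the type, and gets an error rather than a silent fallback.

```r
# Hypothetical helper (not part of scater): fetch an expression matrix
# by an explicitly named type instead of guessing one for the caller.
get_values <- function(assays, type) {
  if (missing(type)) {
    stop("'type' must be specified explicitly, e.g. type = \"counts\"")
  }
  if (!type %in% names(assays)) {
    stop("no expression values of type '", type, "' are available")
  }
  assays[[type]]
}

# Toy data: two genes by two cells, stored in a plain named list.
counts <- matrix(c(10, 0, 5, 20), nrow = 2,
                 dimnames = list(c("gene1", "gene2"), c("cell1", "cell2")))
assays <- list(counts = counts)

get_values(assays, type = "counts")   # returns the count matrix
# get_values(assays)                  # error: 'type' must be specified
# get_values(assays, type = "exprs")  # error: type not available
```

With something along these lines, a function that wants log-expression values would fail loudly when only counts are present, instead of quietly computing on whatever was found first.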
On a related note, we should consider whether we should actually generate `exprs` in `newSCESet`. Many people are forgetting to call `normalize` after running `computeSumFactors` in the scran workflow, and this does not cause obvious problems downstream as the `exprs` (by library size) are already available. It may be better to ask users to explicitly call `normalize` on the constructed `SCESet`, with an appropriately shouty warning if size factors are not available.
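A rough base-R sketch of the behaviour being proposed, for illustration only. `normalize_sketch` is a hypothetical function operating on a bare count matrix, not the actual `normalize` method; the centring of size factors and the `log2(x + 1)` transform are assumptions made for this example.

```r
# Hypothetical sketch (not scater's normalize()): compute log-expression
# values only on request, and warn loudly if no size factors are supplied.
normalize_sketch <- function(counts, size_factors = NULL) {
  if (is.null(size_factors)) {
    warning("NO SIZE FACTORS AVAILABLE: falling back to library sizes")
    size_factors <- colSums(counts)
  }
  # Assumption: centre size factors at 1 to keep values on the count scale.
  size_factors <- size_factors / mean(size_factors)
  # Divide each cell (column) by its size factor, then log-transform.
  log2(sweep(counts, 2, size_factors, "/") + 1)
}

# Toy data: two genes by two cells.
counts <- matrix(c(10, 0, 5, 20), nrow = 2,
                 dimnames = list(c("gene1", "gene2"), c("cell1", "cell2")))

normalize_sketch(counts, size_factors = c(0.5, 1.5))  # no warning
normalize_sketch(counts)                              # warns, uses library sizes
```

Because nothing is computed at construction time in this sketch, a user who skips the normalization step has no expression matrix to use by accident, which is the failure mode described above.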