
Refactor SCESet to use SummarizedExperiment rather than ExpressionSet

Open davismcc opened this issue 8 years ago • 4 comments

Recommended after speaking with Martin Morgan.

davismcc, Dec 07 '16 13:12

wow, that's big...

wikiselev, Dec 07 '16 13:12

Probably something to hold off on until BioC 3.6, I think. In many respects it's a pretty straightforward conversion, based on my knowledge of both classes, but we've already made a whole heap of interface changes, and putting this monster in the same release would annoy downstream users/developers.

LTLA, Dec 07 '16 17:12

There are also a bunch of other issues that should be addressed in this conversion:

  • We should consolidate fpkmData and tpmData and cpmData into a single entry, probably called cpmData. They are effectively the same thing: normalized, unlogged expression values.
  • We need to kill .exprs_hunter. Implicit choice of expression values is opaque; functions (or the people who call them) should specify the desired expression type explicitly.
  • Does anyone actually need is_exprs? After QC, there is no compelling reason to keep such a large matrix around, especially when it can be quickly regenerated.
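
To illustrate the second point, here is a minimal, language-agnostic sketch in Python (not scater's R code). It assumes expression matrices are stored by name in a plain dictionary; `get_exprs` and the `assays` container are hypothetical names made up for this example. The caller must name the expression type, and a missing type is an error instead of a silent fallback.

```python
def get_exprs(assays, exprs_values):
    """Return the named expression matrix, or fail loudly if it is absent.

    Hypothetical helper: there is no fallback order, so the caller always
    knows which values were used.
    """
    if exprs_values not in assays:
        available = ", ".join(sorted(assays)) or "none"
        raise KeyError(
            f"'{exprs_values}' not found; available expression values: {available}"
        )
    return assays[exprs_values]


# Hypothetical container: names map to gene-by-cell matrices (lists of rows).
# The "cpm" entry is the counts scaled to one million per cell (column).
assays = {
    "counts": [[1, 0], [3, 2]],
    "cpm": [[250000.0, 0.0], [750000.0, 1000000.0]],
}

cpm = get_exprs(assays, "cpm")   # explicit choice succeeds
# get_exprs(assays, "tpm")       # would raise KeyError instead of guessing
```

With a single normalized, unlogged entry (here "cpm"), a downstream function would take something like an `exprs_values` argument and pass it through unchanged.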

LTLA, May 19 '17 16:05

On a related note, we should consider whether newSCESet should generate exprs at all. Many people forget to call normalize after running computeSumFactors in the scran workflow, and this causes no obvious problems downstream because exprs values (normalized by library size) are already available. It may be better to ask users to call normalize explicitly on the constructed SCESet, with an appropriately shouty warning if size factors are not available.
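
A rough sketch of that behaviour, again in Python as a language-agnostic illustration and not scater's actual implementation. The function signature, the fallback to library-size factors, and the log2 transform with a pseudo-count of 1 are all assumptions made for this example.

```python
import math
import warnings


def normalize(counts, size_factors=None):
    """Return log2(count / size_factor + 1) for a gene-by-cell count matrix.

    Hypothetical sketch: if no size factors are supplied, warn loudly and
    fall back to factors proportional to library size (an assumption).
    """
    if size_factors is None:
        warnings.warn(
            "NO SIZE FACTORS FOUND; falling back to library sizes",
            UserWarning,
        )
        # Library size = column sum; scale so the factors average to 1.
        # A cell with zero total counts would give a zero factor here.
        lib_sizes = [sum(col) for col in zip(*counts)]
        mean_lib = sum(lib_sizes) / len(lib_sizes)
        size_factors = [s / mean_lib for s in lib_sizes]
    return [
        [math.log2(x / sf + 1) for x, sf in zip(row, size_factors)]
        for row in counts
    ]


counts = [[1, 0], [3, 2]]                               # genes x cells
exprs = normalize(counts, size_factors=[1.0, 1.0])      # explicit factors, no warning
```

Because the constructor would no longer fill in exprs, skipping this call leaves no normalized values behind, so the omission shows up immediately instead of going unnoticed.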

LTLA, May 22 '17 12:05