scRNAseq
symbol column dropped from rowData of SegerstolpePancreasData (devel)
In Bioc release:
> library(scRNAseq)
> sce.seger <- SegerstolpePancreasData()
> rowData(sce.seger)
DataFrame with 26179 rows and 2 columns
                symbol                 refseq
           <character>            <character>
SGIP1            SGIP1              NM_032291
AZIN2            AZIN2 NM_052998+NM_001293562
CLIC4            CLIC4              NM_013943
AGBL4            AGBL4              NM_032785
NECAP2          NECAP2 NM_001145277+NM_0011..
...                ...                    ...
KIR2DL4        KIR2DL4 NM_001080772+NM_0022..
KIR2DS3        KIR2DS3              NM_012313
KIR2DS2        KIR2DS2 NM_001291696+NM_0123..
BIVM-ERCC5  BIVM-ERCC5           NM_001204425
eGFP              eGFP                   eGFP
In Bioc devel:
> library(scRNAseq)
> sce.seger <- SegerstolpePancreasData()
> rowData(sce.seger)
DataFrame with 26179 rows and 1 column
                           refseq
                      <character>
SGIP1                   NM_032291
AZIN2      NM_052998+NM_001293562
CLIC4                   NM_013943
AGBL4                   NM_032785
NECAP2     NM_001145277+NM_0011..
...                           ...
KIR2DL4    NM_001080772+NM_0022..
KIR2DS3                 NM_012313
KIR2DS2    NM_001291696+NM_0123..
BIVM-ERCC5           NM_001204425
eGFP                         eGFP
I think this causes OSCA.advanced and OSCA.workflows to break in devel @PeteHaitch @alanocallaghan
Hm. I think I must have deemed the row names to be redundant with the symbol column and removed the latter to reduce the file size. To avoid breaking stuff, I can dynamically add it back in the SegerstolpePancreasData function; however, fetchDataset() will still return the version without symbol, so people loading the dataset directly from the files (i.e., not through the per-dataset getters) will get a slightly different version of the dataset.
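A minimal sketch of the dynamic fix described above, assuming the devel layout where gene symbols survive only as row names. It builds a small toy SingleCellExperiment in place of the real SegerstolpePancreasData() output (the three genes are copied from the listing above; the counts are dummy values), then re-creates symbol from rownames() only when the column is missing. This is an illustration, not the code that actually went into scRNAseq.

```r
library(SingleCellExperiment)

# Toy stand-in for the devel SegerstolpePancreasData() output: gene symbols
# are stored only as row names, and rowData() holds just the refseq column
sce.seger <- SingleCellExperiment(
    assays = list(counts = matrix(0L, nrow = 3, ncol = 2)),
    rowData = S4Vectors::DataFrame(
        refseq = c("NM_032291", "NM_052998+NM_001293562", "eGFP")
    )
)
rownames(sce.seger) <- c("SGIP1", "AZIN2", "eGFP")

# Re-create the symbol column from the row names, but only if it is missing,
# so objects that already carry it (e.g. the release version) are left untouched
if (!"symbol" %in% colnames(rowData(sce.seger))) {
    rowData(sce.seger)$symbol <- rownames(sce.seger)
}

rowData(sce.seger)
```

The restored column is appended after refseq, so the column order differs from the release output even though the contents match.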
FYI fetchDataset() is going to be the way forward as it (i) avoids the need for contributors to write a getter function and (ii) eliminates the involvement of dataset-specific logic that can't be easily replicated in other frameworks like Python or JS.
Is Segerstolpe the only one? FWIW you can set legacy=TRUE and it'll pull from ExperimentHub for now.
If that's the way forward, we can also adapt the corresponding parts of the OSCA book to look up the symbols from the rownames. I can't tell at this point whether this also happens with other datasets. But the breakage comes from looking up the symbol column for ID mapping purposes, and that lookup can be replaced by using the rownames instead.
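A sketch of the OSCA-side change, assuming the downstream code only needs the symbols as lookup keys. The row names take the place of rowData(...)$symbol; the table sym2ens and its placeholder IDs are hypothetical and merely stand in for a real annotation lookup (for example via AnnotationDbi::mapIds()). Entries without a mapping fall back to the original symbol.

```r
library(SingleCellExperiment)

# Toy object mirroring the devel layout: symbols live only in the row names
sce.seger <- SingleCellExperiment(
    assays = list(counts = matrix(0L, nrow = 3, ncol = 2)),
    rowData = S4Vectors::DataFrame(refseq = c("NM_032291", "NM_013943", "eGFP"))
)
rownames(sce.seger) <- c("SGIP1", "CLIC4", "eGFP")

# Use the row names as mapping keys instead of rowData(sce.seger)$symbol
symbols <- rownames(sce.seger)

# Hypothetical symbol-to-ID table standing in for a real annotation lookup;
# the IDs are placeholders, not real Ensembl identifiers
sym2ens <- c(SGIP1 = "ENSG_PLACEHOLDER_1", CLIC4 = "ENSG_PLACEHOLDER_2")
ens.id <- unname(sym2ens[symbols])

# Keep the original symbol wherever no mapping exists (e.g. the eGFP row)
ens.id <- ifelse(is.na(ens.id), symbols, ens.id)
ens.id
```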
Added back symbol in 2.19.4. Only for SegerstolpePancreasData, so fetchDataset will still be missing symbol.
Yeah, seems sensible to just use the rownames for OSCA purposes moving forward.
Think this is resolved now?