
cbind when subsetting spots from samples

Open lmweber opened this issue 2 years ago • 4 comments

I was working with some SPE objects and noticed cbind has some unexpected behavior when subsetting spots from samples.

Specifically, I started with an SPE containing multiple samples, with 2 distinct tissue types per sample, so I split the samples into two sub-objects containing tissue type 1 and tissue type 2 respectively. I then performed quality control (QC) separately on each sub-object / tissue type and stored various QC metrics in the sub-objects. However, cbind then does not let me easily put the objects back together because of its checks on sample_id.

First, I get the warning:

> spe <- cbind(spe_LC, spe_WM)
'sample_id's are duplicated across 'SpatialExperiment' objects to cbind; appending sample indices.

and then an error if I try to remove the appended sample indices:

> colData(spe)$sample_id <- gsub("\\.[0-9]$", "", colData(spe)$sample_id)
Error in .local(x, ..., value) : 
  Number of unique 'sample_id's is 18, but 9 were provided.

So I think some of our checks are too strict here. I'll leave this note here for now so we don't forget about it and we can discuss at some point. It is possible to work around this issue by not working with sub-objects, but I think the error above has the potential to create some issues for some users.
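
For reference, a minimal sketch of the "no sub-objects" workaround mentioned above: keep one SPE, run QC per tissue type through a logical index, and write the results straight back into colData, so cbind is never needed. Everything in the toy data is an assumption made for illustration (the `tissue` column, the `lib_size` metric, and the median threshold are hypothetical stand-ins for a real QC workflow).

```r
library(SpatialExperiment)

# Toy data (hypothetical): 2 samples, each with spots from 2 tissue types
set.seed(1)
n_spots <- 8
counts <- matrix(
    rpois(5 * n_spots, lambda = 5), nrow = 5,
    dimnames = list(paste0("gene", 1:5), paste0("spot", seq_len(n_spots)))
)
cd <- DataFrame(
    sample_id = rep(c("sample1", "sample2"), each = 4),
    tissue = rep(c("LC", "WM"), times = 4),
    x = seq_len(n_spots),
    y = seq_len(n_spots)
)
spe <- SpatialExperiment(
    assays = list(counts = counts),
    colData = cd,
    spatialCoordsNames = c("x", "y")
)

# Placeholder QC metric: library size per spot
spe$lib_size <- colSums(counts(spe))

# Run QC separately per tissue type, but store results in the full object
spe$discard <- rep(NA, ncol(spe))
for (tt in unique(spe$tissue)) {
    idx <- spe$tissue == tt
    sub <- spe[, idx]  # temporary view, never cbind-ed back
    spe$discard[idx] <- sub$lib_size < median(sub$lib_size)
}

# sample_id is untouched because the object was never split and recombined
unique(spe$sample_id)
```

This only sidesteps the problem when the QC results can be expressed as per-spot columns; it does not help if the sub-objects were created elsewhere and genuinely need to be combined.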

lmweber avatar Mar 24 '22 21:03 lmweber

Thanks @lmweber for posting this. I came here to raise the same issue. I find this behaviour a bit dangerous: it generates sample_id entries that do not match the initial assignment and therefore breaks any link between the SpatialExperiment object and, e.g., external metadata. This can be confusing for inexperienced users.

nilseling avatar May 16 '22 08:05 nilseling

Thanks @nilseling and @lmweber,

Maybe we should reconsider how we think about the sample_id(s).

I think that SingleCellExperiment leaves the choice of the sample names to the user...

drighelli avatar May 16 '22 10:05 drighelli

Yeah, the SingleCellExperiment container has no hard-coded entry for the sample IDs. But I guess you need them for linking cells/spots to images. My recommendation is to handle the sample_id entry "as is" in the colData slot and to check that there are no duplicates in the imgData entry.

nilseling avatar May 16 '22 11:05 nilseling

Thanks @lmweber, I have exactly the same issue and the new sample_id entries generated by cbind also prevent using downstream functions:

aggregateAcrossCells(spe, ids = spe$patient_id, statistics = "mean")
Error in .local(x, ..., value) : 
  Number of unique 'sample_id's is 47, but 1 was provided.

Is there a "better" workaround than this (i.e. one that would achieve the same thing without reverting to SingleCellExperiment)?

sce <- as(spe, "SingleCellExperiment")
sce$sample_id <- sce$roi_id
aggregateAcrossCells(sce, ids = sce$patient_id, statistics = "mean")

class: SingleCellExperiment 
dim: 44 47 
metadata(4): spillcomp stages cases subset
assays(1): counts
rownames(44): GHRL HBP ... GCG PPY
rowData names(10): index channel ... clustering shortName
colnames(47): 6044 6061 ... 8005 8008
colData names(32): CaseID Panel ... ids ncells
reducedDimNames(0):
mainExpName: NULL
altExpNames(0):

ndamond avatar May 23 '22 09:05 ndamond