SpatialExperiment
SpatialExperiment copied to clipboard
Protect shuffling of rows in spatialCoords
Currently it is possible to shuffle rows in spatialCoords, so that these rows become out of sync with the rows in colData.
We could think about protecting against this in some way, either using rownames of spatialCoords, or some other way.
It would be best to have similar behavior to reducedDims, which returns a warning (error in next version of Bioconductor):

Issues like this were my fear when putting spatialCoords into a separate slot (not in reducedDim)... I wanted to quickly fix this, writing some unit tests on the side to mimick SCE-reducedDim behavior, and immediately ran into unexpected issues. Trying to consider all scenarios for the analogous case of reducedDims in SCE, we have to cover more than just the issue pointed out above. From what I can tell, the behaviour is (as expected):
- object: no colnames, value: rownames -> error
- object: colnames, value: no rownames -> pass
- object: colnames, value: different rownames -> error
- object: colnames, value: same rownames -> pass
But, there are other things happening:
- colnames(object) <- NULL also sets rownames(reducedDims) to NULL
- if object has colnames, rownames(reducedDims) <- NULL gets ignored
- reducedDim<- (like other replacement methods) has an argument
withDimnames = TRUE/FALSEthat let's you ignore the incoming dimnames - generally, access/replacement is controlled by
withDimnames
and kept in synch between the object &reducedDim - ...perhaps something else I missed...
(see here: https://github.com/drisso/SingleCellExperiment/blob/master/R/reducedDims.R)
All that is to say: It's not as simple as adding...
old <- colnames(x)
new <- rownames(value)
if (!is.null(new) && !identical(old, new)) {
stop("non-NULL 'rownames(value)' should be the same as",
" 'colnames(x) for 'spatialCoords(x) <- value'.")
}
...and we'd have all this by default if spatialCoords were a reducedDim with a custom accessor :/
Yes, I can see this now, especially with this example, and now that we have only spatialCoords (and spatialData which was more complicated / not a numeric matrix is gone).
I wonder if it is worth considering (yet another) internal restructure...
I still think that the spatialCoords are not needed to be seen as a reducedDim data.
The reducedDims mostly born for data transformations as PCA, t-SNE and UMAP, which reduce the dimensions of the original data.
I think this issue could be easily solved with the propagation on the colData and on the assays of the operations made on the spatialCoords.
In the same manner as we made for the subset.
As I tried to express earlier, I don't see a real difference. Sure, physical xy-coordinates are not a reduced dimension, but they are conceptually the same. I.e., mapping cells to an xy-plane for plotting and analysis.
The key advantages I see with having spatialCoords as a reducedDim are
- we don't need to do any extra work regarding the above, i.e. long-term robuster and more easily maintainable VS having to almost copy-paste code from SCE to make
spatialCoordsbehave properly... - access to plotting functions that already exist in
scuttle,scater(and possibly lots of other packages) that work a laplotReducedDimorggcellsVS having tocbindstuff all the time or writing custom wrappers
But anyways, I have said all this before and feel like I'm going in circles. We already had discussed this, so... not sure what else to add. And, of course (!), I also have my doubts and am not sure what I'm saying is "the best idea":
At times like this I think we would really benefit from getting opinions from both the community (usability) and developers (smart design); perhaps via a poll (vote + comments / arguments for and against) or similar. Three opinions where we base design choices on >= 2 isn't so good I think, and there's always a high chance that we'll make one decision and then flip to another at some point (as was the case with spatialData...) 🤔
Hi all, we discussed this again today, and wanted to check if anyone had any more feedback or ideas before we proceed on this one.
We are thinking it would be simpler and more robust if we store spatialCoords in reducedDims instead of the current structure, since this (i) lets us re-use all the existing validity checks for reducedDims and simplifies long-term maintenance, and (ii) makes it easier to re-use existing downstream functions that access reduceDims (e.g. for plotting).
@LTLA @drisso @lgeistlinger do any of you have any thoughts on this?
One point is that it could be confusing for users if the show method shows spatialCoords alongside PCA and UMAP etc, which could be addressed by slightly customizing the show method and still showing the spatialCoords on a separate line.
cc @HelenaLC @drighelli from today's discussion. Thank you!
My 2 cents.
Storing spatial co-ordinates in reducedDims feels wrong to me for the reasons raised by @drighelli (https://github.com/drighelli/SpatialExperiment/issues/103#issuecomment-1064871759) and the need to patch the show method exemplifies that spatial co-ordinates are not the same as what is stored in reducedDims.
If the spatialCoords and reducedDims have similar requirements in terms of validation and getters/setters, the better path would seem to me to be having general validation and getter/setters methods that work on both.
As for plotting, would adding a scater::plotSpatialCoord() or scater::plotSpatialCoords() (not sure about plural) that parallels scater::plotReducedDim() be what's needed?
It seems better to me to then have a more general scater::plotRDSC() (perhaps not exported?) that specialises to scater::plotReducedDim() and scater::plotSpatialCoord(), much in the way that scater::plotUMAP() and scater::plotTSNE() are specialisations of scater::plotReducedDim().
Adding a use_spatialcoord argument to scuttle::makePerCellDF()/scuttle::ggcells() might be considered, a la the existing use_dimred and use_altexps.
@Alanocallaghan probably has a better view of this.
It's worth saying ahead of time that I've played with spatial data for all of about 30 minutes, so my opinion shouldn't hold much weight.
If I didn't care at all about backwards compatibility and the like, I'd probably go with Helena/Lukas's idea of re-using the reducedDim code, but change the name to something more generic (altDim a la altExp perhaps) that reducedDim and spatialCoords could both use. However that'd be a horrendous rewrite of multiple packages that'd likely break tonnes of code, so seems off limits.
The idea of duplicating the code is obviously horrible, given you've now duplicated the chance of bugs, and sharing it between SCE and SpE (via a shared dependency?) seems a bit OTT.
Therefore I think the simplest is what Lukas' last comment suggests; use reducedDim but obfuscate it from users slightly. It's easier to modify the show method and add a couple of utility functions that used reducedDim stuff on the backend than reinventing the wheel for an extremely similar use case. Maybe it's kicking the can down the road and some horrible issues will pop up, but it's a low-effort bodge for now so perhaps it'll be easier to see a path forward when it comes time to fix things.
That's my 2c anyway, I was going to tag in Aaron and Davide as people who are better placed than I to contribute to the discussion but I can see I've been beaten to it.
I see the structural / implementation argument, but the semantics seem off.
What I like about the idea of having a construct such as reducedDims also for representing spatial coordinates that we would no longer be restricted to 1) just providing a single set of spatial coordinates for each cell, and 2) representing them as a matrix only.
If we would use something like reducedDims, we would have a list of matrix-like objects (as we have for assays in a good old SE btw) and could
-
easily store alternative sets of spatial coordinates for each cell (eg one for storing the cell centroids for each cell, one for storing polygon segmentation boundaries for each cell), and
-
each element of the list could also eg be a
DataFrame, which would be great for eg storing S4 objects representing cell polygons, or something HDF5-backed there for datasets with many cells or extremely fine-granular polygon boundaries (imagine a segmentation where each cell ends up being represented by polygons with hundreds to thousands of vertices).
Strongly advise against chucking the spatialCoords in the reducedDims, because the former is not the latter. It's not just a question of data structure, it's a question of semantics. Downstream applications trust that the reducedDims contain, well, reduced dimensions, and proceed accordingly for visualization and querying. For example, I have databases where users might ask, "hey, which datasets have pre-computed reduced dimensions from their publication?", and throwing spatial coordinates into the reducedDims slot would contaminate the answer to that question.
I don't really see the problem with the current setup, provided that you make sure that the spatialCoords are subsetted by row properly in the [ and [<- methods. If you really don't want to do it yourself, I suppose you could store it as a nested column in the int_colData with an appropriately unique name, which gives you the subsetting for free. You'd still have to write the getters and setters yourself, though. I don't see the problem with duplicating the name-checking code - where do you think I got the code to do it from reducedDims? I just copied it from SummarizedExperiment::assay.
Besides, at some point, you're going to have to apply your own spatial-specific checks to the setters, e.g., check that the column names have x, y, etc. or whatever the corresponding thing is for polygons (suggest a SplitDataFrameList). Better to have the flexibility of modifying your own code base than to be restricted by SCE's internals.
Alrighty, thanks for all your feedback, seems like a pretty unanimous vote. I'll get to addressing the original issue(s) for now, then we can get to the polygons-and-stuff business.