SpatialExperiment icon indicating copy to clipboard operation
SpatialExperiment copied to clipboard

move `read10xVisium` to `TENxIO`

Open HelenaLC opened this issue 1 year ago • 19 comments

relates to #114 #116 #119 #123

HelenaLC avatar Nov 17 '22 13:11 HelenaLC

Hi @HelenaLC @drighelli , I'm following up here regarding our discussions last week about moving read10xVisium() and the new molecule-based platform loading functions into a new package.

I think my view is this is a good idea, since it will help streamline long-term maintenance and avoids making the SpatialExperiment package too cluttered.

Do you have any further views?

If we proceed with this I would be happy to volunteer to set up the new package and split things out. We could call it SpatialIO or SpatialUtils, or similar.

lmweber avatar Oct 31 '23 17:10 lmweber

Hi @lmweber,

thanks for following up on this, I totally agree with moving these functions to a dedicated package.

I think that @LiNk-NY could also be interested in following this thread, because he already implemented a version of read10xVisium into the TENxIO package.

I also tested the readCosMx function from the https://github.com/drighelli/SpatialExperiment/pull/144 and it works well, but if I remember correctly, it misses the image loading into it. Which is a very useful improvement for this.

For the name of the package, I'd be more inclined to the SpatialIO, because it will be only dedicated to the loading of the data from disk. We can then imagine a set of functions to export such data into a SpatialExperiment or any other class. Otherwise, if we want to stick with the SpatialExperiment class, another idea could be to name the package SpatialExperimentIO.

drighelli avatar Nov 02 '23 09:11 drighelli

Hi @drighelli ,

All of my functions in #144 - readXenium, readCosMx, and readMerscope - do not include image loading.

Because these technologies do not really support automated H&E generation and is very different from Visium:

  • Xenium: even if Xenium has the ability of post staining with H&E, the staining is not always done. From my knowledge, only one dataset performed such post staining in the technology preprint. In addition, the post Xenium H&E image and Xenium spatial coordinates needs to be aligned before plotting onto the same scale, using some aligning tools such as STalign.
  • CosMx: There are some fluorescent image that are chopped into "field of views", and has to be stitched together to have the whole view of the tissue. Therefore, I don't think plotting spatial coordinates overlay small patches of fluorescent images are very necessary for downstream analysis.
  • Merscope: They have some 20GB images available to download on cell boundary and DAPI, and it would be very bulky if the main purpose is simply to analyze the spatial coordinates.

In addition, if you look at Seurat, they also did not include loading images for these technologies in their reader functions, simply because the images are not available.

For the naming of the new package, I think SpatialExperimentReader sounds straightforward 😆 . I am happy with your suggestions on SpatialExperimentIO too 😄 .

Sincerely, Estella

estellad avatar Nov 03 '23 06:11 estellad

My two cents: I think it's very valuable to have a dedicated IO package specific to spatial data and each different technology (current and future). I would not bind it to SpatialExperiment, as this could be a place to include all IO functions that return a SpatialData, MoleculeExperiment, SpatialFeatureExperiment, etc depending on the needs of the user.

Hence, I think SpatialReader or SpatialIO would be good names, but I would not call it SpatialExperimentIO unless you want it to be specific to SpatialExperiment.

With regards to the images, I think that having a low-res image in the object for plotting purposes could be valuable, but perhaps not vital. And if one needs to be working with images, probably SpatialData is the way to go?

OTOH having access to the cell boundaries and/or to the transcript-level data could be valuable and achieved easily by simply stitching a terra shapefile / geoparquet object in the colData or metadata, similar to what @lgeistlinger has done with the MerfishData package. Could this be an interim solution until we have a release of SpatialData?

drisso avatar Nov 03 '23 08:11 drisso

Following @drisso 's discussion on access to the cell boundaries and transcript-level data like in the MerfishData package, another idea from Seurat is they used package [sp](https://cran.r-project.org/web/packages/sp/index.html) to store boundaries as sp::Polygons object and molecules as sp::SpatialPoints object.

These additional information would make the SpatialExperiment object very bulky to work with, if we are only interested in cell-level data. However, I would say it is still necessary to implement them, as it might be interesting for the future?

Agree that if images are involved, it's better to go with SpatialData.

estellad avatar Nov 03 '23 13:11 estellad

But isn't this what SpatialFeatureExperiment is doing?

drisso avatar Nov 03 '23 14:11 drisso

Seurat, MerfishData, and SpatialFeatureExperiment are all trying to include more information than cell-level SPE, but the boundaries and molecules are stored as different class and at different places in these three.

Seurat::LoadXXX() (stores molecules and boundaries with sp::Polygons and sp::SpatialPoints in their image slot), MerfishData (stores molecules as SplitDataFrameList objects, store polygons in metadata as a long format dataset rather than in list), and SpatialFeatureExperiment (stores boundaries as S3: sfc and sfg POLYGON classes, does not store molecules).

Please correct me if I am wrong. Thank you!

I think Seurat and MerfishData's way are the most straightforward.

estellad avatar Nov 03 '23 15:11 estellad

Just to add my two cents, the SpatialIO package should aim to conditionally load dependent packages based on the desired class, e.g,. as Davide mentioned, SpatialData, MoleculeExperiment, SpatialFeatureExperiment, etc. This means adding those packages to the Suggests. The IO package should aim to do minimal read operations. Otherwise, the package would be bloated with all the possible compatible technologies. I considered moving TENxIO::TENxVisium function into it's own package due to the size of the codebase. For now, it is in the devel version of the package. The other option is to create individual packages per class, SpatialExperimentIO in the case that the codebase becomes significantly large especially if there are pre-processing steps done on the data before class generation.

LiNk-NY avatar Nov 03 '23 17:11 LiNk-NY

Hi all, I had a discussion with @estellad today and we would like to propose the following plan to move this forward.

(i) @estellad has set up a new package called SpatialExperimentReader, which contains functions for reading in data from Xenium, MERSCOPE, and CosMx platforms as SpatialExperiment objects. This package will be specific to SpatialExperiment and these platforms (and will not include read10xVisium()) to keep maintenance manageable. We can then call these functions in other packages using Suggests: to avoid duplicating code.

(ii) We will also set up a second package called SpatialIO, which will use Suggests: to provide access to the reader functions from SpatialExperimentReader, as well as read10xVisium() (from either SpatialExperiment or TENxIO), and can also be extended to provide access to reader functions from MoleculeExperiment and/or SpatialFeatureExperiment. The reason for having this package is because we now have several package names (SpatialExperiment, TENxIO, SpatialExperimentReader, ME, SPFE) so it gets confusing to remember which reader function is where, so it is useful to have a single package name to use in vignettes and examples. We will use Suggests: to avoid duplicating code.

Any feedback on this plan? If this sounds good, then @estellad will aim to submit the SpatialExperimentReader package to Bioc soon, and once this is in Bioc-devel I will follow it up with SpatialIO.

A couple of specific questions also:

  • Any thoughts on the name SpatialExperimentReader? This package contains functions for Xenium, MERSCOPE, and CosMx, so maybe something with "molecule-based" would also work and be more specific.

  • @LiNk-NY is the version of read10xVisium() currently in TENxIO the same as the one in SpatialExperiment? If so we could remove it from SpatialExperiment now. Alternatively would you prefer we just split it out as a separate small package (e.g. VisiumIO) and put this in Suggests: in SpatialExperiment and TENxIO (as you mentioned above)?

lmweber avatar Nov 13 '23 19:11 lmweber

Any thoughts on the name SpatialExperimentReader?

Why not SpatialExperimentIO to keep it consistent with some of the other package names (SpatialIO, SpatialDataIO, TENxIO)?

and will not include read10xVisium()

Why not? Doesn't the function read visium data into a SpatialExperiment and would thus fit very well into the SpatialExperimentIO package which has the scope of reading data from different technologies into SpatialExperiments?

another idea from Seurat is they used package sp to store boundaries as sp::Polygons object and molecules as sp::SpatialPoints object.

I can only wonder why Seurat would use sp which is in the process of being replaced by sf (used by SpatialFeatureExperiment), and terra (used by Giotto) is the most performant option among these three. If you consider adding point and polygon representations, I'd recommend using terra::vect(..., type = "points") and terra::vect(..., type = "polygons").

MerfishData stores polygons in metadata

Yes but only for rare cases where we have no 1:1 correspondence between cells and polygons. If there is a 1:1 correspondence, which should pretty much always be the case, it will be better to have this in the colData or spatialCoords to do integrated subsetting of cells and cell metadata that also subsets the polygon boundaries by the selected cells.

These additional information would make the SpatialExperiment object very bulky to work with

You could make reading the images and/or polygons optional via logical arguments to the function to provide for the different use cases (see eg the use.images and use.polygons arguments to MerfishData::MouseIleumPetukhov2021)

lgeistlinger avatar Nov 13 '23 20:11 lgeistlinger

and will not include read10xVisium()

Why not? Doesn't the function read visium data into a SpatialExperiment and would thus fit very well into the SpatialExperimentIO package which has the scope of reading data from different technologies into SpatialExperiments?

This is mainly for ease of maintenance. @estellad has independently developed the reader functions for Xenium, MERSCOPE, and CosMx, and would like to contribute these to Bioconductor, so it would be great for her to keep ownership over these within her own package and take care of the long-term maintenance, without also adding additional maintenance burden by including read10xVisium() within the same package. Hence our proposed solution of SpatialIO as a wrapper package providing access to all of these together via Suggests: to make everything easily accessible for users.

lmweber avatar Nov 13 '23 20:11 lmweber

Hi @lgeistlinger,

Yes, I am open to rename the package to SpatialExperimentIO. Another small thing I would add is also to directly return SingleCellExperiment class and store the spatial coordinates in colData. It will be helpful in many workflows, especially nowadays for single-cell resolution spatial data.

Thank you for your suggestions on storing polygon with terra and in colData or spatialCoords. And yes, we can use logical arguments to decide what level of details to load. I think for the first version of SpatialExperimentIO, I will just keep it simple, loading only cell-level counts without molecules or polygons. This simplification will speed up other dependencies. Gradually, I will look more into MerfishData's implementation to make SPE more complete.

Another ExperimentHub package I am working on is the Xenium preprint breast cancer data (contains scRNAseq, Visium, two Xenium replicates) from consecutive slices of the same tumour. Here could be a good chance to standardize Xenium SPE storage as Merfish. (However, in the first version I would also just stay at the cell-level count matrix without molecules or polygons)

And hopefully, we can work on these together? :)

estellad avatar Nov 13 '23 21:11 estellad

Thanks @estellad and @lmweber !

I think this is a good plan. I second @lgeistlinger 's suggestion for the package name and for storing (optionally) polygons as terra objects in the colData.

This is a bit off-topic here, but we're also using the Xenium preprint breast cancer data for benchmarking purposes and I thought that having an ExperimentHub package with those data would simplify data reuse (probably one package with four separate objects). This could be eventually updated to use SpatialData, but we could start simple as you mentioned.

Definitely happy to could work together on such a data package if you're interested.

drisso avatar Nov 14 '23 08:11 drisso

Thanks @lmweber and @estellad,

I also agree with the plan, I also share @lgeistlinger thoughts on the name of the package and on terra's implementation.

another small thing I would add is also to directly return SingleCellExperiment class and store the spatial coordinates in colData. It will be helpful in many workflows, especially nowadays for single-cell resolution spatial data.

About this, I don't agree with that, I'd be more inclined to modify the SpatialExperiment by storing the spatialCoords inside the colData and retrieving them with the spatialCoords accessor. Also because several technologies have more than one reference as spatial coordinates (global/local), it would be nice to properly handle this.

Another ExperimentHub package I am working on is the Xenium preprint breast cancer data (contains scRNAseq, Visium, two Xenium replicates) from consecutive slices of the same tumour. Here could be a good chance to standardize Xenium SPE storage as Merfish. (However, in the first version I would also just stay at the cell-level count matrix without molecules or polygons) And hopefully, we can work on these together? :)

As @drisso mentioned, we were doing the same, so I think it would be nice to work together to provide such a package. But it would be better to move the discussion somewhere else. Maybe, we can get in touch on the Bioc slack and discuss it further.

Dario

drighelli avatar Nov 14 '23 09:11 drighelli

Thanks @drisso and @drighelli . I have messaged you on Bioconductor slack for the Xenium preprint ExperimentHub package. We had a call with 10x Genomics yesterday, and they agreed with using the Zenodo link (I shared with you in the Slack channel) as an alternative link to their company website.

@drighelli, I agree with you about "flattening" SPE to a structure similar to SCE, by having spatialCoords() inside colData. How long do you think you can finish the implementation of the change? If such a change is made, some other packages like ggspavis and the new SpatialExperimentIO will also need to be altered.

So meanwhile waiting for the update, I would proceed with having the user choose either to return SPE or SCE in SpatialExperimentIO and putting it on Bioc first. Also for ggspavis, I have made some major updates to be reviewed by @lmweber . I will make ggspavis take both SCE and SPE for now. These "SPE-or-SCE" choice statements can be easily modified after your structural update on SpatialExperiment. What do you think?

Estella

estellad avatar Nov 15 '23 11:11 estellad

Update: I've moved the import functionality to VisiumIO and will submit to Bioconductor soon. Feel free to test and see the examples in the README.md.

LiNk-NY avatar Nov 22 '23 18:11 LiNk-NY

Update: VisiumIO is currently in Bioconductor devel : https://bioconductor.org/packages/VisiumIO

LiNk-NY avatar Jan 30 '24 20:01 LiNk-NY

Thanks Marcel and sorry for the late response!

drighelli avatar Mar 18 '24 17:03 drighelli

Thank you!

lmweber avatar Mar 18 '24 18:03 lmweber