seurat
seurat copied to clipboard
Support to handle `.parquet` output from Vizgen
Dear Seurat Team, Here is the PR to handle parquet files (see #7080) Example to make Seurat objects from Cellpose output of Vizgen data.. Let me know if additional edits are needed.. Thanks A.
Thanks for the comment. Do you mean several public Vizgen datasets or in general Cellpose output data?
z = 3L
arg for z-plane is a default and was there before my PR. In order to be consistent, I added it to select a particular plane from Cellpose segmentation output.
I'm not aware that segmented cell boundaries differ through all 7 sections. If that's the case, then one should implement some additional preprocessing, but here also, which z
to choose is an open question.
..just to confirm: Vizgen is using just one z-plane, ie middle plane of the tissue for Cellpose segmentation.
The last commit should fix most of the bugs, supports old and new Vizgen data outputs, and optimized parallelization If any bugs or something I missed, let me know please.
Test run would be this
## test =================================================
## make sure all libs are installed a priori!
# load libs ----
suppressPackageStartupMessages({
library(ggplot2)
library(Seurat)
library(dplyr)
library(magrittr)
library(BiocParallel)
library(progressr)
library(spatstat)
})
# helper function to return Seurat object metadata
callmeta <- function (object = NULL) {
return([email protected])
}
# set directory to read from
dir_use <- "./merfish_seurat_debug/" # or something like "./region_0/"
##
start.time <- Sys.time()
obj <-
LoadVizgen(data.dir = dir_use,
fov = "merfish.test",
assay = "Vizgen",
metadata = c("volume", "fov"), # add cell volume info
type = c("segmentations", "centroids"), # type of cell spatial coord matrices
z = 3L,
add.zIndex = TRUE, # add z slice section to a cell
update.object = TRUE,
use.BiocParallel = TRUE,
workers.MulticoreParam = 14, # for `BiocParallel` processing
verbose = TRUE
)
end.time <- Sys.time()
message("Time taken to load object = ",
round(end.time - start.time, digits = 2), " ",
attr(c(end.time - start.time), "units"))
obj
obj %>% callmeta %>% str
sessionInfo()
R version 4.2.1 (2022-06-23)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)
Matrix products: default
BLAS/LAPACK: /xxx/envs/spatial_develop/lib/libopenblasp-r0.3.21.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] spatstat_2.3-4 spatstat.linnet_2.3-2 spatstat.core_2.4-4
[4] rpart_4.1.16 nlme_3.1-159 spatstat.random_3.0-1
[7] spatstat.geom_3.0-3 spatstat.data_3.0-0 progressr_0.11.0
[10] BiocParallel_1.30.4 magrittr_2.0.3 dplyr_1.1.2
[13] SeuratObject_4.1.3 Seurat_4.3.0.9002 ggplot2_3.4.0
loaded via a namespace (and not attached):
[1] readxl_1.4.1 uuid_1.1-0 backports_1.4.1
[4] plyr_1.8.7 igraph_1.3.5 repr_1.1.4
[7] lazyeval_0.2.2 sp_1.5-0 splines_4.2.1
[10] listenv_0.8.0 scattermore_0.8 digest_0.6.29
[13] htmltools_0.5.3 fansi_1.0.3 tensor_1.5
[16] googlesheets4_1.0.1 cluster_2.1.4 ROCR_1.0-11
[19] tzdb_0.3.0 readr_2.1.3 globals_0.16.1
[22] modelr_0.1.9 matrixStats_0.62.0 spatstat.sparse_3.0-0
[25] colorspace_2.0-3 rvest_1.0.3 ggrepel_0.9.2
[28] haven_2.5.1 crayon_1.5.2 jsonlite_1.8.0
[31] survival_3.4-0 zoo_1.8-10 glue_1.6.2
[34] polyclip_1.10-0 gtable_0.3.1 gargle_1.2.0
[37] leiden_0.4.2 future.apply_1.9.0 abind_1.4-5
[40] scales_1.2.1 DBI_1.1.2 miniUI_0.1.1.1
[43] Rcpp_1.0.9 viridisLite_0.4.1 xtable_1.8-4
[46] reticulate_1.26 bit_4.0.4 htmlwidgets_1.5.4
[49] httr_1.4.4 RColorBrewer_1.1-3 ellipsis_0.3.2
[52] ica_1.0-3 pkgconfig_2.0.3 uwot_0.1.14
[55] dbplyr_2.2.1 deldir_1.0-6 utf8_1.2.2
[58] tidyselect_1.2.0 rlang_1.1.1 reshape2_1.4.4
[61] later_1.3.0 munsell_0.5.0 cellranger_1.1.0
[64] tools_4.2.1 cli_3.4.1 generics_0.1.3
[67] broom_1.0.1 ggridges_0.5.4 evaluate_0.16
[70] stringr_1.4.1 fastmap_1.1.0 goftest_1.2-3
[73] bit64_4.0.5 fs_1.5.2 fitdistrplus_1.1-8
[76] purrr_1.0.1 RANN_2.6.1 pbapply_1.7-0
[79] future_1.28.0 mime_0.12 xml2_1.3.3
[82] arrow_9.0.0 hdf5r_1.3.5 compiler_4.2.1
[85] plotly_4.10.0 png_0.1-7 spatstat.utils_3.0-1
[88] reprex_2.0.2 tibble_3.2.1 stringi_1.7.8
[91] forcats_0.5.2 lattice_0.20-45 IRdisplay_1.1
[94] Matrix_1.5-1 vctrs_0.6.2 furrr_0.3.1
[97] pillar_1.9.0 lifecycle_1.0.3 lmtest_0.9-40
[100] RcppAnnoy_0.0.19 data.table_1.14.6 cowplot_1.1.1
[103] irlba_2.3.5 httpuv_1.6.5 patchwork_1.1.2
[106] R6_2.5.1 promises_1.2.0.1 KernSmooth_2.23-20
[109] gridExtra_2.3 parallelly_1.32.1 codetools_0.2-18
[112] MASS_7.3-57 assertthat_0.2.1 withr_2.5.0
[115] sctransform_0.3.5 hms_1.1.2 mgcv_1.8-40
[118] parallel_4.2.1 sfarrow_0.4.1 grid_4.2.1
[121] tidyverse_1.3.2 IRkernel_1.3 tidyr_1.2.0
[124] googledrive_2.0.0 Cairo_1.6-0 Rtsne_0.16
[127] pbdZMQ_0.3-7 spatstat.explore_3.0-5 lubridate_1.8.0
[130] shiny_1.7.2 base64enc_0.1-3
This worked on all my test data and scenarios.
The only other suggestion I would have is making molecules
an optional input (since the file can be quite large, and isn't always used during analysis). Otherwise this all looks great!
This worked on all my test data and scenarios.
The only other suggestion I would have is making
molecules
an optional input (since the file can be quite large, and isn't always used during analysis). Otherwise this all looks great!
Great, thanks!
I will add support for molecules
an optional input
Any updates for this pull request?
Any updates for this pull request?
..sorry I was a bit busy, will be adding a new arg add.molecules = TRUE
in LoadVizgen()
to have molecules
FOV optionally added to the object.
Afterwards, it kind of done on my side. I could try to solve small conflicts of the branch as well.
Any further suggestions?
Thanks
Any updates for this pull request?
..sorry I was a bit busy, will be adding a new arg
add.molecules = TRUE
inLoadVizgen()
to havemolecules
FOV optionally added to the object. Afterwards, it kind of done on my side. I could try to solve small conflicts of the branch as well. Any further suggestions? Thanks
I think it looks great and looking forward to it's use
I did last updates, tested loading functions and tried to resolve some conflicts in preprocessing.R
so far looks fine on my side, let me know if any additional changes are needed.
Thanks)
I did last updates, tested loading functions and tried to resolve some conflicts in
preprocessing.R
so far looks fine on my side, let me know if any additional changes are needed. Thanks)
Dear alikhuseynov, I've tried your latest patch on the Vizgen '.parquet' data. When using the 'Crop' function like this:
vizgen.obj <- LoadVizgen(data.dir = '/path/to/input_dir', fov = 'fov',verbose=TRUE)
vizgen.obj <- SCTransform(vizgen.obj, assay = 'Vizgen', clip.range = c(-10, 10), )
vizgen.obj <- RunPCA(vizgen.obj, npcs = 30, features = rownames(vizgen.obj))
vizgen.obj <- RunUMAP(vizgen.obj, dims = 1:30)
vizgen.obj <- FindNeighbors(vizgen.obj, reduction = 'pca', dims = 1:30)
vizgen.obj <- FindClusters(vizgen.obj, resolution = 0.3)
vizgen.crop <- Crop(vizgen.obj[['fov']],x=c(1250,3750),y=c(8000,9000),coords='plot')
I got the following error:
Error in h(simpleError(msg, call)) :
error in evaluating the argument 'y' in selecting a method for function 'Overlay': cannot derive coordinates from non-numeric matrix
Calls: Crop ... stopifnot -> Polygon -> coordinates -> coordinates -> .local
How can I fix this?
I did last updates, tested loading functions and tried to resolve some conflicts in
preprocessing.R
so far looks fine on my side, let me know if any additional changes are needed. Thanks)Dear alikhuseynov, I've tried your latest patch on the Vizgen '.parquet' data. When using the 'Crop' function like this:
vizgen.obj <- LoadVizgen(data.dir = '/path/to/input_dir', fov = 'fov',verbose=TRUE) vizgen.obj <- SCTransform(vizgen.obj, assay = 'Vizgen', clip.range = c(-10, 10), ) vizgen.obj <- RunPCA(vizgen.obj, npcs = 30, features = rownames(vizgen.obj)) vizgen.obj <- RunUMAP(vizgen.obj, dims = 1:30) vizgen.obj <- FindNeighbors(vizgen.obj, reduction = 'pca', dims = 1:30) vizgen.obj <- FindClusters(vizgen.obj, resolution = 0.3) vizgen.crop <- Crop(vizgen.obj[['fov']],x=c(1250,3750),y=c(8000,9000),coords='plot')
I got the following error:
Error in h(simpleError(msg, call)) : error in evaluating the argument 'y' in selecting a method for function 'Overlay': cannot derive coordinates from non-numeric matrix Calls: Crop ... stopifnot -> Polygon -> coordinates -> coordinates -> .local
How can I fix this?
Hi there, I haven't done any changes to Crop()
function. It might be that your x
and y
limits are outside the range of the tissue coordinates. Also note that, x
and y
axes labels are swapped in the ImageDimPlot()
:
- #6110
- #6132
Alternatively, plot your data first with ImageDimPlot()
, and try to switch x
and y
args (ie, y=c(1250,3750),x=c(8000,9000)
) in your Crop()
run and see if it works.
I did last updates, tested loading functions and tried to resolve some conflicts in
preprocessing.R
so far looks fine on my side, let me know if any additional changes are needed. Thanks)Dear alikhuseynov, I've tried your latest patch on the Vizgen '.parquet' data. When using the 'Crop' function like this:
vizgen.obj <- LoadVizgen(data.dir = '/path/to/input_dir', fov = 'fov',verbose=TRUE) vizgen.obj <- SCTransform(vizgen.obj, assay = 'Vizgen', clip.range = c(-10, 10), ) vizgen.obj <- RunPCA(vizgen.obj, npcs = 30, features = rownames(vizgen.obj)) vizgen.obj <- RunUMAP(vizgen.obj, dims = 1:30) vizgen.obj <- FindNeighbors(vizgen.obj, reduction = 'pca', dims = 1:30) vizgen.obj <- FindClusters(vizgen.obj, resolution = 0.3) vizgen.crop <- Crop(vizgen.obj[['fov']],x=c(1250,3750),y=c(8000,9000),coords='plot')
I got the following error:
Error in h(simpleError(msg, call)) : error in evaluating the argument 'y' in selecting a method for function 'Overlay': cannot derive coordinates from non-numeric matrix Calls: Crop ... stopifnot -> Polygon -> coordinates -> coordinates -> .local
How can I fix this?
Hi there, I haven't done any changes to
Crop()
function. It might be that yourx
andy
limits are outside the range of the tissue coordinates. Also note that,x
andy
axes labels are swapped in theImageDimPlot()
:Alternatively, plot your data first with
ImageDimPlot()
, and try to switchx
andy
args (ie,y=c(1250,3750),x=c(8000,9000)
) in yourCrop()
run and see if it works.
Thank you for your kind suggestion. I've tried to swap the x
and y
coordinates but still got the same error. I'll open a new issue for my problem.
Hi alikhuseynov,
I find that without options(Seurat.object.assay.version = "v5")
, I will fail at the Crop
function. With options(Seurat.object.assay.version = "v5")
, I will fail at GetAssay
:
> library(Seurat)
> options(Seurat.object.assay.version = "v5")
# The following 3 packages are required to make 'LoadVizgen' happy
> library(dplyr)
> library(magrittr)
> library(spatstat)
# 'verbose' has no default value
> vizgen.obj <- LoadVizgen(data.dir='/path/to/vizgen',fov='fov',verbose=TRUE)
> vizgen.obj <- SCTransform(vizgen.obj, clip.range = c(-10, 10), assay='Vizgen')
Error in GetAssay.Seurat(object = object, assay = assay) :
Vizgen is not an assay present in the given object. Available assays are:
> GetAssay(vizgen.obj,assay='Vizgen')
Error in GetAssay.Seurat(vizgen.obj, assay = "Vizgen") :
Vizgen is not an assay present in the given object. Available assays are:
> vizgen.obj@assays$Vizgen
Assay (v5) data with 300 features for 65319 cells
First 10 features:
Ccnd2, Th, Lhx2, Cdh4, Pdgfra, Sox9, Rnd2, Ube2c, Col1a1, Npas1
Layers:
counts
Maybe this pull request should be rebased on the 'seurat5' branch?
Hi alikhuseynov, I find that without
options(Seurat.object.assay.version = "v5")
, I will fail at theCrop
function. Withoptions(Seurat.object.assay.version = "v5")
, I will fail atGetAssay
:> library(Seurat) > options(Seurat.object.assay.version = "v5") # The following 3 packages are required to make 'LoadVizgen' happy > library(dplyr) > library(magrittr) > library(spatstat) # 'verbose' has no default value > vizgen.obj <- LoadVizgen(data.dir='/path/to/vizgen',fov='fov',verbose=TRUE) > vizgen.obj <- SCTransform(vizgen.obj, clip.range = c(-10, 10), assay='Vizgen') Error in GetAssay.Seurat(object = object, assay = assay) : Vizgen is not an assay present in the given object. Available assays are: > GetAssay(vizgen.obj,assay='Vizgen') Error in GetAssay.Seurat(vizgen.obj, assay = "Vizgen") : Vizgen is not an assay present in the given object. Available assays are: > vizgen.obj@assays$Vizgen Assay (v5) data with 300 features for 65319 cells First 10 features: Ccnd2, Th, Lhx2, Cdh4, Pdgfra, Sox9, Rnd2, Ube2c, Col1a1, Npas1 Layers: counts
Maybe this pull request should be rebased on the 'seurat5' branch?
All PRs go to the "develop"
branch of Seurat.
Seurat "v5"
is bit different with new/updated features & functions, I could add this support to my forked branch "seurat5"
as well soon.
If you are using this PR patch to load Vizgen data, make sure to use SeuratObject
version 4.1.3
(can be installed form CRAN)
Thanks @lambdamoses, I got inspired by SpatialFeatureExperiment
where sf
package is used to handle and filter segmentation geometries.
Updates for this PR:
- adding support to efficiently filter segmentation polygons (removing small artifacts etc..)
- added helper function
.filter_polygons()
(a bit modified version fromSpatialFeatureExperiment:::.filter_polygons
) for filtering givenmin.area
- minimal polygon area as a threshold, if set toNULL
the max. area (ie, the largest polygon) will be used.
- added helper function
- allow for multiple segmentation polygons per cell (..still using single z-plane for now)
How to use example -> make Seurat objects (essentially the same with a small update)
@dfhannum if you would like re-run it on some of your Vizgen data as before, to ensure the robustness, would be awesome. Thanks!
Hi Seurat Team,
I am with Vizgen. I just tested alikhuseynov:feat/vizgen
with Vizgen data. This update works well. please see this file: https://github.com/swangvzg/seurat/blob/master/notebooks/Seurat_PR.html
@AustinHartman @dfhannum @alikhuseynov @tangboyun
Thanks, Shichen
Hi Seurat Team,
I am with Vizgen. I just tested
alikhuseynov:feat/vizgen
with Vizgen data. This update works well. please see this file: https://github.com/swangvzg/seurat/blob/master/notebooks/Seurat_PR.html@AustinHartman @dfhannum @alikhuseynov @tangboyun
Thanks, Shichen
Thanks @swangvzg! I'm glad it works.. It is still on Seurat "4.3.0.9002"
and it works with SeuratObject "5.0.1"
I would say it's up to develop team to integrate this PR into Seurat v5
, I can make any additional changes that are needed/requested.
@swangvzg & @dfhannum, btw, we are developing Bioc. SpatialFeatureExperiment
(SFE) and a converter between it and Seurat (under development). There is readVizgen
in PR, if you and other Vizgen members are interested in testing it as well.
I can confirm that it work with Seurat
v5
as well (R version 4.3.1), tested on alikhuseynov:vizgen_seurat5
(which is essentially the most recent develop
branch with PR #8298)