seurat icon indicating copy to clipboard operation
seurat copied to clipboard

Support to handle `.parquet` output from Vizgen

Open alikhuseynov opened this issue 1 year ago • 18 comments

Dear Seurat Team, Here is the PR to handle parquet files (see #7080) Example to make Seurat objects from Cellpose output of Vizgen data.. Let me know if additional edits are needed.. Thanks A.

alikhuseynov avatar Apr 19 '23 18:04 alikhuseynov

Thanks for the comment. Do you mean several public Vizgen datasets or in general Cellpose output data?

z = 3L arg for z-plane is a default and was there before my PR. In order to be consistent, I added it to select a particular plane from Cellpose segmentation output. I'm not aware that segmented cell boundaries differ through all 7 sections. If that's the case, then one should implement some additional preprocessing, but here also, which z to choose is an open question.

alikhuseynov avatar May 12 '23 20:05 alikhuseynov

..just to confirm: Vizgen is using just one z-plane, ie middle plane of the tissue for Cellpose segmentation.

alikhuseynov avatar May 24 '23 09:05 alikhuseynov

The last commit should fix most of the bugs, supports old and new Vizgen data outputs, and optimized parallelization If any bugs or something I missed, let me know please.

Test run would be this

## test =================================================
## make sure all libs are installed a priori!
# load libs ----
suppressPackageStartupMessages({
  library(ggplot2)
  library(Seurat)
  library(dplyr)
  library(magrittr)
  library(BiocParallel)
  library(progressr)
  library(spatstat)  
})

# helper function to return Seurat object metadata
callmeta <- function (object = NULL) { 
  return([email protected]) 
  }

# set directory to read from
dir_use <- "./merfish_seurat_debug/" # or something like "./region_0/"

##
start.time <- Sys.time()
obj <-
  LoadVizgen(data.dir = dir_use,  
             fov = "merfish.test", 
             assay = "Vizgen",
             metadata = c("volume", "fov"), # add cell volume info
             type = c("segmentations", "centroids"), # type of cell spatial coord matrices
             z = 3L,
             add.zIndex = TRUE, # add z slice section to a cell
             update.object = TRUE,
             use.BiocParallel = TRUE,
             workers.MulticoreParam = 14, # for `BiocParallel` processing
             verbose = TRUE
             )
end.time <- Sys.time()
message("Time taken to load object = ", 
        round(end.time - start.time, digits = 2), " ", 
        attr(c(end.time - start.time), "units"))
obj
obj %>% callmeta %>% str
sessionInfo()
R version 4.2.1 (2022-06-23)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /xxx/envs/spatial_develop/lib/libopenblasp-r0.3.21.so

locale:
[1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
[3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
[5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
[7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
[9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] spatstat_2.3-4        spatstat.linnet_2.3-2 spatstat.core_2.4-4  
[4] rpart_4.1.16          nlme_3.1-159          spatstat.random_3.0-1
[7] spatstat.geom_3.0-3   spatstat.data_3.0-0   progressr_0.11.0     
[10] BiocParallel_1.30.4   magrittr_2.0.3        dplyr_1.1.2          
[13] SeuratObject_4.1.3    Seurat_4.3.0.9002     ggplot2_3.4.0        

loaded via a namespace (and not attached):
[1] readxl_1.4.1           uuid_1.1-0             backports_1.4.1       
[4] plyr_1.8.7             igraph_1.3.5           repr_1.1.4            
[7] lazyeval_0.2.2         sp_1.5-0               splines_4.2.1         
[10] listenv_0.8.0          scattermore_0.8        digest_0.6.29         
[13] htmltools_0.5.3        fansi_1.0.3            tensor_1.5            
[16] googlesheets4_1.0.1    cluster_2.1.4          ROCR_1.0-11           
[19] tzdb_0.3.0             readr_2.1.3            globals_0.16.1        
[22] modelr_0.1.9           matrixStats_0.62.0     spatstat.sparse_3.0-0 
[25] colorspace_2.0-3       rvest_1.0.3            ggrepel_0.9.2         
[28] haven_2.5.1            crayon_1.5.2           jsonlite_1.8.0        
[31] survival_3.4-0         zoo_1.8-10             glue_1.6.2            
[34] polyclip_1.10-0        gtable_0.3.1           gargle_1.2.0          
[37] leiden_0.4.2           future.apply_1.9.0     abind_1.4-5           
[40] scales_1.2.1           DBI_1.1.2              miniUI_0.1.1.1        
[43] Rcpp_1.0.9             viridisLite_0.4.1      xtable_1.8-4          
[46] reticulate_1.26        bit_4.0.4              htmlwidgets_1.5.4     
[49] httr_1.4.4             RColorBrewer_1.1-3     ellipsis_0.3.2        
[52] ica_1.0-3              pkgconfig_2.0.3        uwot_0.1.14           
[55] dbplyr_2.2.1           deldir_1.0-6           utf8_1.2.2            
[58] tidyselect_1.2.0       rlang_1.1.1            reshape2_1.4.4        
[61] later_1.3.0            munsell_0.5.0          cellranger_1.1.0      
[64] tools_4.2.1            cli_3.4.1              generics_0.1.3        
[67] broom_1.0.1            ggridges_0.5.4         evaluate_0.16         
[70] stringr_1.4.1          fastmap_1.1.0          goftest_1.2-3         
[73] bit64_4.0.5            fs_1.5.2               fitdistrplus_1.1-8    
[76] purrr_1.0.1            RANN_2.6.1             pbapply_1.7-0         
[79] future_1.28.0          mime_0.12              xml2_1.3.3            
[82] arrow_9.0.0            hdf5r_1.3.5            compiler_4.2.1        
[85] plotly_4.10.0          png_0.1-7              spatstat.utils_3.0-1  
[88] reprex_2.0.2           tibble_3.2.1           stringi_1.7.8         
[91] forcats_0.5.2          lattice_0.20-45        IRdisplay_1.1         
[94] Matrix_1.5-1           vctrs_0.6.2            furrr_0.3.1           
[97] pillar_1.9.0           lifecycle_1.0.3        lmtest_0.9-40         
[100] RcppAnnoy_0.0.19       data.table_1.14.6      cowplot_1.1.1         
[103] irlba_2.3.5            httpuv_1.6.5           patchwork_1.1.2       
[106] R6_2.5.1               promises_1.2.0.1       KernSmooth_2.23-20    
[109] gridExtra_2.3          parallelly_1.32.1      codetools_0.2-18      
[112] MASS_7.3-57            assertthat_0.2.1       withr_2.5.0           
[115] sctransform_0.3.5      hms_1.1.2              mgcv_1.8-40           
[118] parallel_4.2.1         sfarrow_0.4.1          grid_4.2.1            
[121] tidyverse_1.3.2        IRkernel_1.3           tidyr_1.2.0           
[124] googledrive_2.0.0      Cairo_1.6-0            Rtsne_0.16            
[127] pbdZMQ_0.3-7           spatstat.explore_3.0-5 lubridate_1.8.0       
[130] shiny_1.7.2            base64enc_0.1-3 

alikhuseynov avatar Jun 06 '23 10:06 alikhuseynov

This worked on all my test data and scenarios.

The only other suggestion I would have is making molecules an optional input (since the file can be quite large, and isn't always used during analysis). Otherwise this all looks great!

dfhannum avatar Jun 06 '23 21:06 dfhannum

This worked on all my test data and scenarios.

The only other suggestion I would have is making molecules an optional input (since the file can be quite large, and isn't always used during analysis). Otherwise this all looks great!

Great, thanks! I will add support for molecules an optional input

alikhuseynov avatar Jun 07 '23 07:06 alikhuseynov

Any updates for this pull request?

dfhannum avatar Jul 19 '23 13:07 dfhannum

Any updates for this pull request?

..sorry I was a bit busy, will be adding a new arg add.molecules = TRUE in LoadVizgen() to have molecules FOV optionally added to the object. Afterwards, it kind of done on my side. I could try to solve small conflicts of the branch as well. Any further suggestions? Thanks

alikhuseynov avatar Jul 19 '23 14:07 alikhuseynov

Any updates for this pull request?

..sorry I was a bit busy, will be adding a new arg add.molecules = TRUE in LoadVizgen() to have molecules FOV optionally added to the object. Afterwards, it kind of done on my side. I could try to solve small conflicts of the branch as well. Any further suggestions? Thanks

I think it looks great and looking forward to it's use

dfhannum avatar Jul 20 '23 20:07 dfhannum

I did last updates, tested loading functions and tried to resolve some conflicts in preprocessing.R so far looks fine on my side, let me know if any additional changes are needed. Thanks)

alikhuseynov avatar Jul 26 '23 09:07 alikhuseynov

I did last updates, tested loading functions and tried to resolve some conflicts in preprocessing.R so far looks fine on my side, let me know if any additional changes are needed. Thanks)

Dear alikhuseynov, I've tried your latest patch on the Vizgen '.parquet' data. When using the 'Crop' function like this:

vizgen.obj <- LoadVizgen(data.dir = '/path/to/input_dir', fov = 'fov',verbose=TRUE)
vizgen.obj <- SCTransform(vizgen.obj, assay = 'Vizgen', clip.range = c(-10, 10), )
vizgen.obj <- RunPCA(vizgen.obj, npcs = 30, features = rownames(vizgen.obj))
vizgen.obj <- RunUMAP(vizgen.obj, dims = 1:30)
vizgen.obj <- FindNeighbors(vizgen.obj, reduction = 'pca', dims = 1:30)
vizgen.obj <- FindClusters(vizgen.obj, resolution = 0.3)
vizgen.crop <- Crop(vizgen.obj[['fov']],x=c(1250,3750),y=c(8000,9000),coords='plot')
 I got the following error:
Error in h(simpleError(msg, call)) : 
  error in evaluating the argument 'y' in selecting a method for function 'Overlay': cannot derive coordinates from non-numeric matrix
Calls: Crop ... stopifnot -> Polygon -> coordinates -> coordinates -> .local
 How can I fix this?

tangboyun avatar Aug 02 '23 09:08 tangboyun

I did last updates, tested loading functions and tried to resolve some conflicts in preprocessing.R so far looks fine on my side, let me know if any additional changes are needed. Thanks)

Dear alikhuseynov, I've tried your latest patch on the Vizgen '.parquet' data. When using the 'Crop' function like this:

vizgen.obj <- LoadVizgen(data.dir = '/path/to/input_dir', fov = 'fov',verbose=TRUE)
vizgen.obj <- SCTransform(vizgen.obj, assay = 'Vizgen', clip.range = c(-10, 10), )
vizgen.obj <- RunPCA(vizgen.obj, npcs = 30, features = rownames(vizgen.obj))
vizgen.obj <- RunUMAP(vizgen.obj, dims = 1:30)
vizgen.obj <- FindNeighbors(vizgen.obj, reduction = 'pca', dims = 1:30)
vizgen.obj <- FindClusters(vizgen.obj, resolution = 0.3)
vizgen.crop <- Crop(vizgen.obj[['fov']],x=c(1250,3750),y=c(8000,9000),coords='plot')
 I got the following error:
Error in h(simpleError(msg, call)) : 
  error in evaluating the argument 'y' in selecting a method for function 'Overlay': cannot derive coordinates from non-numeric matrix
Calls: Crop ... stopifnot -> Polygon -> coordinates -> coordinates -> .local
 How can I fix this?

Hi there, I haven't done any changes to Crop() function. It might be that your x and y limits are outside the range of the tissue coordinates. Also note that, x and y axes labels are swapped in the ImageDimPlot():

  • #6110
  • #6132

Alternatively, plot your data first with ImageDimPlot(), and try to switch x and y args (ie, y=c(1250,3750),x=c(8000,9000)) in your Crop() run and see if it works.

alikhuseynov avatar Aug 02 '23 12:08 alikhuseynov

I did last updates, tested loading functions and tried to resolve some conflicts in preprocessing.R so far looks fine on my side, let me know if any additional changes are needed. Thanks)

Dear alikhuseynov, I've tried your latest patch on the Vizgen '.parquet' data. When using the 'Crop' function like this:

vizgen.obj <- LoadVizgen(data.dir = '/path/to/input_dir', fov = 'fov',verbose=TRUE)
vizgen.obj <- SCTransform(vizgen.obj, assay = 'Vizgen', clip.range = c(-10, 10), )
vizgen.obj <- RunPCA(vizgen.obj, npcs = 30, features = rownames(vizgen.obj))
vizgen.obj <- RunUMAP(vizgen.obj, dims = 1:30)
vizgen.obj <- FindNeighbors(vizgen.obj, reduction = 'pca', dims = 1:30)
vizgen.obj <- FindClusters(vizgen.obj, resolution = 0.3)
vizgen.crop <- Crop(vizgen.obj[['fov']],x=c(1250,3750),y=c(8000,9000),coords='plot')
 I got the following error:
Error in h(simpleError(msg, call)) : 
  error in evaluating the argument 'y' in selecting a method for function 'Overlay': cannot derive coordinates from non-numeric matrix
Calls: Crop ... stopifnot -> Polygon -> coordinates -> coordinates -> .local
 How can I fix this?

Hi there, I haven't done any changes to Crop() function. It might be that your x and y limits are outside the range of the tissue coordinates. Also note that, x and y axes labels are swapped in the ImageDimPlot():

Alternatively, plot your data first with ImageDimPlot(), and try to switch x and y args (ie, y=c(1250,3750),x=c(8000,9000)) in your Crop() run and see if it works.

Thank you for your kind suggestion. I've tried to swap the x and y coordinates but still got the same error. I'll open a new issue for my problem.

tangboyun avatar Aug 03 '23 01:08 tangboyun

Hi alikhuseynov, I find that without options(Seurat.object.assay.version = "v5"), I will fail at the Crop function. With options(Seurat.object.assay.version = "v5"), I will fail at GetAssay:

> library(Seurat)
> options(Seurat.object.assay.version = "v5")
# The following 3 packages are required to make 'LoadVizgen' happy
> library(dplyr)
> library(magrittr)
> library(spatstat)

# 'verbose' has no default value
> vizgen.obj <- LoadVizgen(data.dir='/path/to/vizgen',fov='fov',verbose=TRUE)
> vizgen.obj <- SCTransform(vizgen.obj, clip.range = c(-10, 10), assay='Vizgen')
Error in GetAssay.Seurat(object = object, assay = assay) : 
  Vizgen is not an assay present in the given object. Available assays are: 
> GetAssay(vizgen.obj,assay='Vizgen')
Error in GetAssay.Seurat(vizgen.obj, assay = "Vizgen") : 
  Vizgen is not an assay present in the given object. Available assays are:
> vizgen.obj@assays$Vizgen
Assay (v5) data with 300 features for 65319 cells
First 10 features:
 Ccnd2, Th, Lhx2, Cdh4, Pdgfra, Sox9, Rnd2, Ube2c, Col1a1, Npas1 
Layers:
 counts 
 

Maybe this pull request should be rebased on the 'seurat5' branch?

tangboyun avatar Aug 04 '23 07:08 tangboyun

Hi alikhuseynov, I find that without options(Seurat.object.assay.version = "v5"), I will fail at the Crop function. With options(Seurat.object.assay.version = "v5"), I will fail at GetAssay:

> library(Seurat)
> options(Seurat.object.assay.version = "v5")
# The following 3 packages are required to make 'LoadVizgen' happy
> library(dplyr)
> library(magrittr)
> library(spatstat)

# 'verbose' has no default value
> vizgen.obj <- LoadVizgen(data.dir='/path/to/vizgen',fov='fov',verbose=TRUE)
> vizgen.obj <- SCTransform(vizgen.obj, clip.range = c(-10, 10), assay='Vizgen')
Error in GetAssay.Seurat(object = object, assay = assay) : 
  Vizgen is not an assay present in the given object. Available assays are: 
> GetAssay(vizgen.obj,assay='Vizgen')
Error in GetAssay.Seurat(vizgen.obj, assay = "Vizgen") : 
  Vizgen is not an assay present in the given object. Available assays are:
> vizgen.obj@assays$Vizgen
Assay (v5) data with 300 features for 65319 cells
First 10 features:
 Ccnd2, Th, Lhx2, Cdh4, Pdgfra, Sox9, Rnd2, Ube2c, Col1a1, Npas1 
Layers:
 counts 
 

Maybe this pull request should be rebased on the 'seurat5' branch?

All PRs go to the "develop" branch of Seurat. Seurat "v5" is bit different with new/updated features & functions, I could add this support to my forked branch "seurat5" as well soon. If you are using this PR patch to load Vizgen data, make sure to use SeuratObject version 4.1.3 (can be installed form CRAN)

alikhuseynov avatar Aug 05 '23 13:08 alikhuseynov

Thanks @lambdamoses, I got inspired by SpatialFeatureExperiment where sf package is used to handle and filter segmentation geometries.

Updates for this PR:

  • adding support to efficiently filter segmentation polygons (removing small artifacts etc..)
    • added helper function .filter_polygons() (a bit modified version from SpatialFeatureExperiment:::.filter_polygons) for filtering given min.area - minimal polygon area as a threshold, if set to NULL the max. area (ie, the largest polygon) will be used.
  • allow for multiple segmentation polygons per cell (..still using single z-plane for now)

How to use example -> make Seurat objects (essentially the same with a small update)

@dfhannum if you would like re-run it on some of your Vizgen data as before, to ensure the robustness, would be awesome. Thanks!

alikhuseynov avatar Aug 28 '23 14:08 alikhuseynov

Hi Seurat Team,

I am with Vizgen. I just tested alikhuseynov:feat/vizgen with Vizgen data. This update works well. please see this file: https://github.com/swangvzg/seurat/blob/master/notebooks/Seurat_PR.html

@AustinHartman @dfhannum @alikhuseynov @tangboyun

Thanks, Shichen

swangvzg avatar Jan 05 '24 16:01 swangvzg

Hi Seurat Team,

I am with Vizgen. I just tested alikhuseynov:feat/vizgen with Vizgen data. This update works well. please see this file: https://github.com/swangvzg/seurat/blob/master/notebooks/Seurat_PR.html

@AustinHartman @dfhannum @alikhuseynov @tangboyun

Thanks, Shichen

Thanks @swangvzg! I'm glad it works.. It is still on Seurat "4.3.0.9002" and it works with SeuratObject "5.0.1" I would say it's up to develop team to integrate this PR into Seurat v5, I can make any additional changes that are needed/requested.

@swangvzg & @dfhannum, btw, we are developing Bioc. SpatialFeatureExperiment (SFE) and a converter between it and Seurat (under development). There is readVizgen in PR, if you and other Vizgen members are interested in testing it as well.

alikhuseynov avatar Jan 05 '24 17:01 alikhuseynov

I can confirm that it work with Seurat v5 as well (R version 4.3.1), tested on alikhuseynov:vizgen_seurat5 (which is essentially the most recent develop branch with PR #8298)

alikhuseynov avatar Jan 11 '24 11:01 alikhuseynov