velocyto.R icon indicating copy to clipboard operation
velocyto.R copied to clipboard

Seurat2 with Veloctyo.R

Open Jemkon opened this issue 7 years ago • 31 comments

Hello,

I have tested the velocyto.R package with Pagoda2 as described in tutorials. But I would really like to test it with Seurat2. How do I create cell embedding, cell clustering, and accurate cell-cell distance matrix using Seurat2? Many thanks.

Jemkon avatar Feb 13 '18 14:02 Jemkon

This is more of a question for the Seurat list, but we'll try to add an example when we update tutorials.

pkharchenko avatar Mar 01 '18 03:03 pkharchenko

I happened to be a fan of both packages. You would find the answers at the Seurat2's pbmc3K tutorial.

For cell clustering, see the section "Cluster the cells". The function is FindClusters.

For cell embedding, always run RunPCA first and then RunTSNE as the RunTSNE internally relies on the existence of PCA slot. The actual coordinates of the embedding are fetched by seurat_obj@[email protected] and seurat_obj@[email protected] for PCA and tSNE, respectively. With the embeddings, you can easily follow velocyto's tutorial to mask the velocity.

For the better cell distance, in principle, you could adapt the below code from velocyto tutorial to Seurat2 object by replacing the r$reductions$PCA with the Seurat2's PCA coordinates. (I did not actually test this suggestion)

cell.dist <- as.dist(1-armaCor(t(r$reductions$PCA)))

Puriney avatar Mar 01 '18 05:03 Puriney

It worked very well for me using Seurat object in velocyto functions, I just needed to rename the emat and nmat colnames in order to fit with the ones in Seurat Object:

emat <- emat[,rownames([email protected])]; nmat <- nmat[,rownames([email protected])]; 
cluster.label <- pbmc@ident
cell.colors <- pagoda2:::fac2col(cluster.label)
emb <- pbmc@[email protected]
cell.dist <- as.dist(1-armaCor(t(pbmc@[email protected])))

ramonvidal avatar Mar 16 '18 08:03 ramonvidal

Hi @jemkon,

For Seurat you need to use a single matrix, as far as I understand the dat object contains 3 different matrices (spliced, ambiguous, unspliced). The function to create a seurat object is very simple:

CreateSeuratObject(raw.data, project = "SeuratProject", min.cells = 0, min.genes = 0, normalization.method = NULL, scale.factor = 10000, do.scale = FALSE, do.center = FALSE, names.field = 1, names.delim = "_", meta.data = NULL, display.progress = TRUE, ...)

But you need a raw.data in a single matrix format with cells in columns and gene in rows.

I dont know what you are trying to do, but I was not happy with the combination of my seurat results with velocyto. I ended up using pagoda + velocyto pipeline and matching the t-SNE clusters later by looking at the gene markers.

Best, Ramon

On Wed, 28 Mar 2018 at 11:06 Jemkon [email protected] wrote:

Hi @ramonvidal https://github.com/ramonvidal,

Thanks for sharing your code for seurat. I was wondering how do I use the "cell.counts.matrices.rds" file (created by dropEST) to create seurat object?

I tried following code but failed....

  dat <- readRDS("/home/10X_CellRnager/velocyto/cell.counts.matrices.rds")
  pbmc <- NormalizeData(object = dat, normalization.method = "LogNormalize",
                  scale.factor = 10000)

Could you please share more info on creating seurat object? Many thanks.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/velocyto-team/velocyto.R/issues/16#issuecomment-376815655, or mute the thread https://github.com/notifications/unsubscribe-auth/AAG_gt5KGaD2OtnGbkdpjRHSrc1zWMllks5ti1KvgaJpZM4SD2VK .

-- Ramon Vidal +4917621123210 <+49%20176%2021123210>

ramonvidal avatar Mar 28 '18 11:03 ramonvidal

Thanks @ramonvidal.

I was trying to compare and see how pagoda and seurat performs with velocyto. I found the same. The Pagoda2 + velocyto works best compared to Seurat2 + velocyto.

Many thanks for your suggestions.

Jemkon avatar Mar 28 '18 16:03 Jemkon

I would rahter vote for Seurat2 + velocyto simply because Seurat enables "Diffusion Map" ( See RunDiffusion function ) as a method of dimensionality reduction while Pagoda2 doesn't . I find diffusion maps better at capturing the dynamics of cell trajectories than t-SNE. Therefore, I would most of the time prefer to use a diffusion map embedding ( GetCellEmbeddings(object, reduction.type = "dm") ).

charles-bernard avatar Apr 06 '18 14:04 charles-bernard

Could someone please explain to me how/where I get the emat (spliced count matrix) and nmat (unspliced count matrix) from if I am using a seurat object? I am trying to use velocyto.R to confirm some of my results.

LineWulff avatar Oct 10 '18 09:10 LineWulff

@LineWulff - You will need to run the Velocyto command line tools on the dataset first to get the splicing information. That will create the loom file which you can get the emat and nmat. Then you can use the tSNE embeddings from Seurat, with the emat and nmat from the loom file to make your plot... it's a bit tricky.

jebard avatar Oct 12 '18 16:10 jebard

@ramonvidal How did you make your pbmcY1.rds?

x811zou avatar Jan 27 '19 15:01 x811zou

@ramonvidal How did you make your pbmcY1.rds?

Hi @x811zou , that is a typo, pbmcY1 == pbmc

ramonvidal avatar Jan 27 '19 17:01 ramonvidal

@ramonvidal , I was able to create cell embedding and clustering using seurat, but when I rename the emat and nmat colnames by running emat <- emat[,rownames([email protected])]; nmat <- nmat[,rownames([email protected])]; It gave me this :
Error in intI(j, n = x@Dim[2], dn[[2]], give.dn = FALSE) : invalid character indexing Do you have any suggestions? Thank you in advance!

chinalex9527 avatar Feb 04 '19 23:02 chinalex9527

I may not be following, but "renaming the emat and nmat colnames" should be

colnames(emat) <- rownames([email protected])
colnames(nmat) <- rownames([email protected])

chlee-tabin avatar Feb 05 '19 14:02 chlee-tabin

@chlee-tabin , sorry I'm completely new to R scripts, I guess my question is in velocity estimation, they used Pagoda2 object for the following command: emat <- emat[,rownames(r$counts)]; nmat <- nmat[,rownames(r$counts)] How can I modify it with the Seurat object I created? Thank you!

chinalex9527 avatar Feb 06 '19 02:02 chinalex9527

@chinalex9527 , with a Seurat object I did: emat <- emat[,rownames([email protected])]; nmat <- nmat[,rownames([email protected])] This step will filter out cells from emat and nmat, which are not in the seurat/pagoda object. I had to rename my emat and nmat columns first though. You have to inspect how the cells are named in both your seurat object (e.g. rownames([email protected])) and the nmat/emat format.

LineWulff avatar Feb 06 '19 08:02 LineWulff

Here is my full code for making this work -- hope this helps you all

#library(devtools)
#install_github("velocyto-team/velocyto.R")
library(velocyto.R)
library(pagoda2)
library(Seurat)


# This is generated from the Velocyto python command line tool.
# You need a loom file before you can proceed
ldat <- read.loom.matrices("possorted_genome_bam_XL2S3.loom")

# Gather the spliced and unspliced estimates
emat <- ldat$spliced
nmat <- ldat$unspliced


# take embedding from the Seurat data object
# NOTE: This assumes you have a seurat data object loaded
# into R memory prior to using this script. STOP and rerun seurat
# pipeline if you do not have this loaded. In my case, my seurat object is simply myData
emb <- myData@[email protected]

# Estimate the cell-cell distances 
cell.dist <- as.dist(1-armaCor(t(myData@[email protected])))

# This is a little bit of foo-magic that needs to be adjusted on a per-sample
# basis depending on the cell names and how you ran the pipeline. Each cell
# stored in the loom object and seurat have an ID, make sure these are the same.
# in this case -- i need to trim 28 characters off the front to match seurat object.
colnames(emat) <- paste(substring(colnames(emat),28,43),sep="")
colnames(nmat) <- paste(substring(colnames(nmat),28,43),sep="")

# What this step does is essentially this:
# > head(colnames(emat))
# [1] "possorted_genome_bam_XL2S3:AAAGATGCATACTACGx"
# [2] "possorted_genome_bam_XL2S3:ACCTTTATCTTTAGTCx"
# [3] "possorted_genome_bam_XL2S3:AAGGAGCCACGCATCGx"

# > colnames(emat) <- paste(substring(colnames(emat),28,43),sep="")
# > head(colnames(emat))
# [1] "AAAGATGCATACTACG" "ACCTTTATCTTTAGTC" "AAGGAGCCACGCATCG" 

# Now the names in emat and nmat will match up to the cell names used in my seurat object


# I'm not sure what this parameter does to be honest. 0.02 default
# perform gamma fit on a top/bottom quantiles of expression magnitudes
fit.quantile <- 0.02

# Main velocity estimation
rvel.cd <- gene.relative.velocity.estimates(emat,nmat,deltaT=2,
                                            kCells=10,
                                            cell.dist=cell.dist,
                                            fit.quantile=fit.quantile,
                                            n.cores=24)

# This section gets the colors out of the seurat tSNE object so that my seurat and velocyto plots use the same color scheme.
gg <- TSNEPlot(myData)
ggplot_build(gg)$data
colors <- as.list(ggplot_build(gg)$data[[1]]$colour)
names(colors) <- rownames(emb)


 p1 <- show.velocity.on.embedding.cor(emb,rvel.cd,n=30,scale='sqrt',
                                      cell.colors=ac(colors,alpha=0.5),
                                      cex=0.8,arrow.scale=2,show.grid.flow=T,
                                      min.grid.cell.mass=1.0,grid.n=50,arrow.lwd=1,
                                      do.par=F,cell.border.alpha = 0.1,
                                      n.cores=24,main="Cell Velocity")#,cc=p1$cc)
 

jebard avatar Feb 06 '19 15:02 jebard

@jebard , thank you so much! The code worked beautifully!

chinalex9527 avatar Feb 08 '19 00:02 chinalex9527

@jebard , Many thanks for your full code!! Very helpful, since applying RNA velocity does not seem to be straightforward, and a clear pipeline to use it in conjunction with Seurat is missing from the creators, as far as I know.

It will be great if you could please clarify two things:

  1. Towards the end of your code when you try to get the color out of the Seurat TSNE object, the code you gave above doesn't seem to work anymore. I'm using Seurat v2.3.4, R v3.5.0 and ggplot v2_3.1.0. Maybe the slots have changed? I spent some time to explore the slots to find out where the color info is stored, but I haven't had luck so far. Can you please help if you know?

  2. Once we successfully run show.velocity.on.embedding.cor, how exactly should we make the plot? Is the output p1 a ggplot object? Can we use print() pdf() or cowplot::save_plot with it?

Many thanks again, in advance!

CodeInTheSkies avatar Mar 26 '19 17:03 CodeInTheSkies

Further, the output of show.velocity.on.embedding.cor is a list. So, basically, I would like to know how best to convert this output to a ggplot, gtable, or a grob.

Thank you!

CodeInTheSkies avatar Mar 26 '19 17:03 CodeInTheSkies

@CodeInTheSkies

Went back to the code and was able to get the colors out using this:

gg <- TSNEPlot(myData) colors <- as.list(ggplot_build(gg)$data[[1]]$colour) names(colors) <- rownames(emb)

I'm not sure why originally i had it as ggplot$Lpo -- i'll update my code above accordingly.

The output of the command show.velocity.on.embedding.cor should result in a plot being generated. I use Rstudio so it automatically pops up a plot. How I typically export to a file is by:

png("~/my-exported-graph.png") show.velocity.on.embedding.cor(emb,rvel.cd,n=30,scale='sqrt', cell.colors=ac(colors,alpha=0.5), cex=0.8,arrow.scale=2,show.grid.flow=T, min.grid.cell.mass=1.0,grid.n=50,arrow.lwd=1, do.par=F,cell.border.alpha = 0.1, n.cores=24,main="Cell Velocity") dev.off()

The reason I have been saving it into the p1 object, is that it can help speed up replotting the figure should you need to make any graphical changes (like arrow size / density) -- to do this, you pass the parameter cc=p1$cc

R version 3.4.2 Seurat_2.3.4 cowplot_0.9.2 ggplot2_3.0.0 velocyto.R_0.6

jebard avatar Mar 27 '19 16:03 jebard

Hi @jebard

Thanks for very detailed demonstration! Just one question related to this. Do I need filter.genes.by.cluster.expression before gene.relative.velocity.estimates? I saw this step in this tutorial

MichaelPeibo avatar Apr 23 '19 07:04 MichaelPeibo

It’s not strictly necessary to select “cluster-distinguishing genes - it’s a recommended step. If different clusters in your dataset represent different points in the dynamic process, this will likely result in an informative gene set.

On Apr 23, 2019, at 3:36 AM, MichaelPeibo [email protected] wrote:

Hi @jebard https://github.com/jebard Thanks for very detailed demonstration! Just one question related to this. Do I need filter.genes.by.cluster.expression before gene.relative.velocity.estimates? I saw this step in this tutorial http://pklab.med.harvard.edu/velocyto/notebooks/R/DG1.nb.html — You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/velocyto-team/velocyto.R/issues/16#issuecomment-485679760, or mute the thread https://github.com/notifications/unsubscribe-auth/AC2PX4XJZ5W6FDKBVR7ISGTPR237VANCNFSM4EQPMVFA.

pkharchenko avatar Apr 26 '19 17:04 pkharchenko

Hi @pkharchenko Thanks for response! I will try to do filter and without to to if there is a difference.

MichaelPeibo avatar Apr 27 '19 01:04 MichaelPeibo

Hi @jebard

Thanks for your code. I was not sure about this command: emb <- myData@[email protected]

What exactly is dr?

When I was trying to run this command:

emb <- lrc8@[email protected] Error: no slot of name "dr" for this object of class "Seurat"

Fu-Luo avatar Jun 29 '19 05:06 Fu-Luo

Hi @MazzzzzeLuo This code is for exacting cell embeddings such as tsne or umap, you have to RunTSNE to get this slot.

Also make sure you are using Seuratv2 rather than v3.

MichaelPeibo avatar Jun 29 '19 07:06 MichaelPeibo

This code is for exacting cell embeddings such as tsne or umap, you have to RunTSNE to get this slot.

Also make sure you are using Seuratv2 rather than v3.

@MichaelPeibo Would it be possible to elaborate the need for v2 over v3? I'm using v3 and getting an error (below) that I'm trying to trace, may be due to the inputs which may be version related.

> rvel.cd <- gene.relative.velocity.estimates(emat,nmat,deltaT=2,
+                                             kCells=10,
+                                             cell.dist=cell.dist,
+                                             fit.quantile=fit.quantile,
+                                             n.cores=2)
matching cells between cell.dist and emat/nmat ... done
calculating cell knn ... done
calculating convolved matrices ... Error in intI(i, n = d[1], dn[[1]], give.dn = FALSE) : 
  no 'dimnames[[.]]': cannot use character indexing
In addition: Warning message:
In labels(cell.dist) == colnames(emat) :
  longer object length is not a multiple of shorter object length

GBeattie avatar Oct 16 '19 12:10 GBeattie

I'm having some trouble with transferring some of the commands over.
When I try to change the cluster label from the pagoda2 imput to a seurat one I keep throwing an error.

cluster.label <- object@ident
    Error: no slot of name "ident" for this object of class "Seurat"

Shouldn't ident be automatically created when the cluster neighbors are generated? When I ran a DimPlot for UMAP I was able to group it by "ident" as well, so I'm not sure what the problem is.

Also like @GBeattie I am trying to use Seurat v3 rather than v2

ArcusGears avatar Oct 23 '19 17:10 ArcusGears

This code is for exacting cell embeddings such as tsne or umap, you have to RunTSNE to get this slot. Also make sure you are using Seuratv2 rather than v3.

@MichaelPeibo Would it be possible to elaborate the need for v2 over v3? I'm using v3 and getting an error (below) that I'm trying to trace, may be due to the inputs which may be version related.

> rvel.cd <- gene.relative.velocity.estimates(emat,nmat,deltaT=2,
+                                             kCells=10,
+                                             cell.dist=cell.dist,
+                                             fit.quantile=fit.quantile,
+                                             n.cores=2)
matching cells between cell.dist and emat/nmat ... done
calculating cell knn ... done
calculating convolved matrices ... Error in intI(i, n = d[1], dn[[1]], give.dn = FALSE) : 
  no 'dimnames[[.]]': cannot use character indexing
In addition: Warning message:
In labels(cell.dist) == colnames(emat) :
  longer object length is not a multiple of shorter object length

hI, Have you solven this problem? I met the same error using Seurat 3. Thanks!

XYZuo avatar Jul 19 '20 13:07 XYZuo

Here is my full code for making this work -- hope this helps you all

#library(devtools)
#install_github("velocyto-team/velocyto.R")
library(velocyto.R)
library(pagoda2)
library(Seurat)


# This is generated from the Velocyto python command line tool.
# You need a loom file before you can proceed
ldat <- read.loom.matrices("possorted_genome_bam_XL2S3.loom")

# Gather the spliced and unspliced estimates
emat <- ldat$spliced
nmat <- ldat$unspliced


# take embedding from the Seurat data object
# NOTE: This assumes you have a seurat data object loaded
# into R memory prior to using this script. STOP and rerun seurat
# pipeline if you do not have this loaded. In my case, my seurat object is simply myData
emb <- myData@[email protected]

# Estimate the cell-cell distances 
cell.dist <- as.dist(1-armaCor(t(myData@[email protected])))

# This is a little bit of foo-magic that needs to be adjusted on a per-sample
# basis depending on the cell names and how you ran the pipeline. Each cell
# stored in the loom object and seurat have an ID, make sure these are the same.
# in this case -- i need to trim 28 characters off the front to match seurat object.
colnames(emat) <- paste(substring(colnames(emat),28,43),sep="")
colnames(nmat) <- paste(substring(colnames(nmat),28,43),sep="")

# What this step does is essentially this:
# > head(colnames(emat))
# [1] "possorted_genome_bam_XL2S3:AAAGATGCATACTACGx"
# [2] "possorted_genome_bam_XL2S3:ACCTTTATCTTTAGTCx"
# [3] "possorted_genome_bam_XL2S3:AAGGAGCCACGCATCGx"

# > colnames(emat) <- paste(substring(colnames(emat),28,43),sep="")
# > head(colnames(emat))
# [1] "AAAGATGCATACTACG" "ACCTTTATCTTTAGTC" "AAGGAGCCACGCATCG" 

# Now the names in emat and nmat will match up to the cell names used in my seurat object


# I'm not sure what this parameter does to be honest. 0.02 default
# perform gamma fit on a top/bottom quantiles of expression magnitudes
fit.quantile <- 0.02

# Main velocity estimation
rvel.cd <- gene.relative.velocity.estimates(emat,nmat,deltaT=2,
                                            kCells=10,
                                            cell.dist=cell.dist,
                                            fit.quantile=fit.quantile,
                                            n.cores=24)

# This section gets the colors out of the seurat tSNE object so that my seurat and velocyto plots use the same color scheme.
gg <- TSNEPlot(myData)
ggplot_build(gg)$data
colors <- as.list(ggplot_build(gg)$data[[1]]$colour)
names(colors) <- rownames(emb)


 p1 <- show.velocity.on.embedding.cor(emb,rvel.cd,n=30,scale='sqrt',
                                      cell.colors=ac(colors,alpha=0.5),
                                      cex=0.8,arrow.scale=2,show.grid.flow=T,
                                      min.grid.cell.mass=1.0,grid.n=50,arrow.lwd=1,
                                      do.par=F,cell.border.alpha = 0.1,
                                      n.cores=24,main="Cell Velocity")#,cc=p1$cc)
 

I ran follow your code, and the last step went wrong, Detect OpenMP Loop and this application may hang. Please rebuild the library with USE_OPENMP=1 option how can I slove it?

Bioinformatics-rookie avatar Sep 16 '20 15:09 Bioinformatics-rookie

Hi @Bioinformatics-rookie , I met the same problem before and solved it by installing libopenblas. I use anaconda so I ran conda install -c conda-forge libopenblas Hope it can help you.

XYZuo avatar Sep 18 '20 02:09 XYZuo

This code is for exacting cell embeddings such as tsne or umap, you have to RunTSNE to get this slot. Also make sure you are using Seuratv2 rather than v3.

@MichaelPeibo Would it be possible to elaborate the need for v2 over v3? I'm using v3 and getting an error (below) that I'm trying to trace, may be due to the inputs which may be version related.

> rvel.cd <- gene.relative.velocity.estimates(emat,nmat,deltaT=2,
+                                             kCells=10,
+                                             cell.dist=cell.dist,
+                                             fit.quantile=fit.quantile,
+                                             n.cores=2)
matching cells between cell.dist and emat/nmat ... done
calculating cell knn ... done
calculating convolved matrices ... Error in intI(i, n = d[1], dn[[1]], give.dn = FALSE) : 
  no 'dimnames[[.]]': cannot use character indexing
In addition: Warning message:
In labels(cell.dist) == colnames(emat) :
  longer object length is not a multiple of shorter object length

I have the same problem too. Have you guys solved it?

Battamama avatar Aug 23 '21 01:08 Battamama