velocyto.R
Running velocyto on multiple samples with 10X
Dear Velocyto team,
I have a question about running velocyto on eight 10x samples. Since the 10X pipeline gives one BAM file per sample, I can run velocyto on each BAM file one by one and generate eight loom files. I am wondering: is there a way to visualize the vector fields on a single t-SNE plot of the aggregated cells from all eight samples? I would appreciate some advice on how to run velocyto and produce that map efficiently.
Yours, Ying Ding
I am wondering: is there a way to visualize the vector fields on a single t-SNE plot of the aggregated cells from all eight samples?
Yes, there is. Merge the eight matrices (genes by cells) into one giant matrix, then feed it to velocyto.R.
Take the pure-R pipeline as an example: the required input RDS file is generated by dropEst (see here), and it is nothing but an R list containing a triplet of matrices: exon, intron and spanning. Here is an R sketch (file names are placeholders):
## merge the per-sample dropEst matrices into one set of big matrices
rds_files <- c("sample1.rds", "sample2.rds", "sample3.rds", "sample4.rds",
               "sample5.rds", "sample6.rds", "sample7.rds", "sample8.rds")

exon_list     <- list()
intron_list   <- list()
spanning_list <- list()

for (rds_x in rds_files) {
  mat <- readRDS(rds_x)
  mat_exon     <- mat$exon
  mat_intron   <- mat$intron
  mat_spanning <- mat$spanning
  ## prefix the barcodes with the sample name to avoid collisions of identical cell barcodes
  colnames(mat_exon)     <- paste(rds_x, colnames(mat_exon), sep = "_")
  colnames(mat_intron)   <- paste(rds_x, colnames(mat_intron), sep = "_")
  colnames(mat_spanning) <- paste(rds_x, colnames(mat_spanning), sep = "_")
  exon_list[[rds_x]]     <- mat_exon
  intron_list[[rds_x]]   <- mat_intron
  spanning_list[[rds_x]] <- mat_spanning
}

## column-bind the samples (this assumes all samples share the same gene rows)
merged_mats <- list(
  exon     = do.call(cbind, exon_list),
  intron   = do.call(cbind, intron_list),
  spanning = do.call(cbind, spanning_list))

## use merged_mats to run velocyto.R
## ...
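To make that last step concrete, a minimal sketch of feeding the merged matrices into velocyto.R; the parameter values are only illustrative and should be tuned to your data:
library(velocyto.R)

## exonic counts act as the spliced (emat) input, intronic counts as the unspliced (nmat) input;
## the spanning matrix can optionally be supplied via the smat argument
rvel <- gene.relative.velocity.estimates(
  emat = merged_mats$exon,
  nmat = merged_mats$intron,
  deltaT = 1, kCells = 25, fit.quantile = 0.02)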
You could probably do the same operation on loom files, though I have never tried it.
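If you do start from the velocyto CLI loom files instead, here is a minimal sketch of the same merge, assuming each loom contains the usual spliced/unspliced/ambiguous layers (file names are placeholders):
library(velocyto.R)

loom_files <- c("sample1.loom", "sample2.loom")   # ... one per sample
ldat_list  <- lapply(loom_files, read.loom.matrices)

## prefix barcodes with the sample name, then cbind each layer across samples
merge_layer <- function(layer) {
  do.call(cbind, lapply(seq_along(ldat_list), function(i) {
    m <- ldat_list[[i]][[layer]]
    colnames(m) <- paste(loom_files[i], colnames(m), sep = "_")
    m
  }))
}

merged_ldat <- list(
  spliced   = merge_layer("spliced"),
  unspliced = merge_layer("unspliced"),
  ambiguous = merge_layer("ambiguous"))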
A final suggestion would be to consider the batch effect. ~~Probably try Seurat's integration function if needed ('Integrating single-cell transcriptomic data across different conditions, technologies, and species').~~ (Update: the integrated values are no longer UMIs, so don't do this.)
Hi @Puriney, I would like to follow up on this issue. Does velocyto.R take a logCounts matrix instead of a raw counts matrix for downstream estimation? Would this alternative affect the computation of the velocities?
The reason is that after batch-effect correction has been applied, the resulting output only comes with logcounts.
Thanks!
I don't think it is a good idea to use logCounts as the input, because velocyto.R internally relies on raw counts. Here are some examples:
- Cell size calculation: https://github.com/velocyto-team/velocyto.R/blob/d7790346cb99f49ab9c2b23ba70dcf9d2c9fc350/R/momentum_routines.R#L109-L113
- The (internally) log-normalized count matrix is used to define the k nearest neighbors of cells for 'smoothing': https://github.com/velocyto-team/velocyto.R/blob/d7790346cb99f49ab9c2b23ba70dcf9d2c9fc350/R/momentum_routines.R#L116-L125
- The (internally) log-normalized count matrix is used for fitting: https://github.com/velocyto-team/velocyto.R/blob/d7790346cb99f49ab9c2b23ba70dcf9d2c9fc350/R/momentum_routines.R#L303-L306
Maybe you can manually convert the logcounts back to raw counts. Read the batchelor code to find out whether the log base is 10, 2, or e, and what pseudo-count is used.
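For illustration only, a rough sketch of reversing a typical log-normalization. It assumes base-2 logs, a pseudo-count of 1 and per-cell size factors (the common scuttle/logNormCounts convention) and a hypothetical SingleCellExperiment named sce; verify the base and pseudo-count that were actually used before trusting the result:
library(SingleCellExperiment)

## logcounts = log2(counts / sizeFactor + 1)  =>  counts ~= (2^logcounts - 1) * sizeFactor
lc <- as.matrix(logcounts(sce))   # densifies; fine for moderately sized matrices
sf <- sizeFactors(sce)
recovered <- sweep(2^lc - 1, 2, sf, FUN = "*")
recovered <- round(recovered)     # velocyto.R expects count-like values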
@Puriney
Hi,
Regarding your suggestion of Seurat integration: how do you bring Seurat integration in here? Just by using the UMAP/t-SNE from the Seurat integrated analysis to show the velocities computed in velocyto.R?
Kindly advise. It would be great for me, and I suppose for many others, to have clarity on this, as Seurat integration is also a popular choice for batch correction.
Thanks in advance!
If I could just chime in: I think https://github.com/satijalab/seurat-wrappers has what you need, and it is what I am using (I am not starting from a .loom file, but essentially it requires only a small modification).
The vignette is http://htmlpreview.github.io/?https://github.com/satijalab/seurat-wrappers/blob/master/docs/velocity.html
@chlee-tabin @Puriney
This is what I did. It would be great if a contributor could comment to confirm.
library(Seurat)
library(SeuratWrappers)  # provides ReadVelocity() and RunVelocity()

ldat_obj.1 <- ReadVelocity(file = "obj.1.loom")
obj.1 <- as.Seurat(ldat_obj.1)
## ... likewise for obj.2 and obj.3 ...
ldat_obj.4 <- ReadVelocity(file = "obj.4.loom")
obj.4 <- as.Seurat(ldat_obj.4)

## merge the four loom-derived objects into one
bm <- merge(x = obj.1, y = c(obj.2, obj.3, obj.4), merge.data = TRUE)

spliced   <- CreateAssayObject(GetAssayData(bm, assay = "spliced"))
unspliced <- CreateAssayObject(GetAssayData(bm, assay = "unspliced"))
ambiguous <- CreateAssayObject(GetAssayData(bm, assay = "ambiguous"))

## import the original Seurat object where I integrated 4 different 10x runs using Seurat v3
SO <- readRDS("Integrated_seurat_object.rds")
SO[["spliced"]]   <- spliced
SO[["unspliced"]] <- unspliced
SO[["ambiguous"]] <- ambiguous

SO <- RunVelocity(SO, deltaT = 1, kCells = 25, fit.quantile = 0.02)
It works, but I am not entirely sure this pipeline stays true to what goes on in the background in velocyto.
I am really questioning the practice of batch-correcting multiple samples and then running velocyto on the corrected matrices. Does anyone have any insight on this?
@chlee-tabin @Puriney
This is what I did. It would be great if a contributor could comment to confirm. [...]
This is reasonable:
SO[["spliced"]] <- spliced
SO[["unspliced"]] <- unspliced
SO[["ambiguous"]] <- ambiguous
I tried this:
SO[["spliced"]] <- spliced
SO[["unspliced"]] <- unspliced
SO[["ambiguous"]] <- ambiguous
but it takes a long time and I cannot get a result.
Could you help me deal with this?
I am really questioning the practice of batch-correcting multiple samples and then running velocyto on the corrected matrices. Does anyone have any insight on this?
@kaizen89, as far as I understand, at least with Seurat integration you are only using the 'correction' for clustering and dimensionality reduction; your 'expression values' are not batch-corrected per se.
Velocyto uses the original 10x data to make the loom output file. You are only superimposing the velocyto analysis on the 'corrected' UMAP. Please correct me if I am thinking about this wrong.
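For reference, a minimal sketch of that superimposing step, adapted from the SeuratWrappers velocity vignette. It assumes SO is the integrated Seurat object carrying the RunVelocity fit from the code above, and the plotting parameters are only illustrative:
library(Seurat)
library(velocyto.R)

## per-cell colors keyed by barcode, as expected by show.velocity.on.embedding.cor
ident.colors <- scales::hue_pal()(length(levels(SO)))
names(ident.colors) <- levels(SO)
cell.colors <- ident.colors[as.character(Idents(SO))]
names(cell.colors) <- colnames(SO)

## project the fitted velocities onto the integrated UMAP (or t-SNE) embedding
show.velocity.on.embedding.cor(
  emb = Embeddings(SO, reduction = "umap"),
  vel = Tool(SO, slot = "RunVelocity"),
  n = 200, scale = "sqrt",
  cell.colors = ac(cell.colors, alpha = 0.5),
  cex = 0.8, arrow.scale = 3, show.grid.flow = TRUE,
  min.grid.cell.mass = 0.5, grid.n = 40, arrow.lwd = 1)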
I tried this, but it takes a long time and I cannot get a result. Could you help me deal with this?
Sorry, what error message do you get? What is the size of your data?
Hi, I tried, but got "Error: Cannot add a different number of cells than already present", and it did not produce reasonable results. I am also curious about the reason: is it because the Integrated_seurat_object has already filtered out some cells? Thanks a lot!
Hi @ZxyChopcat
Your SO should have the same number of cells. Before making the assay objects, filter your Seurat object so that it contains only the cells that are also present in the integrated object (see the sketch after this comment).
bm <- merge(x = obj.1, y = c(obj.2,obj.3, obj.4), merge.data = TRUE)
#Import the original seurat object where I integrated 4 different 10x runs using seurat v3
SO <- readRDS("Integrated_seurat_object.rds")
spliced <- CreateAssayObject(GetAssayData(bm, assay = "spliced"))
unspliced <- CreateAssayObject(GetAssayData(bm, assay = "unspliced"))
ambiguous <- CreateAssayObject(GetAssayData(bm, assay = "ambiguous"))
It is also good practice to check the number of cells in your Seurat objects frequently using dim(Seurat.Object).
Hope it works! Saeed
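A minimal sketch of that filtering step, assuming bm is the merged loom-derived object and SO is the integrated object, and that the barcodes are already in the same format:
## keep only the cells of bm that are also present in the integrated object
common_cells <- intersect(colnames(bm), colnames(SO))
bm <- subset(bm, cells = common_cells)

## both objects should now report the same number of cells
dim(bm)
dim(SO)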
@chlee-tabin @Puriney
This is what I did. It would be great if a contributor could comment to confirm. [...]
Thank you for providing a very clear idea, but I think we should note that in most cases the cell IDs from the loom file are not in exactly the same format as the 10x Cell Ranger output. Thus, before running GetAssayData, we should run RenameCells() to make sure that the cell names from the velocyto object and the 10x Seurat object are exactly the same. Thanks again. Here is my code:
library(Seurat)
library(stringr)  # for str_length()

load(file = "./tmp/Vlobj_orig.Rdata")  # this is the SeuratObj from the post-merge loom files

## define a function to transfer a loom ID into a new ID matching the original SeuratObj
## (this needs to be modified according to your own cell IDs)
TransferNames <- function(loom_names) {
  loom_names.tmp <- gsub(":", "_", loom_names)
  loom_names.new <- paste0(substring(loom_names.tmp, 1, str_length(loom_names.tmp) - 1), "-1")
  return(loom_names.new)
}
Vlobj <- RenameCells(Vlobj, new.names = TransferNames(colnames(Vlobj)))

## create the assay objects
spliced <- CreateAssayObject(GetAssayData(Vlobj, assay = "spliced"))
unspliced <- CreateAssayObject(GetAssayData(Vlobj, assay = "unspliced"))
ambiguous <- CreateAssayObject(GetAssayData(Vlobj, assay = "ambiguous"))

## add them into the original SeuratObj
snRNA_har[["spliced"]] <- spliced
snRNA_har[["unspliced"]] <- unspliced
snRNA_har[["ambiguous"]] <- ambiguous
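As a quick sanity check (assuming the object names Vlobj and snRNA_har above), it may be worth confirming that the renamed barcodes actually overlap before adding the assays:
## count how many renamed loom barcodes are found in the integrated object; ideally all of them
table(colnames(Vlobj) %in% colnames(snRNA_har))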
Yes, there is. Merge the eight matrices (genes by cells) into one giant matrix, then feed it to velocyto.R. [...]
Thanks for the code! But I ran into another problem while running it: "Matrices must have same number of rows in cbind2(.Call(dense_to_Csparse, x), y)".
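A common cause of this error is that the per-sample matrices do not contain exactly the same gene rows, so cbind fails. A rough sketch of one possible fix, restricting each matrix to the shared genes before merging (reusing exon_list from the merge loop above; note that intersecting drops genes absent from any sample):
## find the genes present in every sample, then subset each matrix to them in the same order
common_genes <- Reduce(intersect, lapply(exon_list, rownames))
exon_list_aligned <- lapply(exon_list, function(m) m[common_genes, , drop = FALSE])
merged_exon_mat <- do.call(cbind, exon_list_aligned)
## repeat for the intron and spanning lists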
SO <- readRDS("7.16second.fix/merge_only3/merge.3.seurat.7.16.rds")
SO[["spliced"]] <- spliced
SO[["unspliced"]] <- unspliced
Error: Cannot add a different number of cells than already present
With this method, if I have filtered the cells in advance, how can I solve this?