velocyto.R icon indicating copy to clipboard operation
velocyto.R copied to clipboard

Running velocyto on multiple samples with 10X

Open dingying85 opened this issue 6 years ago • 16 comments

Dear Velocyto team,

I have a question about running velocyto on eight 10x samples. Since 10X pipeline gives bam file for each sample, I can run velocyto on each bam file one by one and generate eight loom files. I am wondering is there a way to visualize the vector fields on a single TSNE plot of aggregate cells of all eight samples? I would like to have some advise on how to run velocyto and achieve that map efficiently?

Yours, Ying DIng

dingying85 avatar Nov 15 '18 19:11 dingying85

I am wondering is there a way to visualize the vector fields on a single TSNE plot of aggregate cells of all eight samples?

Yes, it has. Merge the eight matrices (genes by cells) to one giant matrix, then feed it to velocyto.R.

Take the pure R's pipeline for example, the required input RDS file is generated by dropEst (see here) and it is nothing but a R's 'list' of which the content is a triplets: exon, intron and spanning matrices. Here is the pseudo R codes:

merged_exon_mat <- matrix()
merged_intron_mat <- matrix()
merged_spanning_mat <- matrix()
for rds_x in c(rds1, rds2, ..., rds8):
    mat <- read.rds(rds_x)
    mat_exon <- mat$exon
    mat_intron <- mat$intron
    mat_spanning <- mat$spanning
    ## avoid collision of same cell barcode
    colnames(mat_exon) <- paste(rds_x, colnames(mat_exon))  
    colnames(mat_intron) <- paste(rds_x, colnames(mat_intron))
    colnames(mat_spanning) <- paste(rds_x, colnames(mat_spanning))
    merged_exon_mat <- cbind(merged_exon_mat, mat_exon)
    merged_intron_mat <- cbind(merged_intron_mat, mat_intron)
    merged_spanning_mat <- cbind(merged_spanning_mat, mat_spanning)

merged_mats <- list(
    exon=merged_exon_mat,
    intron=merged_intron_mat,
    spanning=merged_spanning_mat)
## use merged_mats to run velocyto.R
## ...

You could probably do the same operation on loom file, though I never give a try.

A final suggestion would be considering the batch effect. ~~Probably try the Seurat's intergration function if needed ('Integrating single-cell transcriptomic data across different conditions, technologies, and species').~~ (Update: the integrated values are no longer UMIs so don't do this.)

Puriney avatar Nov 24 '18 05:11 Puriney

Hi, @Puriney , I would like to follow up this issue. Does Velocity.R take logCounts instead of counts matrix for down stream estimation? Will this alternative would affect the computing of Velocity then?

The reason is that after the batch effect correction has been applied, the resultant output came with logcounts only.

Thanks!

zqzneptune avatar May 09 '19 18:05 zqzneptune

I don't think it is good to use logCounts for the input. Because velocity.R internally relies on raw counts. Here are some examples:

  • cell size calculation https://github.com/velocyto-team/velocyto.R/blob/d7790346cb99f49ab9c2b23ba70dcf9d2c9fc350/R/momentum_routines.R#L109-L113
  • Use the log normalized count matrix to define the k nearest neighbors of cells for 'smoothing' https://github.com/velocyto-team/velocyto.R/blob/d7790346cb99f49ab9c2b23ba70dcf9d2c9fc350/R/momentum_routines.R#L116-L125
  • Use the log normalized count matrix for fitting. https://github.com/velocyto-team/velocyto.R/blob/d7790346cb99f49ab9c2b23ba70dcf9d2c9fc350/R/momentum_routines.R#L303-L306

Maybe you can manually convert the logcounts back to raw counts. Read batchelor code to find if they use 10, 2, e and also what is the pseudo-count.

Puriney avatar May 09 '19 18:05 Puriney

@Puriney

Hi,

Regarding your suggestion of Seurat Integration. How do you bring seurat integration here, just by using the UMAP/tSNE from seurat integrated analysis to show velocity in velocity.R?

Kindly advice. It would be great for me and I suppose many to have clarity over this as Seurat Integration is also popular for batch correction.

Thanks in advance!

saeedfc avatar Oct 22 '19 09:10 saeedfc

Just if I could chime in, I think https://github.com/satijalab/seurat-wrappers has what you need, and which I am using (I am not starting from .loom file, but essentially it requires only small modification).

The vignette is http://htmlpreview.github.io/?https://github.com/satijalab/seurat-wrappers/blob/master/docs/velocity.html

chlee-tabin avatar Oct 22 '19 14:10 chlee-tabin

@chlee-tabin @Puriney

This is what I did, It would be great if a contributor can comment to confirm.

ldat_obj.1 <- ReadVelocity(file = "obj.1.loom")
obj.1 <- as.Seurat(ldat_obj.1)
.
.
.
ldat_obj.4 <- ReadVelocity(file = "obj.4.loom")
obj.4 <- as.Seurat(ldat_obj.4)

bm <- merge(x = obj.1, y = c(obj.2,obj.3, obj.4), merge.data = TRUE)

spliced <- CreateAssayObject(GetAssayData(bm, assay = "spliced"))
unspliced <- CreateAssayObject(GetAssayData(bm, assay = "unspliced"))
unambiguous <- CreateAssayObject(GetAssayData(bm, assay = "unambiguous"))

#Import the original seurat object where I integrated 4 different 10x runs using seurat v3

SO <- readRDS("Integrated_seurat_object.rds")
SO[["spliced"]] <- spliced
SO[["unspliced"]] <- unspliced
SO[["ambiguous"]] <- ambiguous

SO <- RunVelocity(SO,deltaT = 1, kCells = 25, fit.quantile = 0.02)

It works, but I am not very sure this pipeline stays true with the things that go in the background with velocity.

saeedfc avatar Oct 29 '19 17:10 saeedfc

I am really questioning the use of batch correction on multiple samples to run velocyto on the corrected matrices. Does anyone have any insight on this?

kaizen89 avatar Feb 07 '20 17:02 kaizen89

@chlee-tabin @Puriney

This is what I did, It would be great if a contributor can comment to confirm.

ldat_obj.1 <- ReadVelocity(file = "obj.1.loom")
obj.1 <- as.Seurat(ldat_obj.1)
.
.
.
ldat_obj.4 <- ReadVelocity(file = "obj.4.loom")
obj.4 <- as.Seurat(ldat_obj.4)

bm <- merge(x = obj.1, y = c(obj.2,obj.3, obj.4), merge.data = TRUE)

spliced <- CreateAssayObject(GetAssayData(bm, assay = "spliced"))
unspliced <- CreateAssayObject(GetAssayData(bm, assay = "unspliced"))
unambiguous <- CreateAssayObject(GetAssayData(bm, assay = "unambiguous"))

#Import the original seurat object where I integrated 4 different 10x runs using seurat v3

SO <- readRDS("Integrated_seurat_object.rds")
SO[["spliced"]] <- spliced
SO[["unspliced"]] <- unspliced
SO[["ambiguous"]] <- ambiguous

SO <- RunVelocity(SO,deltaT = 1, kCells = 25, fit.quantile = 0.02)

It works, but I am not very sure this pipeline stays true with the things that go in the background with velocity.

This is reasonable

yaolutian avatar Feb 19 '20 05:02 yaolutian

SO[["spliced"]] <- spliced SO[["unspliced"]] <- unspliced SO[["ambiguous"]] <- ambiguous

I tried this, it takes a lot of times but can not get the result. SO[["spliced"]] <- spliced SO[["unspliced"]] <- unspliced SO[["ambiguous"]] <- ambiguous

Could you help me to deal with this?

millersan avatar Jun 28 '20 06:06 millersan

I am really questioning the use of batch correction on multiple samples to run velocyto on the corrected matrices. Does anyone have any insight on this?

@kaizen89 , As far as I understand, In the end at least in Seurat Integration, you are only using 'correction' to cluster and for dimensionality reduction. Your 'expression values' are not per se batch corrected.

Velocyto is using the original 10x data to make loom output file. You are only super imposing the velocyto analysis on the 'corrected' umap? Please correct me If I am thinking wrong.

saeedfc avatar Jun 28 '20 23:06 saeedfc

SO[["spliced"]] <- spliced SO[["unspliced"]] <- unspliced SO[["ambiguous"]] <- ambiguous

I tried this, it takes a lot of times but can not get the result. SO[["spliced"]] <- spliced SO[["unspliced"]] <- unspliced SO[["ambiguous"]] <- ambiguous

Could you help me to deal with this?

Sorry, what error message do you get? What is the size of your data?

saeedfc avatar Jun 28 '20 23:06 saeedfc

SO[["spliced"]] <- spliced SO[["unspliced"]] <- unspliced SO[["ambiguous"]] <- ambiguous

I tried this, it takes a lot of times but can not get the result. SO[["spliced"]] <- spliced SO[["unspliced"]] <- unspliced SO[["ambiguous"]] <- ambiguous Could you help me to deal with this?

Sorry, what error message do you get? What is the size of your data?

Hi, I tried but got "Error: Cannot add a different number of cells than already present" And It could run without reasonable results. Also curious about the reason.Is it because the Integrated_seurat_object already filtered out some cells? Thanks a lot!

XYZuo avatar Jul 19 '20 12:07 XYZuo

SO[["spliced"]] <- spliced SO[["unspliced"]] <- unspliced SO[["ambiguous"]] <- ambiguous

I tried this, it takes a lot of times but can not get the result. SO[["spliced"]] <- spliced SO[["unspliced"]] <- unspliced SO[["ambiguous"]] <- ambiguous Could you help me to deal with this?

Sorry, what error message do you get? What is the size of your data?

Hi, I tried but got "Error: Cannot add a different number of cells than already present" And It could run without reasonable results. Also curious about the reason.Is it because the Integrated_seurat_object already filtered out some cells? Thanks a lot!

Hi @ZxyChopcat

Your SO should have the same number of cells. Before making the assay objects, filter your seura object to contain only cells that are also present int eh integrated object


bm <- merge(x = obj.1, y = c(obj.2,obj.3, obj.4), merge.data = TRUE)
#Import the original seurat object where I integrated 4 different 10x runs using seurat v3
SO <- readRDS("Integrated_seurat_object.rds")
spliced <- CreateAssayObject(GetAssayData(bm, assay = "spliced"))
unspliced <- CreateAssayObject(GetAssayData(bm, assay = "unspliced"))
unambiguous <- CreateAssayObject(GetAssayData(bm, assay = "unambiguous"))

It is also a good practice to check the number of cells frequently in you r seurat objects using dim(Seurat.Object)

Hope it works! Saeed

saeedfc avatar Jul 19 '20 17:07 saeedfc

@chlee-tabin @Puriney

This is what I did, It would be great if a contributor can comment to confirm.

ldat_obj.1 <- ReadVelocity(file = "obj.1.loom")
obj.1 <- as.Seurat(ldat_obj.1)
.
.
.
ldat_obj.4 <- ReadVelocity(file = "obj.4.loom")
obj.4 <- as.Seurat(ldat_obj.4)

bm <- merge(x = obj.1, y = c(obj.2,obj.3, obj.4), merge.data = TRUE)

spliced <- CreateAssayObject(GetAssayData(bm, assay = "spliced"))
unspliced <- CreateAssayObject(GetAssayData(bm, assay = "unspliced"))
unambiguous <- CreateAssayObject(GetAssayData(bm, assay = "unambiguous"))

#Import the original seurat object where I integrated 4 different 10x runs using seurat v3

SO <- readRDS("Integrated_seurat_object.rds")
SO[["spliced"]] <- spliced
SO[["unspliced"]] <- unspliced
SO[["ambiguous"]] <- ambiguous

SO <- RunVelocity(SO,deltaT = 1, kCells = 25, fit.quantile = 0.02)

It works, but I am not very sure this pipeline stays true with the things that go in the background with velocity.

Thank you for providing a very clear idea, but I think we should notice that cell ids from loom file are not exactily the same format as 10x cellranger output files under most conditions. Thus, before running GetAssayObject, we should run RenameCells() to make sure that cell names from VelocytoObj and 10xSeuratObj are exactly the same. Thanks again. Here are my codes:

load(file = "./tmp/Vlobj_orig.Rdata") #this is the SeuratObj from loom files post-merged

#define a function to transfer loom id to a new id according to the original SeuratObj
#which need to be modified according to your own cell ids
TransferNames = function(loom_names){
  loom_names.tmp <- gsub(":","_",loom_names)
  loom_names.new <- paste0(substring(loom_names.tmp,1,(str_length(loom_names.tmp)-1)),"-1")
  return(loom_names.new)
}
Vlobj <- RenameCells(Vlobj,
                     new.names = TransferNames(colnames(Vlobj)))

#create data objects
spliced <- CreateAssayObject(GetAssayData(Vlobj, assay = "spliced"))
unspliced <- CreateAssayObject(GetAssayData(Vlobj, assay = "unspliced"))
ambiguous <- CreateAssayObject(GetAssayData(Vlobj, assay = "ambiguous"))

#add into original SeuratObj
snRNA_har[["spliced"]] <- spliced
snRNA_har[["unspliced"]] <- unspliced
snRNA_har[["ambiguous"]] <- ambiguous

JerryZhang-1222 avatar Mar 24 '22 05:03 JerryZhang-1222

I am wondering is there a way to visualize the vector fields on a single TSNE plot of aggregate cells of all eight samples?

Yes, it has. Merge the eight matrices (genes by cells) to one giant matrix, then feed it to velocyto.R.

Take the pure R's pipeline for example, the required input RDS file is generated by dropEst (see here) and it is nothing but a R's 'list' of which the content is a triplets: exon, intron and spanning matrices. Here is the pseudo R codes:

merged_exon_mat <- matrix()
merged_intron_mat <- matrix()
merged_spanning_mat <- matrix()
for rds_x in c(rds1, rds2, ..., rds8):
    mat <- read.rds(rds_x)
    mat_exon <- mat$exon
    mat_intron <- mat$intron
    mat_spanning <- mat$spanning
    ## avoid collision of same cell barcode
    colnames(mat_exon) <- paste(rds_x, colnames(mat_exon))  
    colnames(mat_intron) <- paste(rds_x, colnames(mat_intron))
    colnames(mat_spanning) <- paste(rds_x, colnames(mat_spanning))
    merged_exon_mat <- cbind(merged_exon_mat, mat_exon)
    merged_intron_mat <- cbind(merged_intron_mat, mat_intron)
    merged_spanning_mat <- cbind(merged_spanning_mat, mat_spanning)

merged_mats <- list(
    exon=merged_exon_mat,
    intron=merged_intron_mat,
    spanning=merged_spanning_mat)
## use merged_mats to run velocyto.R
## ...

You could probably do the same operation on loom file, though I never give a try.

A final suggestion would be considering the batch effect. ~Probably try the Seurat's intergration function if needed ('Integrating single-cell transcriptomic data across different conditions, technologies, and species').~ (Update: the integrated values are no longer UMIs so don't do this.)

Thanks for the code!But I found another problem while running------ Matrices must have same number of rows in cbind2(.Call(dense_to_Csparse, x), y).

curryfly5 avatar May 12 '22 16:05 curryfly5

SO <- readRDS("7.16second.fix/merge_only3/merge.3.seurat.7.16.rds") SO[["spliced"]] <- spliced SO[["unspliced"]] <- unspliced Error: Cannot add a different number of cells than already present

Using this method, if I have filtered the cells in advance, how can I solve it?

yanpinlu avatar Jul 19 '22 01:07 yanpinlu